Big Data

ScORe – Schema On Read for Spark SQL

Lior Chaga | 27 May 2021 | Big Data

Tags: open source, parquet, pruning, schema, Spark, Spark-SQL

The world is not flat, it’s highly nested. With over 4 billion page views per day and over ...

The Challenges Of Uploading 150TB/day From Spark To BigQuery – Part 2

Itai Barel and Gaash Hazan | 25 Mar 2021 | Big Data

Tags: Airflow, BigQuery, Cloud Storage, Google Cloud, Lessons, performance, Scale, Spark

In part 1 of the series we shared the architecture of Taboola’s PV2Google service, which uploads over 150TB/day ...

The Challenges Of Uploading 150TB/day From Spark To BigQuery – Part 1

Itai Barel and Gaash Hazan | 25 Mar 2021 | Big Data

Tags: Airflow, BigQuery, Cloud Storage, Google Cloud, performance, Scale, Spark

Have you ever tried building an infrastructure to upload 150TB a day? Have you ever tried querying over ...

Anomaly detection using LSTM with Autoencoder

Gali Katz | 14 Sep 2020 | Big Data

Tags: autoencoder, LSTM, Metrics

Taboola is one of the largest content recommendation companies in the world. We maintain hundreds of servers in ...

Using Spark Dynamic Allocation

Igor Berman | 24 Jun 2020 | Big Data

Tags: big data, dynamic allocation, infra, mesos, performance, Spark

The story starts with metrics. Every mature software company needs to have a metric system to monitor resource ...

Collaborative Trial: On Optimizing Recommendation Testing

Maoz Cohen | 09 Jun 2020 | Big Data

Tags: a/b testing, algorithms, big data, data, data science, Monitoring, performance, statistics, testing

Taboola is responsible for billions of daily recommendations, and we are doing everything we can to make those ...

Stop waking up at night over MySQL replication

Ariel Pisetzky | 26 May 2020 | Big Data


MySQL Slave Replication Optimization. Written by Yossi Kalif & Ariel Pisetzky. MySQL in Taboola: So you love ...

Fear of breaking production? Use Grafana!

Tal Bar Zvi | 07 May 2020 | Big Data

Tags: Grafana, Monitoring, Observability

At Taboola, we deal with scale, huge scale. A small issue might turn into a disaster in a ...

Monitoring and Metering at Scale

Gali Katz | 07 Dec 2019 | Big Data

Tags: Metrictank, Monitoring, Prometheus, Thanos

In this blog post I will describe how we, at Taboola, changed our metrics infrastructure twice as a result ...

Going Old-School: Designing Algorithms for Fast Weighted Sampling in Production

Shaked Zychlinski | 06 Jun 2019 | Big Data

Tags: algorithms, performance, production, real-time, sampling, uncertainty

If you happen to write code for a living, there’s a pretty good chance you’ve found yourself explaining ...

Bucket the shuffle out of here!

Igor Berman and Radik Komarnitsky | 28 Mar 2019 | Big Data

Tags: big data, data, performance, shuffles, Spark, Spark-on-demand, tips

Intro: At Taboola we use Spark extensively throughout the pipeline. Regularly faced with Spark-related scalability challenges, we look ...

Get real life debugging using Kibana and Elastic

Eyal Zur | 30 Dec 2018 | Big Data

Tags: elasticsearch, kibana

We all have these amazing machines in our development and testing labs, and we know that our real ...

How I Resolved Delays in Kafka Messages by Prioritizing Kafka Topics

Gal Shelach | 17 Oct 2018 | Big Data

Tags: big data, java, kafka, tips

As a team member in the Scale Performance Data group of Taboola’s R&D, I had the opportunity to ...

Deep Learning – from Prototype to Production

Erez Kashi | 30 Jul 2017 | Big Data

Tags: deep learning, docker, Nvidia, python, tensorflow

About 8 months ago, my team and I were facing the challenge of building our first Deep Learning ...

More than one Graph – Code Reuse in TensorFlow

Jenia Gorokhovsky | 02 Jul 2017 | Big Data

Tags: deep learning, tensorflow, web development

Large production pipelines in TensorFlow are quite difficult to pull off. Training small models is easy, and we ...