Spark

The Challenges Of Uploading 150TB/day From Spark To BigQuery – Part 2

Itai Barel and Gaash Hazan | 25 Mar 2021 | Big Data

Tags: Airflow, BigQuery, Cloud Storage, Google Cloud, Lessons, performance, Scale, Spark

In part 1 of the series we shared the architecture of Taboola’s PV2Google service, which uploads over 150TB/day ...

The Challenges Of Uploading 150TB/day From Spark To BigQuery – Part 1

Itai Barel and Gaash Hazan | 25 Mar 2021 | Big Data

Tags: Airflow, BigQuery, Cloud Storage, Google Cloud, performance, Scale, Spark

Have you ever tried building an infrastructure to upload 150TB a day? Have you ever tried querying over ...

Using Spark Dynamic Allocation

Igor Berman | 24 Jun 2020 | Big Data

Tags: big data, dynamic allocation, infra, mesos, performance, Spark

The story starts with metrics. Every mature software company needs to have a metric system to monitor resource ...

Bucket the shuffle out of here!

Igor Berman and Radik Komarnitsky | 28 Mar 2019 | Big Data

Tags: big data, data, performance, shuffles, Spark, Spark-on-demand, tips

At Taboola we use Spark extensively throughout the pipeline. Regularly faced with Spark-related scalability challenges, we look ...