How Azure Data Explorer Helped Us Make Sense of 1M Log Lines per Second

Ariel Pisetzky

Ariel joined Taboola to connect people, business needs and IT. With over 20 years in IT, Ariel leads Taboola's team of IT professionals implementing state-of-the-art solutions, from open source to home-grown to traditional enterprise software, across the company's global infrastructure. Ariel is passionate about making IT work for the business and getting things done.

Ariel Pisetzky | 07 Nov 2018 | System

Tags: web development

As VP of IT at Taboola, my teams and I are overwhelmed with logs, pinned down by their rate and volume. The job of the Production Site Reliability Engineering (SRE) team at Taboola is to keep the technology running smoothly and to draw as many insights as we can from the system, making sure that any and every technical issue (that isn't self-healing or contained) is dealt with quickly. We also have to keep up with this torrent of incoming data to make sure that any insights it holds are actually found. With over 1B users discovering what's interesting and new through the Taboola Feed, we can't drop the ball or stop thinking about our logs, log management, where to store them and how to process them.

This is the challenge of processing over one million lines of logs every second.

To address this challenge, we engaged with Azure Data Explorer, and after the first very simple demo it was clear to us that this technology had something unique to it, something new in the mix of storing, indexing and exploring data. We decided to test Azure Data Explorer to see if there was a quick win. It was clear that we needed to get a full day of logs into the system, not just a sample of data, to see how fast it could provide us with insights and, moreover, to understand whether Azure Data Explorer could ingest the data at all. Once we shipped one day of logs into Azure, it was clear that the insights we could run were far and fast reaching. Machine learning? No problem. More days of data? Easy. Scale? Done. Now we can keep our eyes on the ball. In our case, the ball is business SLA and improving the time to resolve problems.

Working with our partners, we were able to get many of our logs written directly to Azure Storage, so the ingress of data was solved without any heavy lifting. Shipping logs directly from the CDN (and other SaaS providers) was done through a simple configuration setting, and sending logs from our on-prem servers proved to be straightforward coding. On top of that, as ingress networking is free, cost was not an issue. The data import into Azure Data Explorer was the next phase: a short configuration that brings all the data in. Retention policy and access to the data are also simple configurations. With minimal coding, and mostly configuration, it was a fast path from raw logs to data insights and clear results.
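As a rough sketch, that "short configuration" on the Azure Data Explorer side boils down to creating a target table, mapping the incoming JSON onto it, and setting a retention policy. The table name, columns and JSON paths below are illustrative examples, not Taboola's actual schema:

```kusto
// Destination table for CDN access logs (illustrative schema)
.create table CdnLogs (Timestamp: datetime, ClientIp: string, Url: string, StatusCode: int, DurationMs: real)

// Map fields of the JSON blobs landing in Azure Storage onto the table columns
.create table CdnLogs ingestion json mapping 'CdnLogsMapping'
    '[{"column":"Timestamp","path":"$.timestamp"},{"column":"ClientIp","path":"$.client_ip"},{"column":"Url","path":"$.url"},{"column":"StatusCode","path":"$.status"},{"column":"DurationMs","path":"$.duration_ms"}]'

// Retention is likewise just a policy setting, e.g. a 30-day rolling window
.alter-merge table CdnLogs policy retention softdelete = 30d
```

With the table and mapping in place, a data connection pointed at the storage account picks up new blobs as they arrive; no ingestion pipeline code has to be written or operated.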

Working with Azure Data Explorer, we were able to get fast answers to in-depth queries on the data. The dashboards made sense of a lot of data, with a faster update time and an intuitive UI. Furthermore, we were able to democratize access to the data: anyone with a need to access it can find what they need and make it work for them. The path to scale comes not only from the ability to store, index and report on vast amounts of data, but also from the ability to give Taboola employees fast and easy access to it. The more people working on the data, the more that is accomplished.
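To give a flavor of those in-depth queries: Azure Data Explorer is queried in the Kusto Query Language (KQL). As an illustrative example (the table and column names are made up, not our actual schema), an engineer looking at error rates over the last day could write something like:

```kusto
CdnLogs
| where Timestamp > ago(1d)
| summarize Requests = count(), Errors = countif(StatusCode >= 500) by bin(Timestamp, 5m)
| extend ErrorRate = todouble(Errors) / Requests
| render timechart
```

A query like this scans a full day of raw logs and charts the result in seconds, which is what makes it practical to hand the data to anyone who needs it rather than gatekeeping access behind a specialist team.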

On the cost management side, the ability to offer my users unrestricted queries, without weighing the cost of each data scan, is liberating. We can now have as many employees as we need access the data and not worry about the bill for each query they run. Running the system on Azure has proven to have a lower TCO than our on-prem log management system, helping us focus on what's important and freeing the SRE teams to spend more of their time on business impact.