Fibers from out of (user) space – Overview

Jordan Sheinfeld

Principal Engineer in Taboola’s Video group. Very enthusiastic about new technologies. Today he spends most of his time learning and deploying new technologies and improving the performance and scale of systems.

Jordan Sheinfeld | 25 Dec 2018 | Java

Tags: concurrency, CPU, fibers, performance, threads

A couple of months ago my team had its first experience working with Java fibers, because we needed to make our main application work asynchronously.
In this 3-part series, I will share my team’s experience and how we implemented and deployed Java fibers in production.

We will cover what fibers are, how to use them, their pros and cons, and their internals, all in a mix of guide and blog post describing our experience.

Fibers are a kind of lightweight thread, meant to address performance, scale and code structure in our applications. They can work together with threads or replace them. If you are dealing with concurrency, code structure and asynchronous challenges, or you are just interested in learning this technology, this blog post series is for you.

The first part of this series is an overview of what fibers are. The next parts dive deeper into the technology and our experience, so stay tuned and don’t miss them!

 

We started by trying to figure out what fibers are and how they can help us

It all started when we needed to make our main application work asynchronously.

Taboola serves around 300K video requests per second. These are only requests to display video ads; content recommendation requests are even more numerous. As part of the video group, one of our main services is a bidding service.

Here is how a bidding service works: when you, as a user, open an article on a site that partners with Taboola, our service offers demand partners the opportunity to bid for the ad that you will end up seeing. This means it needs to send bid requests to dozens of demand partners, choose the best offer, and do all that in milliseconds.
Now multiply this by 300K requests per second and you get a sense of the workload this service has to handle. This is a standard service in the AdTech domain.
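
To make the fan-out concrete, here is a minimal sketch in plain JDK Java. It is not our production code: the BidFanOut class, the partner suppliers and the prices are all hypothetical, and each demand partner is simply modelled as a Supplier<Double> that returns a bid. Notice that the sketch keeps one pool thread blocked for every outstanding partner call, which is exactly the cost the rest of this post is about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical sketch of a bid fan-out; names and prices are invented. */
class BidFanOut {

    /** Asks all partners in parallel and returns the highest bid that arrived before the deadline. */
    static double bestBid(List<Supplier<Double>> partners, long timeoutMillis) {
        // One thread per partner call: simple, but every in-flight call pins a thread.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partners.size()));
        try {
            List<CompletableFuture<Double>> pending = new ArrayList<>();
            for (Supplier<Double> partner : partners) {
                pending.add(CompletableFuture.supplyAsync(partner, pool));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            double best = 0.0;
            for (CompletableFuture<Double> bid : pending) {
                long remaining = Math.max(0, deadline - System.nanoTime());
                try {
                    best = Math.max(best, bid.get(remaining, TimeUnit.NANOSECONDS));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                    break;
                } catch (ExecutionException | TimeoutException e) {
                    // A failed or late partner simply does not take part in the auction.
                }
            }
            return best;
        } finally {
            pool.shutdownNow(); // interrupt partners that are still running
        }
    }

    /** Three fast hypothetical partners and one that misses a 500 ms deadline. */
    static double demo() {
        List<Supplier<Double>> partners = new ArrayList<>();
        partners.add(() -> 1.50);
        partners.add(() -> 3.00);
        partners.add(() -> {
            try {
                Thread.sleep(5_000); // simulates a slow partner
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 9.00;
        });
        partners.add(() -> 2.25);
        return bestBid(partners, 500);
    }

    public static void main(String[] args) {
        System.out.println("Winning bid: " + demo());
    }
}
```

In this sketch the slow partner offers the highest price but is ignored because it answers too late, so the auction is decided among the partners that responded in time.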

Bidding can be complicated. It requires you to process a request and return a result within milliseconds while executing complex business logic, which can involve calling external partners and other external network services, and perhaps querying a key-value store.

Our first implementation was a thread-per-request model, which of course hit its QPS limit (queries per second, a.k.a. TPS, transactions per second) very quickly and was not scalable by any means.

In order to make the application scalable, we broke it apart. If you have not had the pleasure of converting your application to an async one, you’re in luck – it’s hell. If you have, you know what we’ve gone through.

We got the performance improvement we needed, but the async handling made our code unreadable and hard to maintain. Adding new async functionality was hard, and it was fertile ground for bugs.

Thus, we decided to explore fibers: we needed a solution that would preserve our scalability gains while keeping our code readable and maintainable.

 

The world of fibers (AKA: lightweight threads)

Fibers are, in essence, lightweight versions of threads that are not managed by the OS. They are scheduled in user space (at the application level), whereas threads are managed and scheduled by the OS kernel. They are lightweight in terms of RAM (an idle fiber occupies ~400 bytes of RAM vs. the 1MB stack a Java thread gets by default) and put a far smaller burden on the CPU when context-switching. It’s possible to have millions of fibers in an application. Sometimes they are referred to as lightweight threads, or continuations.

Fibers are especially useful for replacing callback asynchronous code. They allow you to enjoy the scalability and performance benefits of asynchronous code, while keeping the code imperative and clean, plus eliminating callback hell.
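
To illustrate the difference in code shape, here is a small sketch using only the JDK’s CompletableFuture. It does not use fibers: the class name and the three fetch methods are hypothetical stand-ins for remote calls. The callback version spreads the logic across nested lambdas, while the imperative version reads top to bottom. In this plain-JDK sketch the imperative version blocks a real thread on every join(); the promise of fibers is to keep that second shape without paying for a blocked thread.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical example: the same three-step flow in callback style and in imperative style. */
class CallbackVsImperative {

    // Stand-ins for asynchronous remote calls (invented for this sketch).
    static CompletableFuture<String> fetchUser(int id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    static CompletableFuture<String> fetchSegment(String user) {
        return CompletableFuture.supplyAsync(() -> user + ":sports");
    }

    static CompletableFuture<Integer> fetchBidCents(String segment) {
        return CompletableFuture.supplyAsync(() -> segment.length() * 10);
    }

    /** Callback style: each step lives inside the callback of the previous one. */
    static CompletableFuture<Integer> bidWithCallbacks(int id) {
        return fetchUser(id).thenCompose(user ->
                fetchSegment(user).thenCompose(segment ->
                        fetchBidCents(segment)));
    }

    /** Imperative style: straight-line code, but join() blocks the calling thread here. */
    static int bidImperative(int id) {
        String user = fetchUser(id).join();
        String segment = fetchSegment(user).join();
        return fetchBidCents(segment).join();
    }

    public static void main(String[] args) {
        System.out.println(bidWithCallbacks(7).join());
        System.out.println(bidImperative(7));
    }
}
```

Both methods compute the same value. With only three steps the nesting is still tolerable, but add error handling, timeouts and branching to each step and the callback version quickly becomes hard to follow.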

Fibers are not meant to replace threads in all circumstances. A fiber should be used when the code it executes blocks very often waiting on other fibers, or when we wish to break the work down into a huge number of parallel tasks that would be a big burden on the CPU, in terms of context switching, if each ran as a normal thread.

For long-running computations that rarely block, traditional threads are preferable. Luckily, fibers and threads interoperate very well, due to an abstraction called Strand.

Fibers are often introduced together with other async processing frameworks such as Akka, Vert.x, NodeJS and Kilim. This time, however, we’ll focus on Java Quasar fibers.

 

Thread limit on the operating system

To emphasize the difference between fibers and threads, let’s see how threads are limited in the OS, and later on (Part 2) we’ll run some tests to prove that.

Theoretically speaking, Unix systems do not specify a per-process thread limit; instead, there is a global limit on the number of threads allowed (on Linux: cat /proc/sys/kernel/threads-max).
However, processes are indirectly limited in how many threads they can allocate, mostly because of the per-thread stack size, which is 1024k by default on 64-bit JVMs. Simple math gives us, for example, 10,000 threads * 1024k = roughly 10GB of RAM for thread stacks alone. Usually, a so-called “normal” process should have a maximum of a few hundred threads.
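
The same arithmetic can be written as a tiny Java sketch. The class and method names are made up for illustration, and the two constants are simply the figures quoted in this post (a 1024k default stack per thread, and ~400 bytes per idle fiber), with 1024k treated as 1 MiB.

```java
/** Back-of-the-envelope memory estimate using the figures quoted in this post. */
class StackMemoryEstimate {

    static final long THREAD_STACK_BYTES = 1024L * 1024L; // 1024k default stack per thread
    static final long IDLE_FIBER_BYTES = 400L;            // ~400 bytes per idle fiber

    /** Stack space reserved for the given number of threads, in bytes. */
    static long threadStackBytes(long threads) {
        return threads * THREAD_STACK_BYTES;
    }

    /** Approximate footprint of the given number of idle fibers, in bytes. */
    static long idleFiberBytes(long fibers) {
        return fibers * IDLE_FIBER_BYTES;
    }

    public static void main(String[] args) {
        long n = 10_000;
        double gib = threadStackBytes(n) / (1024.0 * 1024.0 * 1024.0);
        double mib = idleFiberBytes(n) / (1024.0 * 1024.0);
        System.out.println(n + " threads reserve about " + gib + " GiB of stack");
        System.out.println(n + " idle fibers occupy about " + mib + " MiB");
    }
}
```

For 10,000 threads this comes to 10,485,760,000 bytes (about 9.77 GiB), versus 4,000,000 bytes (under 4 MiB) for the same number of idle fibers.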

 

Context switching

While context switching is a known behavior of operating systems and CPUs (https://en.wikipedia.org/wiki/Context_switch), fibers behave differently.

They offer an alternative to expensive kernel-space context switching when a large number of threads run concurrently.
For example, say we want to create a logical entity in our game, a monster. Our monster has characteristics, life, logic, state and actions it can take. Let’s say we want our game to handle 100,000 monsters, make them run in parallel, and have them perform all kinds of actions. If we go for a thread-per-monster implementation, we will need 100,000 threads to run them in parallel; otherwise we need a thread pool that queues the 100k monsters and manages their life activities. In a fiber implementation we can have a fiber-per-monster architecture, which is run transparently on top of a backing thread pool (see ForkJoinPool in Part 3).
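
As a rough illustration of the thread-pool alternative, here is a sketch that queues 100,000 monsters on a handful of worker threads. It uses a plain JDK ForkJoinPool as an ordinary executor, not fibers, and the MonsterPool and Monster classes, with their "three lives" logic, are invented for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: many small monster tasks multiplexed onto a few worker threads. */
class MonsterPool {

    /** A toy monster: it acts once per remaining life point, then it is done. */
    static final class Monster {
        private int life = 3;

        void live(AtomicLong actionCounter) {
            while (life > 0) {
                life--;
                actionCounter.incrementAndGet(); // stands in for "doing an action"
            }
        }
    }

    /** Runs the given number of monsters on a small pool and returns the total actions taken. */
    static long runMonsters(int monsterCount, int workerThreads) {
        ExecutorService pool = new ForkJoinPool(workerThreads);
        AtomicLong actions = new AtomicLong();
        for (int i = 0; i < monsterCount; i++) {
            Monster monster = new Monster();
            pool.execute(() -> monster.live(actions)); // queued, not one thread each
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("monsters did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for monsters", e);
        }
        return actions.get();
    }

    public static void main(String[] args) {
        System.out.println("Total actions: " + runMonsters(100_000, 4));
    }
}
```

This works well as long as a monster never waits for anything. As soon as a monster blocks (on I/O, or on another monster), it holds one of the few worker threads hostage, and that is the situation where a fiber-per-monster design becomes attractive.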

 

What’s next

In this part we got a high-level overview of fibers. In the next part we’ll go deeper into fibers and how they differ from threads, and we’ll look at how to create fibers and how to work with them, diving into the nuts and bolts.

 

Continue reading the next part ….

Part II – Hands on