By using this site, you agree to our use of cookies.

For more information read our Privacy Policy.

Got it

Serving up some of the web’s best content with Cloud Dataproc


Data Analytics

Digital Media

United States

Founded in 2006, Outbrain is one of the world’s leading content discovery platforms, serving more than 275 billion recommendations to readers every month.. In a world in which we are increasingly bombarded with information, it is harder and more important than ever to get the right content to the right people.

Outbrain, one of the world’s leading content discovery platforms, prides itself on helping publishers and readers separate the signal from the noise and get to the articles they really want to read. Building on more than a decade of experience, Outbrain has honed its recommendation service, working with top publishers like CNN, Le Parisien, and The Guardian. Today, with native placements on many of the world’s premium publishers, it makes 275 billion highly targeted recommendations to over 600 million unique users every month.

Orit Yaron, VP of Cloud Platform

"With Cloud Dataproc, we implemented autoscaling which enabled us to increase or reduce the clusters easily depending on the size of the project. On top of that, we also used preemptible nodes for parts of the clusters, which helped us with efficiency of costs.”

The brief

“We were at a point where we needed to update our research cluster as the technology it was built on was becoming obsolete. At the same time, we saw an opportunity for Google Cloud Platform to help us use our clusters more efficiently than with our previous infrastructure.”

What we did

“With more than a decade’s experience, Outbrain has a wealth of accumulated knowledge to work with, amounting to around 6 petabytes of data in its research cluster. The company’s existing infrastructure consisted of thousands of servers hosted across multiple data centers. While this solution works for Outbrain’s live recommendation service, which is constantly in use, its research projects have different requirements. Research cluster usage is much more elastic, so smaller projects can lead to idle servers and wasted costs when less compute power is needed.

By 2017, it was time for the company to upgrade its research clusters to use the latest open-source technologies, such as Apache Spark or ORC. Outbrain saw this as the perfect opportunity to redesign its research infrastructure with scalability and efficiency at its core.

The task came with some challenges, however. First, time was tight. Outbrain’s hosting agreement was up for renewal at the end of Q1 2018, effectively giving the company just four months to choose, trial, and fully migrate its new solution. Second, Outbrain could not afford any disruption to its normal activities during the migration. Finally, the new research cluster, even though it was in the public cloud, would have to work seamlessly with the company’s live recommendation cluster, which remained on-premises. After a careful evaluation of costs and technology alignments, Outbrain chose to test a new research cluster built with GCP.”

The result

– Analyzes more than 6 petabytes of data on Hadoop clusters with Cloud Dataproc, working as part of a hybrid infrastructure.

– Expands the scope of new research projects with fewer hardware challenges and greater cost transparency.

– Drives efficiency with auto scaling, preemptible nodes and the ability to turn clusters on and off at will.

Get in touch

Looking for a cloud partner? We’ve got you covered.

Our team of experts can take care of your cloud, for much less than the cost of hiring that talent in-house.