Criteo has a 100 PB production cluster of four namenodes and 2,000 datanodes that runs 300,000 jobs a day. At the moment, we back up some of the cluster's data to our old 1,200-node ex-production cluster.
We build our clusters in our own datacenters, as running in the cloud would be many times more expensive. In 2018, we will build yet another datacenter that will first back up the main cluster, in both storage and compute, and then replace it. This presentation will describe what we learned when building multiple Hadoop clusters and why in-house bare metal is better than the cloud.
Building a cluster requires testing the hardware from several manufacturers and choosing the most cost-effective option. We have now done these tests twice and can provide advice on how to do it right the first time.
Our two clusters were meant to provide a redundant solution to Criteo's storage and compute needs. We will explain our project, what went wrong, and our progress in building yet another, even bigger, cluster to create a computing system that will survive the loss of an entire datacenter. Disasters are not always external; an operator error could wipe out both Hadoop clusters. We are thus adding a second, different backup technology for our most important 10 PB of data. Which technology will we use, and how did we test it?
An HDFS namenode cannot store an infinite number of data blocks. We will share what we have had to do to reach almost half a billion blocks, including sharding our data by adding namenodes. Hadoop, especially at this scale, does not run itself, so what operational skills and tools are required to keep the clusters healthy, the data safe, and the jobs running 24 hours a day, every day?
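Sharding a namespace across several namenodes is typically done with HDFS federation plus a client-side ViewFs mount table, so clients still see a single filesystem. A minimal sketch of such a `core-site.xml`; the nameservice name, hosts, and paths here are hypothetical, not Criteo's actual layout:

```xml
<configuration>
  <!-- Clients address one logical filesystem; ViewFs routes by path prefix. -->
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://cluster/</value>
  </property>
  <!-- Mount table: each prefix is served by a different namenode,
       splitting the block/inode load between them. -->
  <property>
    <name>fs.viewfs.mounttable.cluster.link./user</name>
    <value>hdfs://nn1.example.com:8020/user</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.cluster.link./logs</name>
    <value>hdfs://nn2.example.com:8020/logs</value>
  </property>
</configuration>
```

With this in place, a path such as `/logs/2018/01` resolves transparently to the second namenode, while `/user` traffic stays on the first.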