Counting unique users across 10 billion events per day is challenging. In this session, we will describe how re-architecting our data infrastructure around Druid and ThetaSketch enables our customers to obtain these insights in real time.
To put things into context, at NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) with real-time analytics tools to profile their target audiences. Specifically, we give them the ability to see the number of unique users who meet a given criterion.
Historically, we used Elasticsearch to answer these types of questions; however, we encountered major scaling and stability issues.
In this presentation, we will detail the journey of rebuilding our data infrastructure, including researching, benchmarking, and productionizing a new technology, Druid with ThetaSketch, to overcome the limitations we were facing.
We will also provide guidelines and best practices for working with Druid.
Topics include:
* The need and possible solutions
* Intro to Druid and ThetaSketch
* How we use Druid
* Guidelines and pitfalls