Setting the Stage for Fast Analytics with Druid

Wednesday, May 22
11:50 AM - 12:30 PM
Marquis Salon 8

Druid is an emerging standard in the data infrastructure world, designed for high-performance slice-and-dice analytics (“OLAP”-style) on large data sets. This talk is for you if you’re interested in learning more about pushing Druid’s analytical performance to the limit. Perhaps you’re already running Druid and are looking to speed up your deployment, or perhaps you aren’t familiar with Druid and are interested in learning the basics. Some of the tips in this talk are Druid-specific, but many of them will apply to any operational analytics technology stack.

The most important contributor to a fast analytical setup is getting the data model right. The talk will center on the choices you can make when preparing your data to get the best possible query performance.

We’ll look at some general best practices to model your data before ingestion such as OLAP dimensional modeling (called “roll-up” in Druid), data partitioning, and tips for choosing column types and indexes. We’ll also look at how more can be less: often, storing copies of your data partitioned, sorted, or aggregated in different ways can speed up queries by reducing the amount of computation needed.
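To make the pre-aggregation idea concrete, here is a minimal sketch in plain Python of rolling raw events up by hour and by a chosen set of dimensions. It is illustrative only and is not Druid's ingestion code: the function name `roll_up`, the field names, and the sample events are all assumptions made up for this example.

```python
from collections import defaultdict
from datetime import datetime


def roll_up(events, dimensions, metric):
    """Pre-aggregate raw events by hour and by the chosen dimensions.

    Hypothetical helper for illustration; not a Druid API.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        # Truncate the timestamp to the hour so nearby events share a key.
        hour = event["timestamp"].replace(minute=0, second=0, microsecond=0)
        key = (hour,) + tuple(event[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]
    return dict(totals)


# Made-up sample data: four raw events.
raw_events = [
    {"timestamp": datetime(2019, 5, 22, 11, 50), "country": "US", "page": "/home", "bytes": 100},
    {"timestamp": datetime(2019, 5, 22, 11, 55), "country": "US", "page": "/home", "bytes": 250},
    {"timestamp": datetime(2019, 5, 22, 11, 58), "country": "JP", "page": "/home", "bytes": 80},
    {"timestamp": datetime(2019, 5, 22, 12, 5), "country": "US", "page": "/home", "bytes": 40},
]

# Keeping only "country" as a dimension collapses four rows into three.
rolled = roll_up(raw_events, ["country"], "bytes")
for key, agg in sorted(rolled.items()):
    print(key, agg)
```

Queries that only need hourly totals per country can then scan the smaller rolled-up set instead of every raw event. The trade-off is that dropped dimensions (here, `page`) can no longer be filtered or grouped on.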

We’ll also look at Druid-specific optimizations that take advantage of approximations, where you can trade accuracy for performance and reduced storage. You’ll be introduced to Druid’s features for approximate counting, set operations, ranking, quantiles, and more.
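As a rough illustration of trading accuracy for storage, the sketch below estimates a distinct count by keeping only the k smallest hash values seen (a "k minimum values" estimator). This is a generic textbook technique chosen for brevity, not Druid's implementation; the names `hash_to_unit` and `approx_distinct` and the default `k=256` are assumptions for this example.

```python
import hashlib
import heapq


def hash_to_unit(value):
    """Map a value to a pseudo-random float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def approx_distinct(values, k=256):
    """Estimate the number of distinct values while storing at most k hashes.

    Hypothetical helper for illustration; not a Druid API.
    """
    heap = []     # max-heap (negated hashes) holding the k smallest hashes
    kept = set()  # same hashes, for fast duplicate checks
    for value in values:
        h = hash_to_unit(value)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct values seen: count is exact
    # The k-th smallest hash sits near k / n, so n is roughly (k - 1) / that hash.
    return int((k - 1) / -heap[0])


# Made-up data: 50,000 events from 10,000 distinct users.
user_ids = [f"user-{i % 10000}" for i in range(50000)]
estimate = approx_distinct(user_ids)
print(estimate)
```

The memory used depends on `k` rather than on the number of distinct users, and the relative error shrinks roughly as one over the square root of `k`. A larger `k` buys accuracy at the cost of storage.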

Speakers

Surekha Saharan
Software Engineer
Imply
Surekha Saharan is a Druid Committer and Software Engineer at Imply. Previously, she worked at a cloud startup and at Cisco Systems, where she prototyped, architected, and implemented large-scale systems. She holds an MS in Computer Science from the University of Southern California and a BS in Computer Engineering from the National Institute of Technology, India.
Benjamin Hopp
Solutions Architect
Imply
Benjamin Hopp has been involved in architecting big data and streaming data solutions for companies of all sizes. Currently, he is a Solutions Architect with Imply, where he helps organizations deploy and manage Apache Druid solutions. Previously, he worked as a Senior Systems Architect with Hortonworks, specializing in streaming data use cases using HDF and Apache NiFi.