Druid and Hive Together : Use Cases and Best Practices

Druid and Hive Together : Use Cases and Best Practices

Thursday, March 21
4:00 PM - 4:40 PM
Room 122-123

Two popular open source technologies, Druid and Apache Hive, are often mentioned as viable solutions for large-scale analytics. Hive works well for storing large volumes of data, although not optimized for ingesting streaming data and making it available for queries in realtime. On the other hand, Druid excels at low-latency, interactive queries over streaming data and making data available in realtime for queries. Although the high level messaging presented by both projects may lead you to believe they are competing for same use case, the technologies are in fact extremely complementary solutions.

By combining the rich query capabilities of Hive with the powerful realtime streaming and indexing capabilities of Druid, we can build more powerful, flexible, and extremely low latency realtime streaming analytics solutions. In this talk we will discuss the motivation to combine Hive and Druid together alongwith the benefits, use cases, best practices and benchmark numbers.

The Agenda of the talk will be -
1. Motivation behind integrating Druid with Hive
2. Druid and Hive together - benefits
3. Use Cases with Demos and architecture discussion
4. Best Practices - Do's and Don'ts
5. Performance vs Cost Tradeoffs
6. SSB Benchmark Numbers

Presentation Video


Nishant Bangarwa
Software engineer
Hortonworks, Inc.
Nishant is Druid PMC member and Software Engineer at Hortonworks. He is part of Business Intelligence team at Hortonworks. Prior to that he was part of Metamarkets backend team and was responsible for analytics infrastructure, including real-time analytics in Druid. He holds a B.Tech in Computer Science from National Institute of Technology, Kurukshetra, India.