Two popular open source technologies, Druid and Apache Hive, are often mentioned as viable solutions for large-scale analytics. Hive works well for storing large volumes of data, although not optimized for ingesting streaming data and making it available for queries in realtime. On the other hand, Druid excels at low-latency, interactive queries over streaming data and making data available in realtime for queries. Although the high level messaging presented by both projects may lead you to believe they are competing for same use case, the technologies are in fact extremely complementary solutions.
By combining the rich query capabilities of Hive with the powerful realtime streaming and indexing capabilities of Druid, we can build more powerful, flexible, and extremely low latency realtime streaming analytics solutions. In this talk we will discuss the motivation to combine Hive and Druid together alongwith the benefits, use cases, best practices and benchmark numbers.
The Agenda of the talk will be -
1. Motivation behind integrating Druid with Hive
2. Druid and Hive together - benefits
3. Use Cases with Demos and architecture discussion
4. Best Practices - Do's and Don'ts
5. Performance vs Cost Tradeoffs
6. SSB Benchmark Numbers