LLAP (low latency analytical processing) has been a significant architectural advancement for improving Hive query latencies, showing significant improvements in self-managed Hadoop cluster deployments. Did you know that the benefits of LLAP also work well in cloud-provider hosted IaaS offerings?
This talk presents a performance study of several common query patterns executing in Amazon Elastic MapReduce. We repeated multiple trials of the same query patterns executing in Hive (without LLAP), Spark, and Hive (with LLAP). For our usage patterns, Hive with LLAP provided the best overall query performance. This also translated to needing fewer EC2 instances in each EMR cluster to hit our performance goals giving us a bottom-line improvement in AWS billing. Attendees will learn how to configure LLAP in EMR clusters and see the full results of the performance analysis. We will share observations about scalability trends in relation to EMR cluster sizing and potential benchmarking pitfalls, which could be applicable to your own performance analysis efforts.