As containerization continues to gain momentum and become a de facto standard for application deployment, challenges around containerization of big data workloads are coming to light. Great strides have been made within the open source communities towards running big data workloads in containers, but much is left to be done.
Apache Hadoop YARN is the modern distributed operating system for big data applications. It has morphed the Hadoop compute layer into a common resource-management platform that can host a wide variety of applications. At its core, YARN has a very powerful scheduler which enforces global cluster level invariants and helps sites manage user and operator expectations of elastic sharing, resource usage limits, SLAs, and more. YARN recently increased its support for Docker containerization and added a YARN service framework supporting long-running services.
In this session we will explore the emerging patterns and challenges related to containers and big data workloads, including running applications such as Apache Spark, Apache HBase, and Kubernetes in containers on YARN.