Containerizing big data systems offers multiple benefits to your business, including reduced costs, agility to scale, and simplified operations. However, to run production-scale Hadoop inside containers, you need to solve for storage, networking, affinity/anti-affinity, data locality, and more. Robin takes an innovative approach called Application-Defined Infrastructure, in which developers and IT teams simply define the needs of their data-intensive analytics applications without worrying about assembling the underlying infrastructure to host them. This patented innovation combines the benefits of containers, a native application-aware scale-out storage stack, and an application-aware workflow manager to provide an app-store-like experience where you can deploy complex big data applications in minutes and manage their lifecycle through radically simplified 1-Click operations.
We will discuss the following best practices to consider when containerizing HDP, as well as other big data systems, and how to put each into practice:
1. Focus on big data applications, not on the infrastructure
2. Maximize infrastructure utilization through consolidation
3. Decouple compute and storage, and scale either separately
4. Dynamically scale resources to meet growing demands
5. Share data between multiple Hadoop/Spark clusters
6. Secure your deployments with multi-tenancy and RBAC (Role-based Access Control)