Ingesting and processing streaming data is becoming ever more important in the telecommunications industry. At SK Telecom, Korea's number-one telecommunications provider, we face the challenge of using infrastructure resources more efficiently. Apache Druid supports auto scaling for data ingestion, but the feature is only available on AWS EC2, so we cannot rely on it in our private cloud.
In this talk, we introduce auto scale-out/in on Kubernetes. This approach has three advantages over Druid's built-in scaling implementation. First, it can be used anywhere: on a private cloud or on (managed) Kubernetes in Azure, AWS, and GKE. Second, starting or terminating an AWS EC2 instance takes a few minutes, whereas our approach takes a few seconds. Third, the scaling mechanism is decoupled from Druid's source code. We will also share our work on a Druid Helm chart, rolling updates, and the use of custom metrics for horizontal auto scaling.
The benefits compared with Druid's auto scaling approach are described in more detail below:
1. Druid's auto scaling is only available on AWS, but our approach has no such restriction. It can be used on a private cloud (on-premises) or on (managed) Kubernetes in Azure, AWS, and GKE.
2. An AWS EC2 instance is a virtual machine, so it starts more slowly than a Docker container. Starting or terminating an EC2 instance takes a few minutes, whereas a lightweight Docker container needs only a few seconds.
3. Druid's auto scaling is tightly coupled to the AWS API because the Druid engine code calls that API directly. Our scale-out/in algorithm is conceptually equivalent to Druid's auto scaling approach, but we removed the dependency: Kubernetes communicates with one of Druid's dispatcher nodes (i.e., the Overlord node) through its REST API.
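
To make the decoupling in point 3 more concrete, here is a minimal sketch of the kind of scaling decision an external controller could make from task counts reported by the Overlord. It is an illustration only, not the implementation presented in the talk. The function name, its parameters, and the sizing rule (enough task slots for all pending and running tasks) are assumptions. The task counts are assumed to have been read beforehand from the Overlord's REST API, for example from its pending-task and running-task listings.

```python
import math


def desired_workers(pending_tasks, running_tasks, slots_per_worker,
                    min_workers=1, max_workers=10):
    """Return how many ingestion worker pods should be running.

    Hypothetical sizing rule (an assumption, not Druid's own code):
    provide enough task slots for every pending and running task,
    then clamp the result to the [min_workers, max_workers] range.
    """
    if slots_per_worker <= 0:
        raise ValueError("slots_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")

    # Total task slots needed right now.
    needed_slots = pending_tasks + running_tasks
    # Round up: a partially used worker is still a whole pod.
    needed_workers = math.ceil(needed_slots / slots_per_worker)
    # Scale out when tasks queue up, scale in when workers sit idle,
    # but never leave the configured bounds.
    return max(min_workers, min(max_workers, needed_workers))


if __name__ == "__main__":
    # 3 pending + 5 running tasks, 4 slots per worker -> 2 workers.
    print(desired_workers(pending_tasks=3, running_tasks=5, slots_per_worker=4))
```

Because the function only consumes numbers, the same rule could run in a small controller or be exposed as a custom metric for Kubernetes horizontal auto scaling, without touching Druid's source code.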