Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS

Thursday, June 21
9:30 AM - 10:10 AM
Grand Ballroom 220A

This presentation provides technical design and development insights for running a secured Spark job in a Kubernetes compute cluster that accesses job data from a Kerberized HDFS cluster. Joy will show how to run a long-running machine learning or ETL Spark job in Kubernetes and access data from HDFS using a Kerberos principal and delegation tokens.

The first part of this presentation covers the design and best practices for deploying and running Spark in Kubernetes integrated with HDFS: creating an on-demand multi-node Spark cluster at job submission, installing and resolving software dependencies (packages), executing and monitoring the workload, and finally disposing of the resources when the job completes. The second part covers the design and development details of setting up a Spark+Kubernetes cluster that supports long-running jobs accessing data from secured HDFS storage by seamlessly creating and renewing Kerberos delegation tokens from the end user's Kerberos principal.

All the techniques covered in this presentation are essential for setting up a Spark+Kubernetes compute cluster that securely accesses data from a distributed storage cluster such as HDFS in a corporate environment. No prior knowledge of any of these technologies is required to attend this presentation.

Joy Chakraborty
Data Architect
Joy is a distributed systems architect with 19+ years of software design and development experience, 10+ years of Java/Scala development experience, 7+ years of experience with Big Data and Hadoop technologies, and 5+ years of Apache Spark experience. He has a special interest in distributed and parallel computing and currently works on Kubernetes, cloud, and Big Data technologies. Joy is an open-source contributor to products and technologies in the Hadoop and Jupyter Notebook ecosystems. He is also an active member of various software architecture organizations and a frequent speaker at conferences, user groups, and code camps.