What’s new in Apache Spark 2.3

Wednesday, June 20
11:50 AM - 12:30 PM
Grand Ballroom 220B

Apache Spark 2.0 laid the architectural foundations for structure in Spark: unified high-level APIs, structured streaming, and performant underlying components such as the Catalyst Optimizer and the Tungsten Engine. Since then, the Spark community has continued to build new features and fix numerous issues in the Spark 2.1 and 2.2 releases.

Continuing in that spirit, the upcoming Apache Spark 2.3 release makes similar strides, introducing new features and resolving over 1,300 JIRA issues. In this talk, we want to share with the community some salient features of the soon-to-be-released Spark 2.3:

• New deployment mode: Kubernetes scheduler backend
• PySpark performance and enhancements
• New structured streaming execution engine: continuous processing
• Data source v2 APIs for both structured streaming and Spark SQL
• ML on structured streaming
• Image reader
• Stable codegen engine
• Spark History Server V2
• Native ORC support
• Vectorized ORC and SQL cache readers
• Stream-stream joins
• UDF enhancements
• Various SQL enhancements


Xiao Li
Software Engineer
Xiao Li is a software engineer at Databricks. His main interests are Spark SQL, data replication, and data integration. Previously, he was an IBM Master Inventor and an expert on asynchronous database replication. He received his Ph.D. from the University of Florida in 2011. He is a Spark committer and a Spark PMC member.
Wenchen Fan
Software Engineer
Wenchen Fan is a software engineer at Databricks, working on Spark Core and Spark SQL. He mainly focuses on the open source community, where he has helped discuss and review many features and fixes in Spark. He is a Spark committer and a Spark PMC member.