What’s new in Apache Spark 2.3

Wednesday, February 6
4:00 PM - 4:40 PM
Room 111/112

Apache Spark 2.0 laid the architectural foundations for structure in Spark: unified high-level APIs, structured streaming, and underlying high-performance components such as the Catalyst Optimizer and the Tungsten engine. Since then, the Spark community has continued to build new features and fix numerous issues in the Spark 2.1 and 2.2 releases.
Apache Spark 2.3 continues this progress, introducing new features and resolving over 1,300 JIRA issues. In this talk, we share with the community some salient features of Spark 2.3:
• New deployment mode: Kubernetes scheduler backend
• PySpark performance and enhancements
• New structured streaming execution engine: continuous processing
• Data source v2 APIs for both structured streaming and Spark SQL
• ML on structured streaming
• Image reader
• Stable codegen engine
• Spark History Server v2
• Native ORC support
• Vectorized ORC and SQL cache readers
• Stream-stream joins
• UDF enhancements
• Various SQL enhancements


Robert Hryniewicz
Technical Evangelist
Hortonworks, Inc.
Robert Hryniewicz has over 10 years of experience working on projects related to artificial intelligence, enterprise software, IoT, robotics, blockchain, and more. Currently, he’s a Data Scientist and Evangelist at Hortonworks. Previously, Robert was CTO at a Singularity Labs startup and a Sr. Architect at Cisco, NASA, et al. He’s a frequent speaker at DataWorks / Hadoop Summits.