A Birds of a Feather (BoF) session is an informal discussion group. DataWorks will sponsor several BoF meeting groups, hosted by Apache Committers, architects, tech leads, and engineers. Attendees group together based on a shared interest and carry out discussions without any pre-planned agenda. Each group will have hosts who moderate the discussion.

Come join the discussion and share your experiences, challenges, future interests, and requirements on key Apache and other open source projects, and discuss what’s on the roadmap and future design options.

Date: Friday, November 9th
Room: Check agenda or check the DataWorks Summit Mobile App

Apache Hadoop – HDFS

Apache Hadoop keeps evolving to meet community demands around distributed computing and storage. Apache Hadoop recently released 3.0, quickly followed by 3.1, with key enhancements to YARN and HDFS.

Apache Hadoop HDFS is a distributed Java-based file system for storing large volumes of data. Come learn and discuss the latest HDFS and Ozone innovations and future directions.

Apache Hadoop – YARN

Apache Hadoop keeps evolving to meet community demands around distributed computing and storage. Apache Hadoop recently released 3.1.x, quickly followed by 3.2.0, with key enhancements to YARN and HDFS.

Apache Hadoop YARN is the architectural center of Hadoop that allows multiple data processing engines to handle data stored in a single platform, unlocking an entirely new approach to analytics. Come learn and discuss the latest YARN innovations and future directions.

Apache HBase & Apache Phoenix

Apache HBase is the NoSQL store that runs on Apache Hadoop.  Apache Phoenix provides a SQL skin on top of HBase.

Come learn and discuss HBase 2.0 along with the latest developments in Phoenix 5.0.

Apache Hive & Apache Druid

Apache Hive is the de facto standard for SQL queries in Hadoop. With the next phase of SQL in Hadoop, the Apache community has greatly improved Hive’s speed (LLAP), scale and SQL semantics.  Come learn and discuss what is new in Hive 3.0.

Apache Druid is an open source, column-oriented, distributed data store designed for OLAP queries on event data. Druid supports interactive, horizontally scalable queries on real-time streams. Druid has rich client libraries and integrates with tools like Pivot and Apache Superset. Come learn about the latest developments in Druid and Hive/Druid integration.

Cloud & Operations

Apache Ambari and Cloudbreak provide the foundation for installing, configuring, and managing Hadoop and Streaming platforms on-premises and in the cloud. Come learn about the latest innovations and discuss Hadoop & Streaming platform operations and future directions.

Data Engineering & Data Science

Come learn and discuss the latest innovations and future direction in Apache Spark, Apache Zeppelin, and other ecosystem tools for Data Engineering and Data Science.


Hosts: Robert Hryniewicz

IoT, Streaming & Data Flow

Apache NiFi, Apache Kafka, Apache Storm, Apache Spark Streaming, and many more tools provide the foundation for real-time data processing in IoT. Come learn and discuss the latest streaming & data flow innovations and future directions.