Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka

Thursday, March 21
2:50 PM - 3:30 PM
Room 127-128

At NMC (Nielsen Marketing Cloud), we provide our customers (marketers and publishers) with real-time analytics tools to profile their target audiences.
To achieve that, we need to ingest billions of events per day into our big data stores, and we need to do it in a scalable yet cost-efficient manner.

In this session, we will discuss how we continuously transform our data infrastructure to support these goals.

Specifically, we will review how we went from CSV files and standalone Java applications all the way to multiple Kafka and Spark clusters that perform a mixture of streaming and batch ETLs and support 10x data growth.

We will share our experience as early adopters of Spark Streaming and Spark Structured Streaming, and how we overcame technical barriers (and there were plenty...).

We will present a rather unique solution that uses Kafka to imitate streaming over our Data Lake while significantly reducing our cloud services costs.

Topics include:
* Kafka and Spark Streaming for stateless and stateful use cases
* Spark Structured Streaming as a possible alternative
* Combining Spark Streaming with batch ETLs
* "Streaming" over Data Lake using Kafka

Speaker

Itai Yaffe
Big Data Tech Lead
Nielsen
A Big Data Tech Lead at the Nielsen Marketing Cloud. I have been dealing with Big Data challenges for the past 6 years, using tools like Spark, Druid, Kafka, and others. I'm keen on sharing my knowledge and have presented my real-life experience in various forums in the past (e.g., meetups and conferences).