Near Real-time Search Index Generation with Lambda Architecture and Spark Streaming at Walmart Scale

Wednesday, March 20
2:50 PM - 3:30 PM
Room 129-130

Today Walmart offers many millions of products for purchase through its websites. All these products are managed in a large-scale product catalog, which is updated thousands of times per second. The changes include product information updates, new products, availability in stores, and many other attributes. In our quest to provide a seamless shopping experience for our customers, we developed a streaming indexing data pipeline which ensures that the search index is updated in a timely manner and always reflects the latest state of the product catalog in near real time. Our pipeline is a key component in keeping our search data up to date and in sync with the constantly changing product catalog and with other features such as store and online availability, offers, etc.

Our indexing component, which is based on the Spark Streaming receiver-based approach, consumes events from multiple Kafka topics such as Product Change, Store Availability, and Offer Change, and merges the transformed product attributes with the historical signals that are computed by the relevance data pipeline and stored in Cassandra. This data is further processed by another streaming component, which partitions the documents into one Kafka topic per shard so that they can be indexed into Apache Solr for product search. Deployment of this pipeline is automated end to end.
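
The merge and per-shard routing steps described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the event layout, the `signal_` field prefix, the `index-shard-N` topic naming, the shard count, and the use of CRC32 as the routing hash are all assumptions made for illustration, and plain dictionaries stand in for Kafka and Cassandra.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count, not taken from the talk


def merge_document(event, signals):
    """Merge transformed product attributes with historical relevance signals.

    `event` is an assumed layout: {"product_id": str, "attributes": dict}.
    `signals` stands in for a Cassandra lookup: {product_id: {name: value}}.
    """
    doc = dict(event["attributes"])
    doc["id"] = event["product_id"]
    for name, value in signals.get(event["product_id"], {}).items():
        doc[f"signal_{name}"] = value  # assumed prefix for signal fields
    return doc


def shard_topic(product_id, num_shards=NUM_SHARDS):
    """Pick the per-shard topic for a document.

    CRC32 is used only because it is stable across processes; the real
    router may use a different hash.
    """
    shard = zlib.crc32(product_id.encode("utf-8")) % num_shards
    return f"index-shard-{shard}"  # hypothetical topic naming scheme


def route(events, signals):
    """Group merged documents by the topic they would be published to."""
    routed = {}
    for event in events:
        doc = merge_document(event, signals)
        routed.setdefault(shard_topic(doc["id"]), []).append(doc)
    return routed


events = [
    {"product_id": "p1", "attributes": {"title": "Kettle"}},
    {"product_id": "p2", "attributes": {"title": "Mug"}},
]
signals = {"p1": {"clicks": 10}}
routed = route(events, signals)
```

Routing by a stable hash of the product ID means that every update for a given product lands in the same shard topic, so the indexer for that shard sees the product's changes in order.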


Vladimir Kroz
Principal Engineer
Vladimir Kroz is an architect in the Search group at WalmartLabs, where he is building the next generation of e-commerce search. Vladimir works on large-scale, low-latency search, big data, and machine learning systems, and has a strong passion for large-scale computing and AI. Prior to Walmart, he led engineering teams at a number of Fortune 500 international companies in the e-commerce and telecom fields. He also co-founded the real-time data integration company Wisdomforce. Vladimir holds a Master’s degree in Computer Information Systems and Electrical Engineering.