Building the AI Engine for Retail in the New Era

Wednesday, May 22
11:50 AM - 12:30 PM
Marquis Salon 12

Global Market Insights forecasts that the retail analytics market will surpass USD 13 billion by 2024. Traditional retailers face a growing competitive threat from online rivals, and as a result the retail industry is turning to deep data analytics and AI to revolutionize its decision-making process. As a global leader in e-commerce and technology, Alibaba has been driving an emerging trend called “New Retail”, whose core concept is to create a unified customer experience by integrating online and offline behavior and by running operations on data. As one can imagine, the “New Retail” model generates a huge amount of spatial-temporal data (e.g., user behavior, logistics trajectories, transactions). Inside Alibaba Group, TSDB is the backbone service that hosts all of this data, providing high-concurrency storage and low-latency queries while also offering intelligent analysis capabilities built on AI and other data science technologies. To date, the TSDB service scales to thousands of physical nodes and delivers peak performance of 80 million operations per second.

In this talk, we share the design of the Intelligence Engine in the Alibaba TSDB service, which enables fast and complex analytics over large-scale retail data. We will also demonstrate our work through a successful case study in which we deployed this system to support the Fresh Hema Supermarket, a major “New Retail” platform operated by Alibaba Group. We will highlight our solutions to the major technical challenges in data cleaning, storage, and processing. Handling missing data is a key challenge in retail. For example, a missing data point for a store on a specific day could be caused by a data transmission error or by an actual store closure for reasons such as holidays, renovations, or natural disasters. How such gaps are treated can profoundly affect the analytics results. The data cleaning module in the TSDB Intelligence Engine runs machine learning algorithms across multiple data sources to accurately diagnose the cause of missing data, and it automatically performs smart null-filling operations that are aligned with business expectations.
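The talk does not describe how the diagnosis models work, so the following is only a minimal rule-based sketch of the idea, not the Intelligence Engine's machine learning approach. It assumes a hypothetical second data source, a calendar of known closures (`KNOWN_CLOSURES`): a gap that matches a known closure is filled with zero sales, and any other gap is treated as a transmission error and linearly interpolated. The names `diagnose_gap` and `fill_series` are illustrative only.

```python
from datetime import date, timedelta

# Hypothetical second data source: days a store is known to have been closed
# (holidays, renovations, natural disasters), e.g. from an operations calendar.
KNOWN_CLOSURES = {
    ("store_1", date(2019, 2, 5)): "holiday",
}


def diagnose_gap(store_id, day, closures):
    """Label a missing day as a real closure or a transmission error."""
    reason = closures.get((store_id, day))
    if reason is not None:
        return "closure", reason
    return "transmission_error", None


def fill_series(store_id, series, start, end, closures):
    """Return a complete daily series from start to end (inclusive).

    Closure days are filled with 0.0 because the store really sold nothing;
    transmission-error days are linearly interpolated between the nearest
    observed days (or copied from the only available neighbor).
    """
    if not series:
        raise ValueError("series needs at least one observed day")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    observed = sorted(series)
    filled = {}
    for d in days:
        if d in series:
            filled[d] = series[d]
            continue
        kind, _ = diagnose_gap(store_id, d, closures)
        if kind == "closure":
            filled[d] = 0.0
            continue
        prev = max((o for o in observed if o < d), default=None)
        nxt = min((o for o in observed if o > d), default=None)
        if prev is not None and nxt is not None:
            weight = (d - prev).days / (nxt - prev).days
            filled[d] = series[prev] + weight * (series[nxt] - series[prev])
        else:
            filled[d] = series[prev if prev is not None else nxt]
    return filled


sales = {date(2019, 2, 3): 100.0, date(2019, 2, 4): 120.0, date(2019, 2, 7): 150.0}
result = fill_series("store_1", sales, date(2019, 2, 3), date(2019, 2, 7), KNOWN_CLOSURES)
for day, value in sorted(result.items()):
    print(day, value)
```

In this example, February 5 is filled with 0.0 because the store was closed, while February 6 is interpolated from its observed neighbors. A production system would more plausibly replace the lookup table with a learned classifier over several signals.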

TSDB also performs a multitude of optimizations to enable fast access and computation at runtime. For example, retail analytics applications frequently aggregate data across product hierarchies, geographic hierarchies, and timelines. Using customized optimization techniques, the pre-aggregation module in TSDB runs concurrent multi-level roll-ups over hundreds of financial sources along different temporal and spatial dimensions.
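As a rough illustration of multi-level roll-up (the talk does not detail TSDB's own techniques), the sketch below pre-aggregates sales for every combination of levels in three assumed hierarchies: product (`sku`, then `category`, then all), geography (`store`, then `city`, then all), and time (`day`, then `month`, then all). The hierarchy names, record layout, and functions are hypothetical, and each level combination runs as a separate thread-pool task to stand in for concurrent roll-ups.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical hierarchies, finest level first; "all" is the grand total.
PRODUCT_LEVELS = ("sku", "category", "all")
GEO_LEVELS = ("store", "city", "all")
TIME_LEVELS = ("day", "month", "all")


def level_value(record, level):
    """Key of a record at one hierarchy level; "all" collapses to "*"."""
    return "*" if level == "all" else record[level]


def rollup_one(records, levels):
    """Sum sales for one (product, geo, time) level combination."""
    totals = defaultdict(float)
    for r in records:
        key = tuple(level_value(r, lv) for lv in levels)
        totals[key] += r["sales"]
    return levels, dict(totals)


def rollup_all(records):
    """Pre-aggregate every level combination, one thread-pool task each."""
    combos = list(product(PRODUCT_LEVELS, GEO_LEVELS, TIME_LEVELS))
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(lambda lv: rollup_one(records, lv), combos))


records = [
    {"sku": "A1", "category": "fruit", "store": "S1", "city": "Shanghai",
     "day": "2019-05-01", "month": "2019-05", "sales": 10.0},
    {"sku": "A2", "category": "fruit", "store": "S2", "city": "Shanghai",
     "day": "2019-05-01", "month": "2019-05", "sales": 5.0},
    {"sku": "B1", "category": "dairy", "store": "S1", "city": "Shanghai",
     "day": "2019-05-02", "month": "2019-05", "sales": 7.0},
]
cube = rollup_all(records)
print(cube[("category", "city", "month")])
```

Once the cube is built, a query at any of the 27 level combinations becomes a dictionary lookup. Rescanning the raw records for every combination keeps the sketch simple; a real pre-aggregation engine would more likely derive coarser levels from finer ones, and in CPython threads mainly pay off when the per-task work releases the GIL.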

Another major analytical challenge in retail big data applications is the low signal-to-noise ratio: the net profit margin of leading retailers typically ranges from 1% to 3%, yet financial KPIs are influenced by numerous micro- and macroeconomic factors, so the effect of any single business action is easily buried. TSDB leverages a rich set of advanced time-series feature-extraction algorithms to quantify the true impact of business actions amid this noise. We also developed deep learning functions in the Intelligence Engine that automatically detect interesting trends in real-time data streams and provide actionable insights.
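The talk does not say which feature-extraction algorithms TSDB uses. One standard way to separate a business action's effect from shared noise is a difference-in-differences comparison against untreated control stores, sketched below as a swapped-in illustration rather than TSDB's method; `did_lift` and the sample figures are hypothetical.

```python
from statistics import mean


def did_lift(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences estimate of a business action's effect.

    Control stores experience the same shared shocks (weather, seasonality,
    macroeconomic swings) as treated stores, so subtracting the control
    group's change removes the part of the treated change those shocks explain.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical daily sales around a promotion run only in the treated stores.
treated_before = [100, 102, 98]
treated_after = [110, 112, 108]
control_before = [200, 198, 202]
control_after = [206, 204, 208]

naive = mean(treated_after) - mean(treated_before)
lift = did_lift(treated_before, treated_after, control_before, control_after)
print(f"naive change: {naive}, difference-in-differences lift: {lift}")
```

Here the naive before/after comparison credits the promotion with 10 units, but 6 of those also appear in the control stores, leaving an estimated lift of 4. The estimate rests on the assumption that both groups would otherwise have followed parallel trends.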

With all of the features above, the Intelligence Engine in TSDB provides a full-stack analytics solution that helps retail companies identify interesting patterns in the most fine-grained data sources and achieve higher ROI through detailed, closed-loop decision feedback in real time. We believe both technical and business audiences will take away valuable lessons and insights from our success story.


Staff Algorithm Expert
Alibaba Group
Data science expert and software system architect with expertise in machine learning and big data systems. Extensive experience leading innovation projects and R&D activities to promote data science best practices within large organizations. Deep domain knowledge of various vertical use cases (e.g., Finance, Telco, Healthcare). Currently pushing the cutting-edge application of AI at the intersection of high-performance databases and IoT, focusing on unleashing the value of spatial-temporal data. I am also a frequent speaker at technology conferences, including the O’Reilly Strata AI Conference, NVIDIA GPU Technology Conference, Hadoop Summit, DataWorks Summit, Amazon AWS re:Invent, Global Big Data Conference, Global AI Conference, World IoT Expo, and Intel Partner Summit, presenting keynote talks and sharing technology thought leadership. Received my Ph.D. from the Department of Computer and Information Science (CIS), University of Pennsylvania, under the supervision of Professor Insup Lee (ACM Fellow, IEEE Fellow). Published and presented research papers and posters at many top-tier conferences and journals, including ACM Computing Surveys, ACSAC, CEAS, EuroSec, FGCS, HiCoNS, HSCC, IEEE Systems Journal, MASHUPS, PST, SSS, TRUST, and WiVeC. Served as a reviewer for many highly reputable international journals and conferences.
Sanjian Chen