Global Market Insights forecasts that the retail analytics market will surpass USD 13 billion by 2024. Traditional retailers face a serious competitive threat from online rivals. As a result, the retail industry is turning to deep data analytics and AI to transform its decision-making process. As a global leader in e-commerce and technology, Alibaba has been driving the emerging trend called “New Retail”, whose core concept centers on creating a customer experience that unifies online and offline behavior with data-driven operations. As one can imagine, the “New Retail” model generates a huge amount of spatial-temporal data (e.g., user behavior, logistics trajectories, transactions). Inside Alibaba Group, TSDB is the backbone service that hosts all of this data, enabling high-concurrency storage and low-latency queries while also providing intelligent analysis capabilities based on AI and other data science technologies. So far, the TSDB service has scaled to thousands of physical nodes and delivers peak performance of 80 million operations per second.
In this talk, we share the design of the Intelligence Engine in the Alibaba TSDB service, which enables fast and complex analytics on large-scale retail data. We also demonstrate our work through a successful case study in which we deployed this system to support the Fresh Hema Supermarket, a major “New Retail” platform operated by Alibaba Group. We highlight our solutions to the major technical challenges in data cleaning, storage, and processing. Handling missing data is a key challenge in retail: for example, a missing data point for a store on a specific day could be caused by a data transmission error or by an actual store closure for reasons such as holidays, renovations, or natural disasters. How such data gaps are treated can profoundly affect the analytics results. The data cleaning module in the TSDB Intelligence Engine runs machine learning algorithms across multiple data sources to accurately diagnose the cause of missing data, and automatically performs null-filling operations that are aligned with business expectations.
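To make the distinction between the two kinds of gaps concrete, here is a minimal Python sketch. It is not the TSDB implementation: the function name `fill_gaps`, its parameters, and the sample figures are hypothetical, and a plain lookup in a known closure calendar stands in for the machine learning diagnosis described above. Days on which the store was closed are filled with zero, while gaps assumed to be transmission errors are linearly interpolated from the nearest observed days.

```python
from datetime import date, timedelta


def fill_gaps(observed, closed_days, start, end):
    """Fill missing daily values between start and end (inclusive).

    observed:    dict mapping date -> sales value actually received
    closed_days: set of dates on which the store is known to be closed
                 (assumption: stands in for an ML-based diagnosis)
    """
    known = sorted(observed)
    filled = {}
    day = start
    while day <= end:
        if day in observed:
            filled[day] = observed[day]
        elif day in closed_days:
            # Real closure: zero sales is the true value for this day.
            filled[day] = 0.0
        else:
            # Assumed transmission error: estimate from observed neighbors only.
            before = [d for d in known if d < day]
            after = [d for d in known if d > day]
            if before and after:
                lo, hi = before[-1], after[0]
                weight = (day - lo).days / (hi - lo).days
                filled[day] = observed[lo] + weight * (observed[hi] - observed[lo])
            elif before:
                filled[day] = observed[before[-1]]
            elif after:
                filled[day] = observed[after[0]]
            else:
                filled[day] = None  # nothing observed, so nothing to estimate from
        day += timedelta(days=1)
    return filled


if __name__ == "__main__":
    sales = {date(2019, 1, 1): 100.0, date(2019, 1, 4): 130.0}
    closures = {date(2019, 1, 3)}
    for d, v in fill_gaps(sales, closures, date(2019, 1, 1), date(2019, 1, 4)).items():
        print(d.isoformat(), v)
```

Treating the closure as a true zero rather than interpolating over it keeps store-level totals honest, whereas interpolating a transmission gap avoids a spurious dip in the series.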
TSDB also performs a multitude of optimizations to enable fast access and computation at runtime. For example, retail analytics applications frequently deal with data aggregations across different product hierarchies, hierarchical geographic organizations, and timelines. With our customized optimization techniques, the pre-aggregation module in TSDB runs concurrent multi-level roll-ups on hundreds of financial sources along different temporal and spatial dimensions.
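As an illustration of a multi-level roll-up, the following Python sketch aggregates each record into every level of a geographic hierarchy and two temporal granularities in a single pass. The hierarchy (`region`, `city`, `store`), the record fields, and the function `roll_up` are assumptions made for this example; the sketch is sequential and does not reproduce the concurrency or the customized optimizations of the TSDB pre-aggregation module.

```python
from collections import defaultdict
from datetime import date

# Hypothetical geographic hierarchy, from coarsest to finest level.
GEO_LEVELS = ("region", "city", "store")


def roll_up(records):
    """Return totals keyed by (geo_level, geo_key, time_level, time_key)."""
    totals = defaultdict(float)
    for rec in records:
        geo_path = tuple(rec[level] for level in GEO_LEVELS)
        day = rec["day"]
        time_keys = {"day": day.isoformat(), "month": day.strftime("%Y-%m")}
        # Each record contributes to every prefix of its geographic path
        # and to every temporal granularity.
        for depth, level in enumerate(GEO_LEVELS, start=1):
            geo_key = geo_path[:depth]
            for time_level, time_key in time_keys.items():
                totals[(level, geo_key, time_level, time_key)] += rec["amount"]
    return dict(totals)


if __name__ == "__main__":
    sample = [
        {"region": "East", "city": "Shanghai", "store": "S1",
         "day": date(2019, 1, 1), "amount": 10.0},
        {"region": "East", "city": "Shanghai", "store": "S2",
         "day": date(2019, 1, 2), "amount": 5.0},
    ]
    for key, value in sorted(roll_up(sample).items()):
        print(key, value)
```

Materializing the totals at write time means that a query for, say, a region's monthly total becomes a single key lookup instead of a scan over store-level data.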
Another major analytical challenge in retail big data applications is the low signal-to-noise ratio: the net profit margin of leading retailers typically ranges from 1% to 3%, while financial KPIs are influenced by numerous micro- and macro-economic factors. TSDB leverages a rich set of advanced time-series feature-extraction algorithms to quantify the true impact of business actions amid this noise. We have also developed deep learning functions in the Intelligence Engine that automatically detect interesting trends in real-time data streams and provide actionable insights.
With all the features above, the Intelligence Engine in TSDB provides a full-stack analytics solution that helps retail companies identify interesting patterns in the finest-grained data sources and achieve higher ROI through detailed, closed-loop decision feedback in real time. We believe both technical and business audiences will gain valuable lessons and insights from our success story.