Extracting topics from streams of text data yields insights that subsequent workflows can build on. For example, topics extracted from text that is continuously ingested into a search engine can be used to tag documents with important keywords or concepts for use at search time. Another use case is analyzing support tickets to identify the most common customer problems.
In this talk we illustrate how to use Apache Flink's dynamic processing and stateful streaming capabilities to continuously train topic models from unlabelled text and to use those models to extract topics from the data itself. The topic models are built on distributed representations of words and documents. We will see how all of this can be done in a pure streaming fashion, without resorting to a Lambda Architecture-style setup. An earlier version of this talk was presented at Flink Forward Berlin 2018.