Machine translation is essential for serving news or eCommerce website content across different geographies and locales. Machine translation systems often need to handle a large volume of concurrent translation requests from multiple sources in many languages, and they must do so in real time while making efficient use of specialized hardware.
Many machine translation preprocessing tasks, such as text normalization, language detection, and sentence segmentation, can be performed at scale in a real-time streaming pipeline built on Apache Flink or Apache Storm. We will look at a few such streaming pipelines that leverage Apache OpenNLP components. These components preprocess data into a format that can be consumed by a neural machine translation library like Sockeye, which is built on the Apache MXNet deep learning framework.
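To make the preprocessing stages concrete, the sketch below chains normalization, language detection, and sentence segmentation on a single record, roughly as a streaming map operator would. The detector and segmenter here are deliberately naive, self-contained stand-ins: a real pipeline would use OpenNLP's LanguageDetectorME and SentenceDetectorME with trained models inside a Flink or Storm operator. All function names and the stopword table are illustrative assumptions, not part of any library API.

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Text normalization: Unicode NFC plus whitespace collapsing."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()

# Toy stopword-overlap language detector. A production pipeline would
# use OpenNLP's LanguageDetectorME with a trained langdetect model.
STOPWORDS = {
    "eng": {"the", "and", "of", "to", "in"},
    "deu": {"der", "die", "und", "das", "ist"},
    "fra": {"le", "la", "et", "les", "des"},
}

def detect_language(text: str) -> str:
    tokens = set(text.lower().split())
    scores = {lang: len(tokens & sw) for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)

def segment_sentences(text: str) -> list[str]:
    """Naive split on terminal punctuation. A production pipeline
    would use OpenNLP's SentenceDetectorME instead."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def preprocess(raw: str) -> dict:
    """Run one record through the stages, as a streaming map
    operator might, producing input for the translation model."""
    text = normalize(raw)
    return {
        "language": detect_language(text),
        "sentences": segment_sentences(text),
    }

record = preprocess("The  quick fox.  It ran to the barn!")
```

In a real deployment each stage would be a separate operator so that language detection can route records to per-language translation models downstream.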
We'll demonstrate and examine the end-to-end throughput and latency of a pipeline that detects the language of news articles shared via Twitter and translates them in real time. Developers will come away with a better understanding of how neural machine translation works and how to build pipelines for machine translation preprocessing tasks and neural machine translation models. They'll have access to a demo repository to experiment with and will build machine translation models themselves.