This talk is about building Audi's big data platform from a first Hadoop PoC to a multi-tenant enterprise platform. Why a big data platform at all? We outline the requirements that drove the platform's development and walk through the decisions we had to make along the way.
While setting up our big data infrastructure, we often had to strike a balance between enterprise integration and speed: for instance, whether to use the existing Active Directory as both LDAP server and Kerberos KDC or to set up our own KDC. Using a shared enterprise service like Active Directory means following its naming conventions and accepting restricted access, whereas running our own KDC gives us much more flexibility but adds another component to our platform that we have to maintain. We show the advantages and disadvantages of each option and explain why we decided the way we did.
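To make the trade-off concrete: pointing the platform at Active Directory as its KDC is, on the client side, little more than a realm definition. The sketch below is a minimal, hypothetical krb5.conf fragment; the realm and host names are illustrative, not Audi's actual configuration:

```
# Minimal sketch, assuming a hypothetical AD domain EXAMPLE.CORP
# whose domain controllers also act as Kerberos KDCs.
[libdefaults]
    default_realm = EXAMPLE.CORP

[realms]
    EXAMPLE.CORP = {
        kdc = dc1.example.corp        # AD domain controller as KDC
        admin_server = dc1.example.corp
    }

[domain_realm]
    .example.corp = EXAMPLE.CORP
```

Running a dedicated MIT Kerberos KDC instead would mean operating the KDC hosts, replication, and principal lifecycle ourselves, but would free us from the naming conventions and access restrictions of the shared AD service.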
For ingestion of both batch and streaming data, we use Apache Kafka. We explain why we run our Kafka cluster separately from our Hadoop platform. We also discuss the pros and cons of the Kafka binary protocol versus an HTTP REST interface, not only from a technical perspective but also from an organizational one, since the source systems are required to push their data into Kafka.
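The organizational difference between the two paths can be sketched in a few lines. With the binary protocol, every source system has to link a Kafka client library; with REST, an HTTP client and a JSON envelope suffice. The snippet below is illustrative only: host names and the topic are placeholders, the producer settings follow the kafka-python package, and the envelope shape follows Confluent's REST Proxy v2 topic API:

```python
import json

# Hypothetical sensor reading to be ingested (illustrative payload).
reading = {"robot_id": "r-17", "torque_nm": 3.2, "ts": "2018-05-01T12:00:00Z"}

# Path 1: Kafka binary protocol. The source system links a Kafka client
# library and pushes directly to the brokers, e.g. with kafka-python
# (sketch only, broker host is a placeholder):
#
#   from kafka import KafkaProducer
#   producer = KafkaProducer(
#       bootstrap_servers="kafka.example.com:9093",
#       security_protocol="SASL_SSL",        # Kerberos via SASL/GSSAPI
#       sasl_mechanism="GSSAPI",
#       value_serializer=lambda v: json.dumps(v).encode("utf-8"),
#   )
#   producer.send("plant-sensors", reading)

# Path 2: HTTP REST. The source system only needs an HTTP client and
# POSTs an envelope like the one below to a REST gateway.
def rest_envelope(*records):
    """Wrap records in the JSON envelope a Kafka REST gateway expects."""
    return json.dumps({"records": [{"value": r} for r in records]})

body = rest_envelope(reading)
# POST body to https://rest-gateway.example.com/topics/plant-sensors
# with Content-Type: application/vnd.kafka.json.v2+json
```

The binary path offers batching, partitioning control, and back-pressure, but forces a client dependency and Kerberos setup onto every source system; the REST path lowers that barrier at the cost of an extra gateway hop.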
We give an overview of our current architecture, including how selected use cases are implemented on it. Some of them run exclusively on our new big data stack, while others use it in conjunction with our data warehouse. The use cases cover many kinds of data, from sensor data produced by robots in our plants to clickstreams from web applications.
Building an enterprise platform involves not only technical tasks but also organizational ones: data ownership, authorization to access certain data sets, and financial questions such as internal pricing and SLAs.
Although we have already achieved quite a lot, our journey has not yet ended. Open topics remain, such as providing a unified logging solution for applications spanning multiple platforms, finally offering a notebook solution like Apache Zeppelin to our analysts, and addressing legal requirements such as the GDPR.
We will conclude our talk with a short glimpse into our ongoing extension of our on-premises platform into a hybrid cloud platform.