Comcast’s Streaming Data platform comprises ingest, transformation, and storage services in the public cloud, together with on-prem RDBMSs, EDWs, and a large, ungoverned legacy data lake. We use Apache Atlas for data discovery and lineage, relying heavily on its extensibility, which is unique in the industry. We first tackled the public cloud, including Kafka topics, Avro schemas, and S3 datasets. Next we integrated metadata and lineage for the on-prem datasets. More recently we added data-based ML approaches to duplicate elimination and to the discovery of semantic equivalences. These are aimed primarily at taming the chaos of the legacy data lake and at finding connections between that data lake and the EDW. We use Atlas and Ranger for tag-based authorization not only in the Hadoop environment but also in AWS S3, Presto, and other public-cloud applications. We have built APIs that make it easy for other groups within Comcast to push metadata and lineage to Atlas, removing our group as the bottleneck. All of our extensions to the Atlas type definitions have been contributed to the Apache open source community.
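
To make the idea of pushing metadata and lineage to Atlas concrete, here is a minimal sketch in Python of the kind of request body such a push might carry. It is not Comcast's API, which this abstract does not describe. The helper names (`dataset_entity`, `lineage_process`, `bulk_payload`), the qualified names, and the choice of attributes are all hypothetical. The type names `kafka_topic`, `DataSet`, and `Process` are assumed to be present in the target Atlas instance. The sketch only builds the JSON; it makes no network call.

```python
import json


def object_ref(type_name, qualified_name):
    """Reference an entity by type and unique qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def dataset_entity(type_name, qualified_name, name, **extra):
    """Build one entity; `extra` holds type-specific attributes (hypothetical helper)."""
    attributes = {"qualifiedName": qualified_name, "name": name}
    attributes.update(extra)
    return {"typeName": type_name, "attributes": attributes}


def lineage_process(qualified_name, name, inputs, outputs):
    """Build a Process entity whose inputs/outputs link datasets into lineage."""
    def refs(entities):
        return [
            object_ref(e["typeName"], e["attributes"]["qualifiedName"])
            for e in entities
        ]

    return {
        "typeName": "Process",
        "attributes": {
            "qualifiedName": qualified_name,
            "name": name,
            "inputs": refs(inputs),
            "outputs": refs(outputs),
        },
    }


def bulk_payload(*entities):
    """Wrap entities in the envelope used for a bulk entity create/update."""
    return {"entities": list(entities)}


# Illustrative values only: a Kafka topic archived to an S3 location.
topic = dataset_entity(
    "kafka_topic",
    "events.clicks@streaming",
    "events.clicks",
    topic="events.clicks",
    uri="kafka://streaming/events.clicks",
)
archive = dataset_entity("DataSet", "s3://example-bucket/clicks/", "clicks archive")
job = lineage_process("archive-clicks@streaming", "archive-clicks", [topic], [archive])

payload = bulk_payload(topic, archive, job)
print(json.dumps(payload, indent=2))
```

A body of this shape would typically be sent with an authenticated POST to the Atlas v2 bulk entity endpoint (`/api/atlas/v2/entity/bulk`). A thin wrapper around helpers like these is one plausible way to let other teams register datasets and lineage without needing to know the Atlas type system in detail.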