With the convergence of cloud, IoT, and big data technologies, data lakes are becoming the critical fuel for enterprise-wide digital transformation. Enterprise data is increasingly spread across multiple data lakes in many geographies and across multiple cloud platforms, often because regulatory and compliance mandates such as GDPR limit cross-border data transfer. With the proliferation of data types and sources in this complex landscape, discovering, organizing, and curating data has become extremely expensive. Moreover, gaining global visibility into the business context, usage, and trustworthiness of data requires a centralized view of all data and metadata, security controls, data access, and monitoring. Together, these challenges create a significant chasm between initial data capture and the insight generation that drives value creation.

Adequate stewardship, with the right rules and policies around data security and privacy and rational policy enforcement across the information supply chain, is critical to the adoption of modern data lake architectures. Enterprises therefore need a “global insight fabric” that balances sound data governance rules and policies with a trusted environment in which users can collaborate and share data responsibly to create value. We recently launched the 100% open source Hortonworks Data Steward Studio (DSS) service, which helps enterprises address these challenges and moves them closer to realizing the vision of a global insight fabric.
In this talk, we will outline how data stewards, analysts, and data engineers can better understand their data assets across multiple data lakes at scale using the DISCOVER approach with DSS:
Detect: Find where important data assets are located
Inventory: Locate and catalog all data globally
Secure: Protect data assets and monitor their access and usage
Collaborate: Crowdsource and leverage knowledge across the enterprise
Organize: Curate and group data based on different characteristics
Verify: Understand sources and complete chain of custody for all data (lineage and impact)
Enrich: Add classifications and annotations
Report: Create and view multiple dashboards, reports, and summarizations of data
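To make the Enrich step concrete: DSS builds on Apache Atlas for metadata management, and adding a classification (e.g. tagging an asset as PII) maps to a call against Atlas's v2 REST API. The sketch below builds such a request in Python; the host name, entity GUID, and tag names are illustrative placeholders, not values from DSS itself.

```python
import json

# Hypothetical Atlas endpoint; substitute your metadata server's address.
ATLAS_BASE = "http://atlas.example.com:21000/api/atlas/v2"

def classification_request(entity_guid, tags):
    """Build the URL and JSON body for attaching classification tags
    to a catalogued asset, in the shape Atlas's v2 API expects:
    POST /entity/guid/{guid}/classifications with a list of
    {"typeName": ...} objects."""
    url = f"{ATLAS_BASE}/entity/guid/{entity_guid}/classifications"
    body = [{"typeName": tag} for tag in tags]
    return url, json.dumps(body)

# Example: tag one asset (GUID is illustrative) as PII and in GDPR scope.
url, body = classification_request("abc-123-guid", ["PII", "GDPR_Scope"])
```

Once tags like these are attached, they become the hooks for the Secure and Organize steps: access policies and asset groupings can key off classifications rather than individual datasets.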
We will showcase how DSS empowers enterprises to precisely identify and evaluate the trust level of their data, to collaborate securely, and to confidently democratize data across the enterprise in order to derive value from their data lakes, whether those lakes reside in on-premises data centers, in the cloud, or across multiple cloud provider environments.