DISCOVER with Data Steward Studio: Understanding and unlocking the value of data in hybrid enterprise data lake environments

Tuesday, June 19
4:00 PM - 4:40 PM
Meeting Room 230C

With the convergence of cloud, IoT, and big data technologies, data lakes are becoming the critical fuel for enterprise-wide digital transformations. Enterprises increasingly have data spread across multiple data lakes in many geographies and across multiple cloud platforms, for example because of regulatory and compliance mandates, such as GDPR, that limit cross-border data transfer.

With the proliferation of data types and sources in this complex landscape, the work of discovering, organizing, and curating data has become extremely expensive. Additionally, gaining global visibility into the business context, usage, and trustworthiness of data requires a centralized view of all data and metadata, security controls, data access, and monitoring. Together, these challenges create a significant chasm between initial data capture and the subsequent generation of insights that drives value creation.

Adequate stewardship, with the right rules and policies around data security and privacy and rational policy enforcement across the information supply chain, is critical to the adoption of modern data lake architectures and to value creation. Enterprises therefore need a “global insight fabric” that balances adequate data governance rules and policies with a trusted environment in which users can collaborate and share data responsibly in order to create value. We recently launched the 100% open source Hortonworks Data Steward Studio (DSS) service, which can help enterprises address these challenges and move closer to realizing the vision of a global insight fabric.

In this talk, we will outline how data stewards, analysts, and data engineers can better understand their data assets across multiple data lakes at scale using the DISCOVER approach with DSS:
Detect: Find where important data assets are located
Inventory: Locate and catalog all data globally
Secure: Protect data assets and monitor their access and usage
Collaborate: Crowdsource and leverage knowledge across the enterprise
Organize: Curate and group data based on different characteristics
Verify: Understand sources and complete chain of custody for all data (lineage and impact)
Enrich: Add classifications and annotations
Report: Create and view multiple dashboards, reports, and summarizations of data
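As a concrete illustration of the Enrich step, DSS catalogs metadata on top of Apache Atlas, and classifications can be attached to a cataloged asset programmatically through Atlas's v2 REST API. The sketch below is a minimal, hypothetical example: the Atlas host, entity GUID, and tag names are placeholders, and the actual HTTP call is only described in a comment so the sketch stays self-contained.

```python
import json

# Atlas v2 REST API: classifications are attached to an entity by GUID via
# POST /api/atlas/v2/entity/guid/{guid}/classifications
# (the host, GUID, and tag names below are hypothetical placeholders)
ATLAS_URL = "http://atlas.example.com:21000"

def classification_payload(tag_names):
    """Build the JSON body Atlas expects: a list of classification objects."""
    return [{"typeName": name} for name in tag_names]

guid = "00000000-0000-0000-0000-000000000000"  # placeholder entity GUID
endpoint = f"{ATLAS_URL}/api/atlas/v2/entity/guid/{guid}/classifications"
body = json.dumps(classification_payload(["PII", "GDPR_Restricted"]))

# An HTTP client (e.g. the `requests` library) would POST `body` to
# `endpoint` with the cluster's authentication; omitted here deliberately.
print(endpoint)
print(body)
```

Once such classifications are in place, downstream security tooling (for example, tag-based policies in Apache Ranger) can enforce access rules on every asset carrying a given tag, which is what makes the Enrich step more than documentation.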

We will showcase how DSS empowers enterprises to precisely identify and evaluate the trust levels of their data, to collaborate securely, and to confidently democratize data across the enterprise in order to derive value from their data lakes, whether those lakes reside in on-premises data centers, in the cloud, or across multiple cloud provider environments.


Srikanth Venkat
Hortonworks, Inc.
Srikanth Venkat currently works at Hortonworks on the security and governance portfolio of products, including Apache Knox, Apache Ranger, Apache Atlas, platform-wide security, and the Hortonworks DataPlane Service. Before joining Hortonworks, he held a variety of roles in areas such as cloud services, marketplaces, security, and business applications. Srikanth has leadership experience spanning product management, strategy and operations, and technical architecture, at companies ranging from startups to global enterprises, including Telefonica, Salesforce, Cisco-WebEx, Proofpoint, Dataguise, Trilogy Software, and Hewlett-Packard. He holds a PhD in engineering with a focus on artificial intelligence from the University of Pittsburgh, an MBA in General Management from Indiana University, and a master's degree in global management from the Thunderbird School of Global Management. His hobbies are data science and machine learning, and he enjoys tinkering with big data technologies.
Hemanth Yamijala
Principal Engineer
Hortonworks, Inc.
I am a Principal Engineer at Hortonworks, focused on governance and metadata management products. I lead the Hortonworks DataPlane Services platform and Hortonworks Data Steward Studio. Earlier, I was an active contributor and committer on Apache Atlas. I am interested in building scalable data processing and metadata management systems that operate in the Apache Hadoop ecosystem. I have been involved with Apache Hadoop since its early days and was a lead for MapReduce before Hadoop's 1.0 release.