Data Protection in Hybrid Enterprise Data Lake Environment

Thursday, May 23
2:50 PM - 3:30 PM
Marquis Salon 12

In today's digital world, enterprises are drowning under the weight of data they must store for customers, for corporate analysis, and for business forecasting. With the convergence of cloud, IoT, and big data technologies, data lakes have become critical fuel for enterprise-wide digital transformation, proving cost-effective, self-service, and elastic in nature. This enterprise data is spread widely across numerous clusters and repositories residing in both company data centers and multiple cloud locations, posing a new “data protection” problem in hybrid environments. Protecting data is a critical part of every business continuity plan, because data loss or corruption can have a huge impact on an enterprise's survival. Protecting data is more challenging than ever in a complex hybrid enterprise data lake environment, since we need to answer questions such as:

- How do we move data seamlessly between enterprise data centers and cloud?
- How do we secure enterprise data that resides in different locations with multiple authorization policies?
- How do we protect data from natural or accidental disasters to ensure operational continuity?

Not having immediate answers to these questions makes it very difficult for business users and platform operators to protect data in hybrid enterprise data lake environments. Therefore, enterprises require a unified data protection orchestration platform that seamlessly protects data across multiple environments. In this talk, we will address these challenges faced by enterprises using Apache Hadoop, Apache Hive, Apache Ranger, and Apache Atlas.

Using a unified open-source orchestration platform, we will outline how:
- You can protect mission-critical data, along with its security and governance policies, across multiple data lakes, and how change data capture works using Apache Hadoop, Apache Hive, Apache Ranger, and Apache Atlas.
- You can monitor replication jobs and the metrics collected for replicated data across hybrid enterprise data lake environments.

We will also showcase:
- How to seamlessly and securely replicate HDFS data and Hive databases between Hortonworks clusters, along with their Apache Ranger policies and Apache Atlas metadata.
- How to securely move data between on-premises clusters and cloud storage.


Murali Ramasami
Senior Software Engineer
Hortonworks, Inc.
I currently work at Hortonworks as a Senior Software Engineer focused on data management products, actively contributing to the Hortonworks DataPlane Services platform and Hortonworks Data Lifecycle Manager. Prior to Hortonworks, I worked at Informatica on the intelligent data warehouse and big data platform, using Hadoop, Hive, and Teradata connectors. Before Informatica, I worked at Teradata on data movement products such as Teradata Parallel Transporter and the Teradata Connector for Hadoop.
Niru Anisetti
Product Manager
Hortonworks, Inc.
Niru Anisetti is the product manager for Data Lifecycle Manager at Hortonworks. She is part of a passionate team building a next-generation disaster recovery product to make millions of data managers' lives easier. Before Hortonworks, she worked at IBM, Intuit, and Yahoo, among other companies, building products that not only generate revenue but also change people's lives for the better. She can be reached at