Data Protection in Hybrid Enterprise Data Lake Environment

Thursday, May 23
4:50 PM - 5:30 PM

In today's digital world, enterprises are drowning under the weight of data they must store for customers, for corporate analysis, and for business forecasting. With the convergence of cloud, IoT, and big data technologies, data lakes, which have proven to be cost-effective, self-service, and elastic by nature, are becoming the critical fuel for enterprise-wide digital transformation. This enterprise data is spread across numerous clusters and repositories residing both in companies' own data centers and in multiple cloud locations, which poses a new "data protection" problem in hybrid environments. Protecting data is a critical part of every business continuity plan, because data loss or corruption can threaten an enterprise's survival. Protecting data is more challenging than ever in complex hybrid enterprise data lake environments, since we need to answer questions such as:

- How do we move data seamlessly between enterprise data centers and cloud?
- How do we secure enterprise data that resides in different locations under multiple authorization policies?
- How do we protect data from natural or accidental disasters to ensure operational continuity?

Without immediate answers to these questions, it is very difficult for business users and platform operators to protect data in hybrid enterprise data lake environments. Therefore, enterprises require a unified data protection orchestration platform that seamlessly protects data across multiple environments. In this talk, we will address these challenges using Apache Hadoop, Apache Hive, Apache Ranger, and Apache Atlas.

Using a unified open source orchestration platform, we will outline how:
- You can protect mission-critical data, along with its security and governance policies, across multiple data lakes using Apache Hadoop, Apache Hive, Apache Ranger, and Apache Atlas, and how change data capture works with them.
- You can monitor replication jobs and collect metrics on the replicated data across hybrid enterprise data lake environments.

We will also showcase:
- How to securely replicate HDFS data and Hive databases between Hortonworks clusters, along with their Apache Ranger policies and Apache Atlas metadata.
- How to securely move data between on-premises clusters and cloud storage.


Murali Ramasami
Senior Software Engineer
Hortonworks, Inc.
I currently work at Hortonworks as a Senior Software Engineer focused on data management products, and I actively contribute to the Hortonworks DataPlane Services platform and Hortonworks Data Lifecycle Manager. Before Hortonworks, I worked at Informatica on the intelligent data warehouse and big data platform, using Hadoop, Hive, and Teradata connectors. Before Informatica, I worked at Teradata on data movement products such as Teradata Parallel Transporter and the Teradata Connector for Hadoop.

Niru Anisetti
Director of Product Management
American Express
Niru Anisetti is the Director of Product Management at American Express, building a true enterprise hybrid cloud platform service for a global customer base. In her previous roles, she worked at Hortonworks/Cloudera, IBM, Intuit, and Yahoo, among other companies, building products that not only generate revenue but also change people's lives for the better. She can be reached at