In the current digital world, enterprises are drowning in the data they are required to store for customers, for corporate analysis, and for business forecasting. With the convergence of cloud, IoT, and big data technologies, data lakes are becoming the critical fuel for enterprise-wide digital transformations, proving to be cost-effective, self-service, and elastic in nature. This enterprise data is spread widely across numerous clusters and repositories residing in both company data centers and multiple cloud locations, posing a new "data protection" problem in hybrid environments.

Protecting data is a critical part of every business continuity plan, because data loss or corruption can have a huge impact on enterprise survival. Protecting data is more challenging than ever in complex hybrid enterprise data lake environments, since we need to answer questions such as:
- How do we move data seamlessly between enterprise data centers and cloud?
- How do we secure enterprise data that resides in different locations under multiple authorization policies?
- How do we protect data from natural or accidental disasters to ensure operational continuity?
Not having immediate answers to these questions makes it very difficult for business users and platform operators to protect data in hybrid enterprise data lake environments. Therefore, enterprises require a unified data protection orchestration platform that seamlessly protects data across multiple environments. In this talk, we will address these challenges using Apache Hadoop, Apache Hive, Apache Ranger, and Apache Atlas.
Using a unified open source orchestration platform, we will outline how:
- You can protect mission-critical data, along with its security and governance policies, across multiple data lakes, and how change data capture works, using Apache Hadoop, Apache Hive, Apache Ranger, and Apache Atlas.
- You can monitor replication jobs and collect metrics for the replicated data across hybrid enterprise data lake environments.
We will also showcase:
- How to seamlessly and securely replicate HDFS data and Hive databases between Hortonworks clusters, along with the associated Apache Ranger policies and Apache Atlas metadata.
- How to securely move data between on-premises clusters and cloud storage.
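To give a sense of the underlying mechanism, the replication scenarios above can be sketched with Hadoop's built-in DistCp tool, which an orchestration platform typically schedules on the user's behalf. This is a minimal hand-run sketch, not the platform's actual workflow; the NameNode addresses, paths, and bucket name are placeholders, and cloud credentials are assumed to be configured separately (e.g., in core-site.xml).

```shell
# Replicate an HDFS directory between two clusters.
# -update copies only files that changed since the last run (incremental sync);
# -p preserves file attributes such as ownership and permissions.
hadoop distcp -update -p \
    hdfs://source-nn:8020/apps/warehouse \
    hdfs://target-nn:8020/apps/warehouse

# Push the same dataset to cloud object storage via the S3A connector.
hadoop distcp -update \
    /apps/warehouse \
    s3a://example-backup-bucket/warehouse
```

Note that DistCp only moves the data itself; keeping Apache Ranger authorization policies and Apache Atlas metadata in sync across clusters is exactly the gap the unified orchestration platform discussed in this talk is meant to fill.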