Santander UK’s Big Data journey began in 2014, using Hadoop to make the most of our data and generate value for customers. Within nine months, we had created a highly available, real-time, customer-facing application for customer analytics. Today, 500 people run their own analyses and projects on this data, spanning 50 different use cases. This data, consisting of over 40 million customer records and billions of transactions, gives our business insights that were inaccessible before.
Our business moves quickly, with several products and 20 use cases currently in production. We run both a customer data lake and a technical data lake, and supporting such different workloads on a single platform has proven challenging.
Our success in generating value drove such growth in data, use cases, analysts, and usage patterns that, three years later, we face scalability issues in HDFS, the Hive metastore, and Hadoop operations, as well as high-availability challenges with HBase, Flume, and Kafka. Going forward, we are exploring alternative architectures, including a hybrid cloud model, and moving towards streaming.
Our goal with this session is to help those early in their own journey build a solid foundation. We hope others can benefit from the experiences and lessons we learned along the way.