At ING Bank, machine learning models are key to engaging our customers with relevant, timely interactions, empowering them to stay a step ahead in life and in business. To make model building faster, compliant, validated, and accessible to roles beyond data scientists (such as data analysts or customer journey experts), we have structured the process around the easy creation of propensity models.
In this talk, I will present this structure, focusing on pipelining data science models in Apache Spark. In particular, I will show how we use Apache Sqoop and Apache Ranger to comply with GDPR, build a data science workflow on top of Python and Jupyter, and extend the SparkML libraries on PySpark to create custom standardizers and cross-validators. I will also demonstrate an in-house monitoring tool, built on top of Elasticsearch, for model evaluation.
Finally, I will describe how analysts and customer journey experts engage with the result sets these models produce, and how we refine our dashboards (in IBM Cognos) accordingly.