Data Gloveboxes: A Philosophy of Data Science Data Security

Thursday, March 21
11:00 AM - 11:40 AM
Room 131-132

Data scientists often have access to very sensitive material: data! Today's data scientists need a way to work with toxic data, where spilling even a few records could be destructive to a company. Securing compute clusters to work like the nuclear glove boxes of old is one technique to limit data exfiltration and to ensure that data production is regularized, reliable and secure.

This talk will cover the philosophy and implementation of:

Data Dropbox: data goes in blindly but can be verified via checksums, and data directionality is enforced; HDFS is used as a model, and the state of HBase is discussed.
Data Glovebox: one can manipulate data as desired but cannot exfiltrate it except via very specific, controlled processes; the Oozie Git action is a step in this direction.
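To make the Data Dropbox idea concrete, here is a minimal in-memory sketch, assuming SHA-256 as the checksum. The class and method names (`DataDropbox`, `deposit`, `verify`) are hypothetical and illustrative only; this is not the HDFS or HBase mechanism the talk discusses. Data flows in one direction: a depositor can write an object and later confirm it arrived intact by checksum, but there is deliberately no way to read the bytes back out.

```python
import hashlib
import hmac


class DataDropbox:
    """Hypothetical write-only store: data goes in blindly and can be verified
    by checksum, but no method returns the stored bytes (directionality)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def deposit(self, name: str, payload: bytes) -> str:
        # Refuse overwrites so existing deposits stay immutable.
        if name in self._objects:
            raise FileExistsError(name)
        self._objects[name] = payload
        # Hand back only the checksum, never the data itself.
        return hashlib.sha256(payload).hexdigest()

    def verify(self, name: str, expected_sha256: str) -> bool:
        # The caller learns only whether the stored bytes match their checksum.
        stored = self._objects.get(name)
        if stored is None:
            return False
        actual = hashlib.sha256(stored).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(actual, expected_sha256)


if __name__ == "__main__":
    box = DataDropbox()
    digest = box.deposit("trades.csv", b"id,qty\n1,100\n")
    print(box.verify("trades.csv", digest))    # True
    print(box.verify("trades.csv", "0" * 64))  # False
```

The design choice is that verification answers only "yes" or "no": a compromised depositor account can confirm its own uploads but cannot use the dropbox to pull data back out.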

Clay Baenziger
Hadoop Infrastructure
Clay Baenziger is an architect on the Hadoop Infrastructure Team at Bloomberg. Clay comes from a diverse background in systems infrastructure and analytics, ranging from operating systems engineering to financial portfolio analytics. He has been involved in the Hadoop ecosystem for nine years and gives numerous talks each year on Bloomberg's community contributions.