Advanced analytics in streaming architectures present a distinctive tradeoff between context and performance. Typically, performance and scalability requirements mandate that each message in a stream be processed without the context of the messages that came before it. In this talk, we will discuss using sketching algorithms to engineer a compromise that lets us consider historical state without compromising scalability.
In analyzing the capabilities of many SIEMs and cybersecurity platforms, we found that a good portion of the advanced analytics boil down to simple rules enriched with the ability to do statistical baselining, set existence, and set cardinality computations. These operations are inherently difficult to do in-stream, so they are often done after the fact. We look at ways to open up these analytics to stream computation without sacrificing scalability.
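To make the set-existence case concrete, here is a minimal sketch of a Bloom filter, one common sketching structure for membership tests in bounded memory. It is illustrative only and is not Metron's or Stellar's implementation; the class name `BloomFilter`, its method names, and the sample IP addresses are assumptions made for this example.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size membership sketch: no false negatives, tunable false positives.

    Hypothetical illustration; names and parameters are assumptions.
    """

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2.
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def merge(self, other: "BloomFilter") -> None:
        # Union of two filters built with identical parameters (bitwise OR).
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        for i, byte in enumerate(other.bits):
            self.bits[i] |= byte


# Each stream worker can keep its own filter and merge them later.
seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for ip in ["10.0.0.1", "10.0.0.2", "192.168.1.7"]:
    seen.add(ip)
```

Because the filter's size is fixed up front and two filters merge with a bitwise OR, each message can be checked against historical state in constant time, which is the kind of compromise between context and scalability described above.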
Specifically, we will introduce the infrastructure built for Apache Metron to perform these kinds of tasks. We will cover the novel integration between Apache Storm and Apache HBase, orchestrated by a custom domain-specific language called Stellar, which takes the sting out of constructing sketches and using them for both simple and more advanced analytics, such as statistical outlier analysis, in-stream.
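As a rough illustration of statistical baselining and outlier analysis in-stream, the following sketch keeps a small mergeable summary (count, mean, and sum of squared deviations) using Welford's online update and a pairwise merge. It is an assumption-laden stand-in, not Metron's or Stellar's actual mechanism; the `RunningStats` class, the z-score threshold of 3.0, and the sample values are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-size, mergeable baseline of a numeric stream (hypothetical helper)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        # Welford's online update: one pass, no stored history.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two partial summaries, e.g. from two workers or time windows.
        if other.count == 0:
            return RunningStats(self.count, self.mean, self.m2)
        if self.count == 0:
            return RunningStats(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return RunningStats(n, mean, m2)

    def stddev(self) -> float:
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0

    def is_outlier(self, x: float, threshold: float = 3.0) -> bool:
        # Flag values more than `threshold` standard deviations from the baseline.
        sd = self.stddev()
        return sd > 0 and abs(x - self.mean) / sd > threshold


# Two partial baselines (say, two earlier time windows) merged into one.
window_a, window_b = RunningStats(), RunningStats()
for value in [10, 12, 11, 13]:
    window_a.add(value)
for value in [9, 10, 11, 12]:
    window_b.add(value)
baseline = window_a.merge(window_b)
```

The summary stays the same size no matter how many messages it has absorbed, so partial summaries could be persisted and merged later to score each new message against history without replaying the stream.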