brad058

Next Generation Cybersecurity Analytics – Part II, Technical Overview

In our previous post, Next Generation Cybersecurity Analytics, we wrote about an OpenSOC implementation project for a financial services firm (“Bank”). In this post we go into more technical detail on each of the individual components, explain the data flow, and share our results and conclusions. In our final follow-on post we will make the case for why a next generation cybersecurity analytics platform is required.

Network Packet Capture

The out-of-the-box packet capture utilities in the open source OpenSOC platform could not keep up with the volume of traffic produced by the Bank’s high performance hardware collection systems. As a result, B23 developed custom packet capture utilities designed to operate at that scale.

Apache Kafka

The custom network packet capture software asynchronously submitted the raw packet data to Apache Kafka. The Kafka cluster was managed by Apache Ambari and monitored with the open-source Kafka Manager tool supported by Yahoo! Because network traffic is highly dynamic, Kafka provided the scalable, distributed queuing system required to absorb the peaks and lulls of bursty packet collection.
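To illustrate why a queue helps with bursty traffic, here is a minimal sketch in plain Python. It is not Kafka and not B23's capture code. It simply models a producer that enqueues without waiting and a consumer that drains at a fixed rate. The function name and the per-tick packet counts are hypothetical.

```python
from collections import deque


def simulate_backlog(arrivals_per_tick, drain_per_tick):
    """Return the queue depth after each tick for a bursty producer.

    arrivals_per_tick: packets captured in each tick (made-up numbers).
    drain_per_tick: packets the downstream consumer can process per tick.
    """
    queue = deque()
    depths = []
    for tick, arrivals in enumerate(arrivals_per_tick):
        # Producer side: enqueue without waiting for the consumer.
        queue.extend((tick, i) for i in range(arrivals))
        # Consumer side: drain at a fixed rate, oldest packets first.
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()
        depths.append(len(queue))
    return depths


# A burst in ticks 2 and 3 is buffered, then worked off during quiet ticks.
print(simulate_backlog([2, 2, 10, 10, 1, 1, 1, 1], drain_per_tick=4))
# prints [0, 0, 6, 12, 9, 6, 3, 0]
```

The backlog grows during the burst and returns to zero afterwards, so the consumer never has to be sized for the peak rate.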

Apache Storm

Built into the OpenSOC-streaming package is an Apache Storm spout that pulls the raw data from Kafka topics into a user-defined Storm topology for real-time analysis of network packets. B23 customized and implemented several of the Apache Storm bolts included with the OpenSOC solution’s PCAP topology, including:

GeoIP Lookup

This bolt processed every network packet and tagged it with geographical information including country, state, ZIP code, and more. This critical processing bolt allowed analysts to quickly identify the source and destination countries of network flows on a world map that was highlighted in real time.
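The tagging step can be sketched roughly as follows. This is plain Python rather than the OpenSOC Storm bolt, and the lookup table, field names, and locations are all hypothetical (the address ranges are reserved documentation ranges). A real deployment would query a GeoIP database instead.

```python
import ipaddress

# Hypothetical table: documentation-only ranges mapped to made-up locations.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"country": "US", "state": "VA", "zip": "00000"}),
    (ipaddress.ip_network("198.51.100.0/24"),
     {"country": "DE", "state": None, "zip": None}),
]


def geo_lookup(ip):
    """Return the location dict for an IP string, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, location in GEO_TABLE:
        if addr in network:
            return location
    return None


def tag_packet(meta):
    """Return a copy of the packet metadata with source/destination geo tags."""
    tagged = dict(meta)
    tagged["src_geo"] = geo_lookup(meta["src_ip"])
    tagged["dst_geo"] = geo_lookup(meta["dst_ip"])
    return tagged


tagged = tag_packet({"src_ip": "192.0.2.10", "dst_ip": "198.51.100.7"})
print(tagged["src_geo"]["country"], "->", tagged["dst_geo"]["country"])
```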

Host Enrichment

This bolt processed every network packet and enriched its metadata with known asset-specific information, such as the source machine’s function and the line of business that owned the asset.
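As a simplified illustration of this kind of enrichment, the sketch below joins packet metadata against an asset inventory keyed by IP address. The inventory contents, field names, and function name are assumptions made for the example, not the Bank's data or the OpenSOC bolt's interface.

```python
# Hypothetical asset inventory keyed by IP address.
ASSET_INVENTORY = {
    "10.1.2.3": {"function": "database server", "line_of_business": "Retail Banking"},
    "10.4.5.6": {"function": "workstation", "line_of_business": "Trading"},
}


def enrich_with_host_info(meta, inventory=ASSET_INVENTORY):
    """Return a copy of the metadata with asset details for known hosts."""
    enriched = dict(meta)
    for side in ("src", "dst"):
        asset = inventory.get(meta.get(f"{side}_ip"))
        if asset is not None:
            # Only known assets are annotated; unknown hosts are left as-is.
            enriched[f"{side}_host"] = asset
    return enriched


print(enrich_with_host_info({"src_ip": "10.1.2.3", "dst_ip": "192.0.2.10"}))
```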

Whitelist/Blacklist Alerts

This bolt processed every network packet and flagged those whose source or destination IP address appeared on a known whitelist or blacklist. The resulting alerts flowed into a separate terminal Storm bolt (HBase, HDFS, Elasticsearch) so that they could be more readily analyzed.
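A rough sketch of the list check, using Python's standard `ipaddress` module, might look like the following. The list contents, the alert format, and the choice to let the blacklist take precedence are assumptions for illustration only.

```python
import ipaddress

# Hypothetical lists; real lists would be loaded from configuration.
WHITELIST = [ipaddress.ip_network("10.0.0.0/8")]
BLACKLIST = [ipaddress.ip_network("203.0.113.0/24")]


def _matches(ip, networks):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def classify(meta):
    """Return one alert per address that appears on either list."""
    alerts = []
    for field in ("src_ip", "dst_ip"):
        ip = meta[field]
        # Assumption: a blacklist hit takes precedence over a whitelist hit.
        if _matches(ip, BLACKLIST):
            alerts.append({"list": "blacklist", "field": field, "ip": ip})
        elif _matches(ip, WHITELIST):
            alerts.append({"list": "whitelist", "field": field, "ip": ip})
    return alerts


for alert in classify({"src_ip": "10.1.2.3", "dst_ip": "203.0.113.9"}):
    print(alert)
```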

Hadoop Distributed File System (“HDFS”)

This bolt processed every network packet and sent packet binary data and packet metadata to HDFS for future machine learning analytics using Apache Spark.


Apache HBase

This bolt processed every network packet and sent the packet binary data to Apache HBase.


Elasticsearch

This bolt processed every network packet and sent indexed packet metadata to Elasticsearch for further processing.

OpenSOC Storm topology implemented for the Bank:
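As a rough textual sketch of the data flow described in the bolt list above, the enrichment bolts can be modeled as functions applied in sequence, followed by a fan-out to the three terminal stores. This is plain Python, not Storm. The ordering of the enrichment steps, the function names, and the in-memory lists standing in for HDFS, HBase, and Elasticsearch are all assumptions.

```python
def run_topology(packets, enrichers, hdfs, hbase, elasticsearch):
    """Enrich each packet's metadata, then fan out to the terminal sinks."""
    for packet in packets:
        meta = dict(packet["meta"])
        for enrich in enrichers:  # e.g. GeoIP, host enrichment, list alerts
            meta = enrich(meta)
        # HDFS receives binary data and metadata for later bulk analytics.
        hdfs.append({"raw": packet["raw"], "meta": meta})
        # HBase receives the raw packet binary data.
        hbase.append(packet["raw"])
        # Elasticsearch receives the metadata for indexing and search.
        elasticsearch.append(meta)


# Hypothetical stand-in enrichers; each takes and returns a metadata dict.
def tag_geo(meta):
    return {**meta, "geo": "looked-up"}


def tag_host(meta):
    return {**meta, "host": "looked-up"}


hdfs, hbase, elasticsearch = [], [], []
packets = [{"raw": b"\x00\x01", "meta": {"src_ip": "10.1.2.3", "dst_ip": "192.0.2.10"}}]
run_topology(packets, [tag_geo, tag_host], hdfs, hbase, elasticsearch)
print(elasticsearch[0])
```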

Apache HBase

A terminal Storm bolt persisted the raw binary packet data in Apache HBase. A Java RESTful API in OpenSOC enables real-time access to this packet binary data for use by the customized Elasticsearch/Kibana user interface written in AngularJS. This interface allows security analysts to interrogate targeted network packet data using Wireshark, either in the native OpenSOC-UI interface (shown below) or via a locally installed version of Wireshark.

Apache Hadoop

A terminal Storm bolt persisted enriched packet data to HDFS for downstream bulk analysis. HDFS and its associated data will be the primary repository for future analytics using Apache Spark.

Apache Ambari

Ambari managed most of the infrastructure in the threat analytics platform. It served as an operational dashboard to gauge the health of the software components including Kafka, Storm, HDFS, and HBase.


Elasticsearch

A terminal Storm bolt indexed enriched packet data and alerts to Elasticsearch, providing real-time faceted search and visualization for


Kibana

The OpenSOC distribution includes a customized version of Kibana 3, which serves as the primary visualization component for the Bank’s security analysts.

The Bank’s security analysts also used a Kibana 4 instance for the creation of custom visualizations.

Results and Conclusions

Within one (1) minute of enabling geospatial visualization in Kibana, the Bank’s security analysts identified their first suspicious activity. The dashboard alerted the team to previously unknown BitTorrent network traffic flowing from an internal source IP address to a foreign destination IP address, which was perceived as an immediate risk. Over the course of several days, subsequent suspicious behavior was identified and remedied by the Bank’s security team.

The customizable threat analytics platform provided immediate value to the Bank. By the end of the first day of operation, the Bank’s security analysts were already developing custom visualizations and reports using Kibana 4. The B23 team committed all custom code and configuration to an internal GitLab repository. This allowed the Bank’s software developers to easily understand and evolve those customizations. Furthermore, the repository commit messages and README files served as thorough, itemized documentation of all software changes.

It was apparent to all parties that the default OpenSOC configuration was only the first step on a longer journey. This first step was critical to understanding the technical and business implications of future activities. Embracing an open source set of capabilities allowed the Bank to avoid traditional vendor lock-in with tools that seemed inexpensive initially but quickly accumulated additional fees as data grew. Open source capabilities allowed the Bank’s internal staff to find example code and configurations in the public domain, and use that knowledge to develop custom capabilities that best suit their environment.

Perhaps the most important future component of the Bank’s threat analytics platform is the implementation of machine learning algorithms using the MLlib library within Apache Spark. The foundation now exists to store months of raw packet capture and enriched metadata in HDFS, which Spark will consume as the basis for training machine learning and statistical models to further identify targeted network anomalies. This effort will quickly take the Bank beyond the realm of the existing OpenSOC solution while still leveraging its overall value as the first step in its journey to a more secure enterprise.
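To give a flavor of the kind of statistical model this could start from, here is a minimal sketch that flags hosts whose traffic volume is far from the mean. It deliberately uses a plain z-score in standard-library Python rather than Spark MLlib, and the host names, byte counts, and threshold are made up for the example.

```python
from statistics import mean, pstdev


def zscore_outliers(bytes_by_host, threshold=2.0):
    """Return hosts whose byte count is more than `threshold` std devs from the mean."""
    values = list(bytes_by_host.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all hosts identical, so nothing stands out
    return sorted(
        host for host, value in bytes_by_host.items()
        if abs(value - mu) / sigma > threshold
    )


# Hypothetical per-host byte counts: nine quiet hosts and one heavy talker.
traffic = {f"host-{i}": 100 for i in range(9)}
traffic["host-9"] = 10_000
print(zscore_outliers(traffic))  # prints ['host-9']
```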

#ArtificialIntelligence #BigData #CyberAnalytics #DataScience

