In our previous post, Next Generation Cybersecurity Analytics, we wrote about an OpenSOC implementation project for a financial services firm (“Bank”). In this post we go into more technical detail on each of the individual components, explain the data flow, and share our results and conclusions. In our final follow-on post we will make the case for why a next generation cybersecurity analytics platform is required.

Network Packet Capture

The out-of-the-box utilities provided by the open source OpenSOC platform could not meet the scale of network packet capture generated by the Bank’s high-performance hardware collection systems. As a result, B23 developed custom packet capture software utilities tailored to those systems.

Peak collection capability for the solution equated to capturing approximately 1 Petabyte (“PB”) of data every 66 minutes from the Bank’s private datacenters. As a first step to enabling OpenSOC, the B23 team developed a parallelized software solution to keep up with the high throughput demands of full-fidelity packet capture. Since the initial 1.2PB cluster was not scoped to handle such high throughput, the B23 team built throttling mechanisms into its custom-built, industrial-scale packet capture software utilities.
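One common way to implement this kind of throttling is a token bucket, which caps the average ingest rate while still permitting short bursts. The sketch below is a minimal illustration of that approach in Python; it is not the Bank’s actual utility, and the class and parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Token-bucket throttle: permits at most `rate` bytes/sec on
    average, with bursts up to `capacity` bytes."""

    def __init__(self, rate, capacity):
        self.rate = rate          # refill rate, bytes per second
        self.capacity = capacity  # maximum burst size, bytes
        self.tokens = capacity    # start with a full bucket
        self.last = time.monotonic()

    def consume(self, nbytes):
        """Block until `nbytes` tokens are available, then consume them."""
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            # Not enough tokens yet: sleep for the estimated deficit.
            time.sleep((nbytes - self.tokens) / self.rate)
```

A capture thread would call `consume(len(packet_batch))` before forwarding each batch downstream, so a temporarily undersized cluster is never offered more data than it was scoped for.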

Apache Kafka

The custom network packet capture software asynchronously submitted the raw packet data to Apache Kafka, managed by Apache Ambari and monitored with the open-source Kafka Manager maintained by Yahoo. Due to the dynamic nature of network traffic, Kafka provided the scalable, distributed queuing system required to absorb the spiky ebbs and flows of collected network packets.

Using the kafka-python library, we submitted asynchronous batches of packets with a simple round-robin producer. As we increased packet throughput, we observed that increasing the number of Kafka partitions per topic allowed for near-linear scalability.
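The near-linear scaling follows from how keyless sends are spread round-robin across partitions: each new partition takes an equal share of the load. The helper below is an illustrative plain-Python model of that distribution, not part of the kafka-python API.

```python
from itertools import cycle


def assign_round_robin(packets, num_partitions):
    """Mimic how a keyless Kafka producer spreads records across
    partitions: each successive record goes to the next partition,
    so adding partitions divides the per-partition load evenly."""
    batches = {p: [] for p in range(num_partitions)}
    rr = cycle(range(num_partitions))
    for pkt in packets:
        batches[next(rr)].append(pkt)
    return batches
```

Because every partition receives an almost identical share, doubling the partition count roughly halves the per-partition write load, which is what we observed as near-linear scalability.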

Apache Storm

Built into the OpenSOC-streaming package is an Apache Storm spout that pulls raw data from Kafka topics into a user-defined Storm topology for real-time analysis of network packets. B23 customized and implemented several of the Apache Storm bolts included with the OpenSOC solution’s PCAP topology, including:

GeoIP Lookup

This bolt processed network packets and tagged every packet with geographical information, including country, state, and ZIP code. This critical processing bolt allowed analysts to quickly identify the source and destination countries of network flows on a world map that was highlighted in real time.
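The core of such a bolt is a lookup-and-tag step. The sketch below shows that step in plain Python against a toy prefix table; a production bolt would consult a full GeoIP database, and the field names here are illustrative rather than OpenSOC’s actual schema.

```python
def enrich_with_geo(packet, geo_db):
    """Tag a parsed packet dict with geo info for its source and
    destination IPs. `geo_db` maps IP prefixes to location dicts;
    a real bolt would query a full GeoIP database instead."""
    enriched = dict(packet)  # leave the original packet untouched
    for side in ("src_ip", "dst_ip"):
        # Toy lookup on the first two octets (a /16-style prefix).
        prefix = ".".join(packet[side].split(".")[:2])
        enriched[side + "_geo"] = geo_db.get(prefix, {"country": "unknown"})
    return enriched
```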

Host Enrichment

This bolt processed every network packet and enriched the packet metadata with known asset-specific information, such as the source machine’s function and the Line-of-Business that owned the asset.

Whitelist/Blacklist Alerts

This bolt processed every network packet and identified those packets with a source or destination IP address on a known whitelist or blacklist. These alerts flowed into separate terminal Storm bolts (HBase, HDFS, Elasticsearch) so that they could be more readily analyzed.
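The matching logic reduces to set membership tests on the two endpoints. The function below is an illustrative sketch; how OpenSOC actually prioritizes or formats white- versus blacklist hits is a detail not reproduced here.

```python
def list_alerts(packet, whitelist, blacklist):
    """Return one alert dict per list hit for the packet's source and
    destination IPs. `whitelist` and `blacklist` are sets of IP strings."""
    alerts = []
    for ip in (packet["src_ip"], packet["dst_ip"]):
        if ip in blacklist:
            alerts.append({"type": "blacklist", "ip": ip})
        elif ip in whitelist:
            alerts.append({"type": "whitelist", "ip": ip})
    return alerts
```

Keeping the lists as sets makes each check O(1), which matters when every packet in a multi-gigabit stream passes through the bolt.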

Hadoop Distributed File System (“HDFS”)

This bolt processed every network packet and sent packet binary data and packet metadata to HDFS for future machine learning analytics using Apache Spark.


Apache HBase

This bolt processed every network packet and sent the packet binary data to Apache HBase.


Elasticsearch

This bolt processed every network packet and sent indexed packet metadata to Elasticsearch for further processing.

OpenSOC Storm topology implemented for the Bank:


Apache HBase

A terminal Storm bolt persisted the raw binary packet data in Apache HBase. A Java RESTful API in OpenSOC enables real-time access to this packet binary data for use by the customized Elasticsearch/Kibana user interface written in AngularJS. This interface allows security analysts to interrogate targeted network packet data using Wireshark, either in the native OpenSOC-UI interface (shown below) or via a locally installed version of Wireshark.


Apache Hadoop

A terminal Storm bolt persisted enriched packet data to HDFS for downstream bulk analysis. HDFS and its associated data will be the primary repository for future analytics using Apache Spark.


Apache Ambari

Ambari managed most of the infrastructure in the threat analytics platform. It served as an operational dashboard to gauge the health of the software components, including Kafka, Storm, HDFS, and HBase.


Elasticsearch

A terminal Storm bolt indexed enriched packet data and alerts into Elasticsearch, providing real-time faceted search and visualization for analysis of network flows. Elasticsearch is the foundational element for the dashboards representing collected data across multiple versions of Kibana.
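Indexing for Kibana typically means shaping each enriched packet into a timestamped JSON document routed to a daily index. The sketch below shows that shaping step; the field and index names are illustrative, not OpenSOC’s actual index schema.

```python
import json
from datetime import datetime, timezone


def packet_to_es_doc(packet):
    """Shape an enriched packet dict into an (index, body) pair ready
    for Elasticsearch. Daily index names let old data be dropped by
    simply deleting whole indices."""
    ts = datetime.fromtimestamp(packet["ts"], tz=timezone.utc)
    index = "packets-" + ts.strftime("%Y.%m.%d")  # daily indices, Kibana-style
    body = {
        "@timestamp": ts.isoformat(),  # Kibana's default time field
        "src_ip": packet["src_ip"],
        "dst_ip": packet["dst_ip"],
        "geo": packet.get("geo", {}),  # enrichment output, if present
    }
    return index, json.dumps(body)
```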



Kibana

The OpenSOC distribution includes a customized version of Kibana 3, which serves as the primary visualization component for the Bank’s security analysts.

The Kibana component visualizes geo-tagged data in a world map, time series data in histograms, and faceted data in pie charts and tables. In real-time, the map would automatically update as network packets reached terminal bolts from Apache Storm, allowing security analysts to quickly identify and triage suspicious behavior. The screenshot below shows how specific network packet flows are searched, filtered, and interrogated with Wireshark.

The Bank’s security analysts also used a Kibana 4 instance for the creation of custom visualizations.


Results and Conclusions

Within one (1) minute of enabling geospatial visualization in Kibana, the Bank’s security analysts identified their first suspicious activity. The dashboard alerted the team to previously unknown BitTorrent traffic flowing from an internal source IP address to a foreign destination IP address, which was perceived as an immediate risk. Over the course of several days, subsequent suspicious behavior was identified and remedied by the Bank’s security team.

The customizable threat analytics platform provided immediate value to the Bank. By the end of the first day of operation, the Bank’s security analysts were already developing custom visualizations and reports using Kibana 4. The B23 team committed all custom code and configuration to an internal GitLab repository, which allowed the Bank’s software developers to easily understand and evolve those customizations. Furthermore, the repository commit messages and README files served as thorough, itemized documentation of all software changes.

It was apparent to all parties that the default OpenSOC configuration was only the first step on a longer journey. This first step was critical for understanding the technical and business implications of future activities. Embracing an open source set of capabilities allowed the Bank to avoid traditional vendor lock-in with tools that seemed inexpensive at first but quickly accumulated additional fees as data volumes grew. Open source capabilities also allowed the Bank’s internal staff to find example code and configurations in the public domain, and to use that knowledge to develop custom capabilities that best suited their environment.

Perhaps the most important future component of the Bank’s threat analytics platform is the implementation of machine learning algorithms using the MLlib library within Apache Spark. The foundation now exists to store months of raw packet capture and enriched metadata in HDFS, which Spark will consume as the basis for training machine learning and statistical models to further identify targeted network anomalies. This effort will quickly take the Bank beyond the realm of the existing OpenSOC solution while still leveraging its overall value as the first step in its journey to a more secure enterprise.
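As a plain-Python illustration of the statistical core of that kind of anomaly detection (the production version would train far richer models with MLlib over HDFS-resident data), a simple z-score outlier test over per-flow byte counts looks like this; the function and threshold are illustrative assumptions, not the Bank’s planned models.

```python
import statistics


def flag_anomalies(byte_counts, threshold=3.0):
    """Return the indices of flows whose byte volume deviates more than
    `threshold` population standard deviations from the mean."""
    mean = statistics.mean(byte_counts)
    stdev = statistics.pstdev(byte_counts)
    if stdev == 0:
        return []  # all flows identical: nothing stands out
    return [i for i, b in enumerate(byte_counts)
            if abs(b - mean) / stdev > threshold]
```

The same idea, expressed as an MLlib pipeline over months of enriched metadata and trained per host or per line of business, is what would let the platform surface anomalies that no static whitelist or blacklist could catch.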