Announcing Serverless Data Flows using B23 Kubernetes Operator for NiFi-Fn

Apache NiFi provides a large and diverse library of processors for acquiring and transforming data, and a flow registry for versioning these often complex data flows. B23 uses NiFi across multiple infrastructure and orchestration platforms, including Kubernetes. Our pioneering NiFi engineering work allows us to programmatically provision data flows using a library of pre-existing, best-in-class data flows that we have developed and honed over many years of operational use.

Thanks to the recent development work on the NiFi-Fn project by Sam Hjelmfelt at Cloudera, there is now a direct path to running NiFi flows on Kubernetes without the overhead of an administratively complex NiFi cluster. The NiFi-Fn project brings to NiFi the ability to execute pre-existing data flows as serverless applications. What this means for NiFi users is that flows can now be started on demand and run to completion, with success determined by the successful processing of all flow files sent as inputs to the flow.

The B23 Kubernetes Operator for NiFi-Fn offloads the job management of specific flows to Kubernetes. An inspiration for our work was the recently open-sourced Kubernetes Operator for Apache Spark released by Google. At B23, we choose to run container-based workloads on Google Kubernetes Engine (“GKE”). For us, it was important to manage our NiFi flows as first-class citizens in Kubernetes by utilizing Custom Resource Definitions (“CRDs”). This allows us to create and manage our data flows just like any other resource in Kubernetes. Using familiar commands like `kubectl create -f nififn-flow.yaml` or `kubectl get NiFiFn`, we can create new flows and list running or completed data flows. The operator handles the creation of a Kubernetes Job resource and executes the desired flow after pulling it from the registry. This lets Kubernetes handle the semantics of retry and cleanup while giving the user control over the logic and execution of the flow.
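
To make the custom-resource idea concrete, here is a minimal Python sketch of what a file like `nififn-flow.yaml` might contain. Only the `NiFiFn` kind comes from the commands above; the API group and version, the helper names, and every key under `spec` are hypothetical placeholders, not the operator's actual schema. The sketch emits JSON, which `kubectl create -f` accepts alongside YAML.

```python
import json


def build_nififn_manifest(name, registry_url, bucket_id, flow_id, flow_version):
    """Build a custom-resource manifest as a plain dict.

    Every Kubernetes object carries apiVersion, kind, metadata, and
    (usually) spec. The API group/version and all "spec" keys below are
    hypothetical; consult the operator's CRD for the real field names.
    """
    return {
        "apiVersion": "example.com/v1alpha1",  # hypothetical group/version
        "kind": "NiFiFn",                      # kind used with `kubectl get NiFiFn`
        "metadata": {"name": name},
        "spec": {
            # Hypothetical fields identifying a versioned flow in a registry.
            "registryUrl": registry_url,
            "bucketId": bucket_id,
            "flowId": flow_id,
            "flowVersion": flow_version,
        },
    }


def to_json(manifest):
    """Serialize the manifest to JSON text suitable for writing to a file."""
    return json.dumps(manifest, indent=2, sort_keys=True)


if __name__ == "__main__":
    manifest = build_nififn_manifest(
        name="daily-ingest",
        registry_url="http://registry.example:18080",  # placeholder URL
        bucket_id="bucket-1",
        flow_id="flow-1",
        flow_version=3,
    )
    print(to_json(manifest))
```

Saving the printed output to a file and passing it to `kubectl create -f` would only succeed once the operator's CRD is installed and the placeholder fields are replaced with the ones it actually defines.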
The B23 Kubernetes Operator for NiFi-Fn is open source and Apache licensed. It has been...

B23 Highlighted at Jefferies BattleFin Conference

Having just returned from yet another extremely productive Jefferies BattleFin conference in Miami, our team reflected on several themes we observed in the financial services and hedge fund artificial intelligence (“AI”) market. With 470 attendees, the Jefferies BattleFin conference is the premier event to observe, hear, and meet with experts in alternative data and AI for hedge funds. The conference continues to grow in size and in the diversity of data providers and technology services relevant to the technology-driven investor. We were excited to participate again this year in a panel discussion on a topic in which we have a high degree of conviction and experience, and it was extremely productive to meet with so many new and familiar industry experts.

The themes from this year’s event include:

- Increased acceptance of Data-Engineering-as-a-Service using qualified third-party technology partners like B23
- Machine Learning at-scale is becoming more tightly coupled to Public Cloud infrastructure
- More pragmatism around the amount of alpha that alternative data can provide by itself
- Challenges to adopting new, innovative ideas with so much turnover and cross-pollination

Data-Engineering-as-a-Service for Hedge Funds

An emerging theme that was very prevalent this year was the acceptance of outsourced data engineering, or Data-Engineering-as-a-Service. Many of the institutions we spoke with are quickly aligning themselves to this trend, which is consistent with what we observe in non-financial-services verticals as well. It was obvious that more and more institutions are pursuing a strategy of outsourcing the “undifferentiated heavy lifting” of data engineering so that they can focus on higher-order outcomes in quantitative investment analysis.
Funds are increasingly passing on building cloud-based data lakes themselves, or developing durable and performant extract, transform, and load (“ETL”) applications hosted on...

Exploring Credit Default Swap (CDS) Market Data Using Modern Data Science Techniques

October 2nd, 2018

Title VII of the Dodd-Frank Wall Street Reform and Consumer Protection Act addresses the gap in U.S. financial regulation of OTC swaps by providing a comprehensive framework for the regulation of the OTC swaps markets. The objective of this blog is to describe how to rapidly and securely analyze credit default swap (“CDS”) transaction data using cloud computing and advanced machine learning (“ML”) techniques. We obtained the CDS data from the Depository Trust and Clearing Corporation (“DTCC”).

A fundamental technology enabler for our customers is the B23 Data Platform, a cloud-based artificial intelligence (“AI”) engine that discovers, transforms, and synthesizes data from a variety of sources to provide unique and predictive insights. The B23 Data Platform is used by data-centric enterprises in many different industries, including technology companies, government agencies, and financial institutions, to securely use the Amazon Cloud to gain insight from very large data sets.

The scope and accomplishments of our efforts include:

- Created a secure Machine Learning (“ML”) analysis cluster in a representative customer private cloud
- Ingested 5 years of CDS data into the ML analytics cluster in 1 minute
- Identified anomalous CDS trading activities for complex market transactions
- Created established CDS compliance reporting metrics
- Identified CDS market characteristics at the individual product and individual series levels

Identify Anomalous Transaction Activity or Faulty Reporting in Markets

B23 investigated several transactions in the DTCC CDS data that looked peculiar. Figure-2 shows a relatively small number of transactions, each composed of multiple line items in the data set, that exhibited anomalous market activities. In the example above, “yellow” nodes are correction activities, and “red” nodes are cancellation activities.
Walking through each set of transactions, the following activities are occurring:

- A single contract is created for a value of $52M (blue node labeled “new”)
- The $52M original CDS is corrected to a...