Announcing Serverless Data Flows Using the B23 Kubernetes Operator for NiFi-Fn
Updated: Aug 14, 2021
Apache NiFi provides a large and diverse library of processors for acquiring and transforming data, along with a flow registry for versioning these often complex data flows. B23 uses NiFi across multiple infrastructure and orchestration platforms, including Kubernetes. Our pioneering NiFi engineering work allows us to programmatically provision data flows from a library of pre-existing, best-in-class data flows that we have developed and honed over many years of operational use. Thanks to recent development work on the NiFi-Fn project by Sam Hjelmfelt at Cloudera, there is now a direct path to running NiFi flows on Kubernetes without the administrative overhead of a NiFi cluster.
The NiFi-Fn project brings to NiFi the ability to execute pre-existing data flows as serverless applications. For NiFi users, this means that flows can now be started on demand and run to completion, with success determined by whether every flow file sent as input to the flow is processed successfully. The B23 Kubernetes Operator for NiFi-Fn offloads the job management of specific flows to Kubernetes. An inspiration for our work was the recently open-sourced Kubernetes Operator for Apache Spark released by Google.
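To make the run-to-completion idea concrete, here is a minimal Python sketch of how success could be decided for a single run. It is not NiFi-Fn's code: `run_to_completion` and the toy `process` callback are hypothetical names, and the choice to keep processing after a failure is an assumption made for illustration. The one real piece is the convention it relies on: a Kubernetes Job treats a container that exits with code 0 as succeeded and a non-zero exit as failed.

```python
from typing import Callable, Iterable


def run_to_completion(
    flow_files: Iterable[bytes],
    process: Callable[[bytes], bool],
) -> int:
    """Process every input flow file once; succeed only if all succeed.

    Returns a process exit code: 0 when every flow file was processed
    successfully, 1 otherwise. (Hypothetical helper, not NiFi-Fn's API.)
    """
    all_ok = True
    for flow_file in flow_files:
        # Assumption: keep processing the remaining inputs after a failure,
        # but remember that the run as a whole did not succeed.
        if not process(flow_file):
            all_ok = False
    return 0 if all_ok else 1


if __name__ == "__main__":
    # Hypothetical stand-in for a flow: it rejects empty payloads.
    inputs = [b"record-1", b"record-2", b""]
    print(run_to_completion(inputs, lambda ff: len(ff) > 0))  # prints 1
```

In a container, the returned value would be passed to the process exit status, which is what lets Kubernetes decide whether the run succeeded.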
At B23, we choose to run container-based workloads on Google Kubernetes Engine ("GKE"). For us, it was important to manage our NiFi flows as first-class citizens in Kubernetes by using Custom Resource Definitions ("CRDs"). This allows us to create and manage our data flows just like any other resource in Kubernetes. Using familiar commands like `kubectl create -f nififn-flow.yaml` or `kubectl get NiFiFn`, we can create new flows and list running or completed data flows.
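As an illustration of what such a custom resource might contain, the Python sketch below builds a NiFiFn object as a plain dictionary and prints it as JSON, which `kubectl create -f` accepts as well as YAML. The API group/version (`example.b23.io/v1alpha1`) and every `spec` field name (`registryUrl`, `bucketId`, `flowId`, `flowVersion`) are hypothetical placeholders rather than the operator's actual schema; the repository's CRD is the authority on the real fields.

```python
import json


def nififn_resource(name: str, registry_url: str, bucket_id: str,
                    flow_id: str, flow_version: int) -> dict:
    """Build a NiFiFn custom resource as a plain dict.

    The API group/version and all spec field names are hypothetical
    placeholders, not the operator's real CRD schema.
    """
    return {
        "apiVersion": "example.b23.io/v1alpha1",  # hypothetical group/version
        "kind": "NiFiFn",
        "metadata": {"name": name},
        "spec": {
            # Hypothetical coordinates of the versioned flow in the registry.
            "registryUrl": registry_url,
            "bucketId": bucket_id,
            "flowId": flow_id,
            "flowVersion": flow_version,
        },
    }


if __name__ == "__main__":
    manifest = nififn_resource(
        name="example-flow",
        registry_url="http://nifi-registry:18080",  # placeholder URL
        bucket_id="my-bucket",
        flow_id="my-flow",
        flow_version=1,
    )
    # Save this output to a file and pass it to `kubectl create -f`.
    print(json.dumps(manifest, indent=2))
```

Pinning `flowVersion` in the resource is a design choice worth considering: it ties each run to a specific registry version, so re-running the same resource reproduces the same flow.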
The operator will handle the creation of a Kubernetes Job resource and execute the desired flow after pulling it from the registry. This lets Kubernetes handle the semantics of retry and cleanup while giving the user control over the logic and execution of the flow.
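The translation step an operator performs can be sketched as a pure function from a custom resource to a `batch/v1` Job manifest. In this hypothetical Python version, retry and cleanup are delegated to standard Job fields (`backoffLimit` and `ttlSecondsAfterFinished`), and the flow coordinates reach the container through environment variables whose names (`NIFI_REGISTRY_URL` and so on) are invented for illustration. The real operator's container arguments and defaults may differ.

```python
import json


def job_for_flow(resource: dict, image: str,
                 backoff_limit: int = 3, ttl_seconds: int = 3600) -> dict:
    """Translate a (hypothetical) NiFiFn resource into a batch/v1 Job."""
    name = resource["metadata"]["name"]
    spec = resource["spec"]
    # Hypothetical env vars telling the container which flow to pull;
    # Kubernetes requires env values to be strings.
    env = [
        {"name": "NIFI_REGISTRY_URL", "value": spec["registryUrl"]},
        {"name": "NIFI_BUCKET_ID", "value": spec["bucketId"]},
        {"name": "NIFI_FLOW_ID", "value": spec["flowId"]},
        {"name": "NIFI_FLOW_VERSION", "value": str(spec["flowVersion"])},
    ]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"{name}-job"},
        "spec": {
            # Retry: Kubernetes retries failed pods up to this many times.
            "backoffLimit": backoff_limit,
            # Cleanup: Kubernetes deletes the finished Job after this delay.
            "ttlSecondsAfterFinished": ttl_seconds,
            "template": {
                "spec": {
                    # Jobs only allow Never or OnFailure; with Never, each
                    # retry runs in a fresh pod.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "nifi-fn", "image": image, "env": env}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    flow = {
        "metadata": {"name": "example-flow"},
        "spec": {"registryUrl": "http://nifi-registry:18080",
                 "bucketId": "my-bucket", "flowId": "my-flow",
                 "flowVersion": 1},
    }
    print(json.dumps(job_for_flow(flow, image="example/nifi-fn:latest"), indent=2))
```

Keeping the mapping a pure function makes it easy to unit-test; a controller loop would then only need to submit the returned manifest and watch the Job's status.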
The B23 Kubernetes Operator for NiFi-Fn is open source and Apache licensed. It has been tested locally with docker-for-desktop and in the cloud with Google Kubernetes Engine. We have many ideas for improving this new capability, and we look forward to working closely with the NiFi community to develop it further. If you'd like to try out the operator for yourself, head over to the GitHub repository to get started: https://github.com/b23llc/nifi-fn-operator
The JIRA ticket for the NiFi-Fn project: https://issues.apache.org/jira/browse/NIFI-5922