Innovate on Your Data – Maxwell Meets Streamsets

By Mark Bittmann, Partner & Lead Data Scientist at B23 LLC, a Big Data and Cloud Computing Professional Services Implementation Company bringing new age innovation to Big Data

In a recent blog post, Martin Kleppmann laid out the reasons for and advantages of an immutable, event-based streaming data architecture; you can find it here: ‘Stream processing, Event sourcing, Reactive, CEP… and making sense of it all’. A few of our customers are currently migrating to this model. Even so, ACID transactions are still very important to a modern data platform, and many applications still write to and read from a mutable database as the single source of truth for the enterprise. A change to a database table means “something happened,” and when something happens, other applications might want to know about it. Those consumer applications include data warehouses, operational analytics platforms, and machine learning models.

We recently needed to pull change events from MySQL binary logs and push them to Apache Kafka. Several tools exist for parsing the MySQL binary log files, and several of them pipe directly to Kafka. These tools manage binlog offsets and table schemas externally, and are much more operationally friendly than tailing the binlog files directly. Our search centered on Maxwell and mypipe; we found Maxwell really easy to use, and Zendesk has it running in production. There is a similar tool for PostgreSQL called Bottled Water. You just point these tools at your (binlog-enabled) database, and you have a streaming event log on a Kafka topic.

Once the data is on Kafka, you have tremendous opportunity for complex data workflows, such as real-time processing with Storm, Flink, or Spark Streaming, or writing to destination endpoints such as Elasticsearch, HDFS, Cassandra, or S3. There has been major progress recently on data workflow tools, including Apache NiFi, StreamSets, and Kafka Connect (all open source!). We chose StreamSets for our data pipelines.
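To make the idea of a table change as an event a little more concrete, here is a minimal Python sketch of what a consumer might do with one change event once it has been read off a Kafka topic. It assumes the event is a JSON document shaped like Maxwell's output (`database`, `table`, `type`, `data`, and, for updates, an `old` field holding the previous values of the changed columns). The `shop.orders` table, its columns, and the `changed_columns` helper are hypothetical and only serve the illustration; reading from Kafka itself is left out.

```python
import json

# Illustrative event shaped like Maxwell's JSON output. The database, table,
# and column values are made up for this example.
SAMPLE_EVENT = """
{"database": "shop", "table": "orders", "type": "update",
 "ts": 1449786310, "xid": 940752, "commit": true,
 "data": {"id": 7, "status": "shipped"},
 "old": {"status": "pending"}}
"""


def changed_columns(raw_event):
    """Return {column: (old_value, new_value)} for an update event.

    Inserts and deletes carry no "old" field, so they yield an empty dict.
    Assumes "old" lists only the columns whose values changed.
    """
    event = json.loads(raw_event)
    if event.get("type") != "update":
        return {}
    new_row = event.get("data", {})
    return {
        column: (previous, new_row.get(column))
        for column, previous in event.get("old", {}).items()
    }


if __name__ == "__main__":
    event = json.loads(SAMPLE_EVENT)
    print(f"{event['type']} on {event['database']}.{event['table']}")
    print(changed_columns(SAMPLE_EVENT))
```

A downstream application such as a data warehouse loader could use a function like this to decide which columns to overwrite, without ever querying the source database.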
All of the binlog replicator tools — Bottled Water, mypipe, Maxwell — are daemon utilities....