Not All Kubernetes Services Are Equal. We Should Know.

Kubernetes promises the long sought-after capability to fully abstract underlying public cloud, private cloud, and edge infrastructure from the perspective of the software applications that perform specific functions, or workloads. For B23, the value of Kubernetes means that all of the innovative and ground-breaking data engineering and applied machine learning workloads that we have developed and operated over years of experience can be seamlessly deployed in almost any environment that runs Kubernetes.

B23 supports and operates a variety of Kubernetes solutions, including "pure" Kubernetes that we deploy to any arbitrary set of supported server hosts. We also support public cloud managed Kubernetes services from Google, Amazon, Microsoft, and DigitalOcean. We support integration with a previously running Kubernetes system to address private cloud Kubernetes solutions. Most recently, we support Rancher's K3S for edge computing solutions (more on that exciting news in a later blog).

We've done Kubernetes the "hard way" from scratch, and we've done it the "easy way" using cloud managed Kubernetes, or at least we thought managed Kubernetes would be easy. In some cases, the "easy way" was just not so easy. That's why the "conceptual value" of Kubernetes varies from the "actual value" of Kubernetes: it depends heavily on your cloud service provider.

Here are some of the high-level differences we have found in our pursuit of our ultimate goal of infrastructure-agnostic workloads using Kubernetes. They fall into the following categories:

- Default security features and versions vary by Kubernetes service provider
- Built-in support for Kubernetes auto-scaling capabilities is nonexistent or limited across service providers
- Some service providers require proprietary or provider-specific functionality, leading to vendor lock-in
- The workflow and lifecycle management of Kubernetes and hosted workloads vary in capability and complexity
- The SDK ecosystems for programmatically operating managed Kubernetes solutions vary greatly in maturity
- The...
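That portability claim is easy to demonstrate in miniature. The following is a minimal sketch using the official Python kubernetes client, assuming `pip install kubernetes` and a kubeconfig at the default location; the same few lines run unchanged against GKE, EKS, AKS, DigitalOcean, or a self-hosted cluster, and they surface one of the differences listed above, the default Kubernetes version:

```python
# A minimal sketch, assuming `pip install kubernetes` and a kubeconfig
# at ~/.kube/config. The same lines run unchanged against any
# conformant cluster, managed or self-hosted.
from kubernetes import client, config

config.load_kube_config()  # pick up the active kubeconfig context
v1 = client.CoreV1Api()

# List every node with its kubelet version: one concrete place where
# managed providers differ, since default Kubernetes versions vary.
for node in v1.list_node().items:
    print(node.metadata.name, node.status.node_info.kubelet_version)
```

Point the active context at a different provider's cluster and the workload-facing API stays identical; it is everything around that API, as the list above shows, that diverges.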

Exploring Credit Default Swap (CDS) Market Data Using Modern Data Science Techniques

October 2nd, 2018

Title VII of the Dodd-Frank Wall Street Reform and Consumer Protection Act addresses the gap in U.S. financial regulation of OTC swaps by providing a comprehensive framework for the regulation of the OTC swaps markets. The objective of this blog is to describe how to rapidly and securely analyze credit default swap ("CDS") transaction data using cloud computing and advanced machine learning ("ML") techniques. We obtained the CDS data from the Depository Trust and Clearing Corporation ("DTCC").

A fundamental technology enabler for our customers is the B23 Data Platform, a cloud-based artificial intelligence ("AI") engine to discover, transform, and synthesize data from a variety of sources to provide unique and predictive insights. The B23 Data Platform is used by data-centric enterprises in many different industries, including technology companies, government agencies, and financial institutions, to securely use the Amazon Cloud to gain insight from very large data sets.

The scope and accomplishments of our efforts include:

- Created a secure ML analysis cluster in a representative customer private cloud
- Ingested 5 years of CDS data into the ML analytics cluster in 1 minute
- Identified anomalous CDS trading activities for complex market transactions
- Implemented established CDS compliance reporting metrics
- Identified CDS market characteristics at the individual product and individual series levels

Identify Anomalous Transaction Activity or Faulty Reporting in Markets

B23 investigated several transactions in the DTCC CDS data that looked peculiar. Figure-2 shows a relatively small number of transactions composed of multiple line items in the data set that exhibited anomalous market activities. In Figure-2, "yellow" nodes are correction activities, and "red" nodes are cancellation activities. Walking through each set of transactions, the following activities are occurring (a short code sketch follows this list):

- A single contract is created for a value of $52M (blue node labeled "new")
- The $52M original CDS is corrected to a...
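To make the walk-through concrete, here is a minimal sketch of how correction and cancellation chains like the one in Figure-2 can be flagged with pandas. The column names (contract_id, action, notional_usd) are hypothetical stand-ins for the DTCC schema, not the fields we actually used:

```python
import pandas as pd

# Hypothetical rows shaped like the walk-through above: one "new" trade
# followed by a correction and a cancellation on the same contract.
# Column names are stand-ins for the DTCC CDS schema.
trades = pd.DataFrame({
    "contract_id":  ["A1", "A1", "A1", "B2"],
    "action":       ["new", "correct", "cancel", "new"],
    "notional_usd": [52_000_000, 52_000_000, 52_000_000, 10_000_000],
})

# Count non-"new" actions per contract; a chain of corrections and
# cancellations hanging off a single contract is the Figure-2 pattern.
chains = (
    trades[trades["action"] != "new"]
    .groupby("contract_id")["action"]
    .value_counts()
    .unstack(fill_value=0)
)
flagged = chains[chains.sum(axis=1) >= 2]
print(flagged)
```

Contract A1, with its correction plus cancellation, is flagged for review while the clean B2 trade passes through, which is exactly the kind of triage that let us focus on the peculiar transactions.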

Announcing Jupyter Notebook on the B23 Data Platform

March 7th, 2018

B23 is happy to announce we've added Jupyter Notebook as the latest stack on our platform. Jupyter has quickly become a favorite with data scientists because of its notebook format and its support for many programming languages like R, Scala, and Python. The B23 Data Platform gives data scientists access to their preferred data processing tools in a secure, automated environment targeted specifically to their business needs.

According to a recent Harvard Business Review survey, 80% of organizations believe the inability of teams to work together on common data slows the organization's ability to quickly reach business objectives. The B23 Data Platform can help these organizations boost their data science teams' productivity with notebook collaboration and sharing tools like Jupyter. Thanks to easier data access and computing power paired with rich web user interfaces, open source capabilities, and scalable cloud data-processing solutions, Jupyter Notebook adds another favored power tool to the B23 Data Platform.

With just a few button clicks, the Jupyter stack launches. Open the Jupyter Notebook URL and you are ready to start coding!

The B23 Data Platform is an open, secure, and fast marketplace for big data, data science, artificial intelligence, and machine learning tools. In minutes, data scientists can securely analyze their data in the cloud, with the freedom to use familiar tools like Apache Spark, Apache Zeppelin, RStudio, H2O, and Jupyter Notebook. Discover a better way to analyze your data with B23....
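For a sense of that "ready to start coding" moment, here is a minimal, hypothetical first notebook cell; the S3 bucket and file name are placeholders, not a real data set:

```python
# A hypothetical first cell: the bucket and file below are placeholders,
# not a real data set. Reading directly from S3 requires the s3fs package.
import pandas as pd

df = pd.read_csv("s3://example-bucket/sample-data.csv")
print(df.head())
print(df.describe())
```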

Experimenting with Chromebook Data Science in the Cloud

September 26th, 2017

Last spring I gave a talk at the New York R Conference and EARL SF titled "The Missing Manual for Running R on Amazon Cloud". It was targeted at small (or large) enterprise users looking to build out or outfit a data science team capable of doing effective data science in the cloud, with all the data ingest, security, and usability concerns and implications that come with navigating that space. In recent months I've been surprised but overjoyed to see cloud data science start to become championed by citizen data nerds and #rstats community folks in academia. Jeff Leek has been reporting on his second attempt at a Chromebook data science experiment. Brian Caffo has several videos on his YouTube channel about "going cloudy", including a review of Jeff Leek's blog post from above.

In an experiment of unknown and unplanned duration, I've been leaving my work laptop at work on Friday night, then resisting the urge to go get it on Saturday morning. If I need or want to do anything on the computer, be it R or otherwise, I have to figure out how to do it on my Acer Chromebook 11.

The major things, like working with RStudio Server on AWS, aren't all that different from how I operate every day at work. I do find that I'm more likely to "cheat" and use a local-cloud hybrid approach to data management when I'm using my work machine, and I like that the Chromebook forces me to honestly evaluate the usability of the cloud data science system we've designed. It's the little things that have me feeling constrained on the Chromebook: taking screenshots, managing them all, editing diagrams, and trying to create slide deck presentations are all a bit of a drag. So far I've felt more effective switching to my phone when I need to do that sort of thing. Making an Acer Chromebook 11 feel satisfying to operate is probably an entirely lost cause, but there is something really fun about having all the power of the cloud at your fingertips on one of the cheapest little laptops money can buy....

Announcing File-Level Integration with GitHub

February 23rd, 2017

Packing up and moving your big data from on-premises can be daunting, from workload capacity and timing pressures to worry over protecting your proprietary projects. The B23 Data Platform has already automated the process of launching software inside virtual private clouds. Now B23's latest feature offers you the convenience and security of seamless GitHub integration. Want to avoid using a config file or command line? Not excited about loading and installing tons of software to sync your project files? The B23 Data Platform now automatically clones your Git repository upon your stack launch. B23 has brought GitHub integration to you and the other 70,000+ organizations that trust the GitHub software building community.

Got a few minutes? B23 lets you select individual files or entire stacks of data to move. Just sit back and relax while you watch your projects securely populate the R environment where your data will reside on one or more nodes, without ever touching the Internet unless you so desire. With OAuth, there's also no need to provide any sensitive credentials to B23 for authorization.

Walking through our GitHub integration, you'll first be presented with the option to link your GitHub account with the B23 Data Platform. Then you select one of the GitHub repositories under your account or enter a customized repository name. Finally, you can select individual files to move, or copy the entire repository to your chosen destination.

Now we've removed another step between you and your data! GitHub integration is currently available for the R and SparklyR stacks. Be sure to check out this feature as we continue to provision more opportunities for automated deployment of development environments. Please reach out to us at info@b23.io for more ways we can help drive your data solutions....
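For readers curious about what OAuth-scoped, file-level repository access looks like under the hood, here is a minimal sketch against the public GitHub REST API. The token, owner, and repository names are placeholders, and this illustrates the kind of call involved, not B23's actual implementation:

```python
# A minimal sketch of listing repository files via the GitHub REST API
# using an OAuth token. TOKEN, OWNER, and REPO are placeholders; this
# illustrates the kind of call involved, not B23's implementation.
import requests

TOKEN = "your_oauth_token"   # placeholder OAuth token
OWNER, REPO = "your-org", "your-repo"

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/contents/",
    headers={
        "Authorization": f"token {TOKEN}",
        "Accept": "application/vnd.github+json",
    },
)
resp.raise_for_status()

# One entry per file or directory at the repository root.
for item in resp.json():
    print(item["type"], item["path"])
```

Because the token is scoped through OAuth, no password or other sensitive credential ever changes hands, which is the same property that lets you link your GitHub account to the platform without handing B23 your credentials.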