September 17th, 2018
Data engineering is hard, and getting it exactly 100% right results in a single outcome: machine learning engineers and quants can now do their jobs effectively. The analysis of data, and the subsequent execution on those insights, is the competitive differentiator and core competency of a business, its heart and soul. Data engineering is the commoditized heavy lifting every organization needs to perform to get the analysis correct. This is why we see data engineering as a zero-sum game. Getting data engineering right means organizations are just breaking even; it simply allows other employees to do their jobs properly. Getting it wrong means everything and everyone else dependent on data engineering cannot operate effectively. Outsourcing the commoditized heavy lifting of data engineering is the least risky and most cost-efficient path to the economic and market-leading competitive advantages organizations need to compete.
Prioritize Algorithm Development Over Data Engineering
Modern organizations should prioritize and invest in the algorithm development, quantitative research, and machine learning aspects of data science. These activities can make or break firms that use data for a competitive advantage. Applying machine learning in a meaningful way, using data formatted specifically for those algorithms, is not a trivial task. To be successful, organizations should recognize which of the activities associated with extracting insight from data are differentiated and which are undifferentiated. They should then decouple the work of getting data into a specific format (or schema) to support those algorithms from the development and tuning of the algorithms themselves.
Race Car Drivers and Data Mechanics
An interesting social phenomenon we’ve observed over the past several years is that we have yet to meet a data engineer who wasn’t secretly plotting a career change to become a machine learning engineer and/or quant, with a more data-science-centric job title to boot. If machine learning engineers and quants are the race car drivers of the modern data-driven business, data engineers are the mechanics. Organizations desperately need data mechanics to service and maintain their shiny new machine learning algorithms. There is no denying that quants, researchers, and machine learning engineers are often perceived as occupying a higher tier of the corporate social structure with regard to salary, influence, and demand. As a result, finding and retaining competent, contented data engineers who like what they do is often the most significant challenge for organizations.
Challenges and Opportunities with Data Engineering
Most organizations are not getting data engineering 100% right all of the time, and data engineering can be unnecessarily expensive for organizations that are not doing it right. As a result, data science, quantitative research, and applied machine learning initiatives are being held back and are stalling, while escalating costs make a bad situation worse. There are several reasons why this is occurring.
- Do-It-Yourself (DiY) is Expensive and Not Timely
We’ve frequently observed organizations first attempting to establish their own data engineering teams. It’s an obvious choice without the benefit of hindsight and experience. Often, this corporate initiative is generically branded as “data analytics” or “data science.” Organizations do not realize that data engineering is a very different discipline from data analysis. Employee responsibilities are often blurry with respect to who is responsible for data acquisition, wrangling, and engineering, and who is responsible for analysis and algorithm development. We’ve frequently observed organizations “designate” as their lead data engineer an employee who may know a laptop-scale tool like Python pandas, only to fail when they realize they really needed performant ETL code running on a cloud-based distributed platform like Apache Spark. Good luck making sure your data is secure in the cloud through all of it.
Hiring new employees with the proper skills is slow and expensive, and that’s assuming organizations can even find them in a very competitive job market. Even if employees with the proper skills do exist within the organization, the good ones are usually not sitting idle at their desks. Getting an internal data engineering and analytics project off the ground with internal resources is rarely as quick as the pace of business dictates.
- Contracting Supplemental Resources is a Gamble
Once an organization realizes it is not capable of doing it itself, it looks externally for help. A common reaction is to supplement existing employees with external contractors or to offshore the work. This decision often just accelerates failure for an organization and quickly leads to even more cost overruns. Contracting external resources is like rolling the dice: you are never sure what skills and capabilities you are going to end up getting. Contracting staff to supplement an internal team adds layers of contractual and communication challenges, leading to more delays, costs, and corporate-social friction.
- The Myth of Human-Mediated Knowledge Sharing
No contracting or consulting organization on the planet has a knowledgeable and experienced data engineering staff ready at a moment’s notice. Further, the concept of knowledge sharing, meaning that contractors can share best-practice knowledge based on prior work experience, is a myth. Since the beginning of time, contracting firms and consultancies have marketed their claim to share human-mediated knowledge between different initiatives. We know that best practices with respect to technology implementation and data security are best codified through software, not people. Ultimately, contracting organizations just want to ship people to projects, collect their fees, and hope no one complains.
- Corporate Liability and Shared Risks
Shared business risk is often not considered when outsourcing data engineering and data science to contractors. B23 has observed a higher frequency of blatant technical mistakes and security issues resulting from work performed by third-party contractors than among customers that attempted to do it themselves. There is very little skin in the game for contractors to get it right. Contractor mishaps put the sourcing business at great risk of exposure to data breaches and a variety of security, governance, and compliance issues. Contractors’ work is often a series of “checking the boxes” in order to meet their contractual obligations. Their motivations are, understandably, often conflicted between the needs of the contracting and sourcing organizations.
Win-Win Using Managed Services
In our next post, we show how using a data engineering managed service from B23 solves the challenges described previously and accelerates business competitive advantage.