Written by Dan Cahana, Investor at GGV Capital
Over the last decade, as technology has subsumed greater swaths of the economy and the competitive frontier has moved from the physical to the digital world, we’ve seen the best companies win with data. From their earliest days, companies such as DoorDash, OpenDoor, and Stitch Fix organized their business models around a unique ability to collect and analyze massive volumes of data, and incumbents such as Nike and Capital One have built hundreds-strong data teams to drive their shifts to a digital-first strategy. Meanwhile, digital giants such as Google, Facebook, and Amazon have become behemoths thanks to their ability to parse and instantaneously act upon unimaginably large sets of internet, consumer, purchase, and demographic data.
While companies now recognize the immense value of data, few have the knowledge or resources to build data teams and infrastructure to rival the DoorDashes or the Nikes of the world. With over 79 zettabytes (one zettabyte is a trillion gigabytes) of data currently circulating worldwide, and 181 zettabytes expected by 2025, organizations have an urgent need to capture, analyze, and understand the data that will propel their businesses forward.
Unfortunately, even if a company has the capital to hire data scientists and engineers, finding them is exceedingly difficult. There is a massive shortage of such talent, which will be further exacerbated as job growth outpaces new entrants to the workforce. To bridge the gap between their data-crunching needs and the lack of data scientists, companies urgently need easy-to-use tools to help them better capture, analyze, and act upon massive datasets. We’re seeing a new wave of startups emerge that offer data infrastructure applications and low-code or no-code tools to help companies derive value from their data without having to hire large data teams.
Several of the leading data startups have roots in Israel, where founders are leveraging their deep expertise in data, security, AI, and machine learning to build flexible next-generation data tools.
The Data Landscape
Public data infrastructure companies such as Snowflake, valued at more than $77 billion, and private ones such as Databricks, valued at more than $28 billion, have solved a large part of today’s data conundrum. They have allowed companies, previously constrained by the size and scope of their on-prem data warehouses, to store truly vast amounts of data in the cloud, and for technical teams to access it for computation and analytics. But these incredibly successful and impressive companies don’t offer everything to everyone—no one company can. By enabling so many companies to store so much data, Snowflake and Databricks have in fact created opportunities for startups to help solve the two next big challenges: improving and simplifying data engineering—which encompasses discoverability, governance, observability, compliance, and modeling—and broadening data consumption, so that more departments and people within companies can make use of their investment in data. The success of Snowflake and Databricks has created a fertile ecosystem for startups to tackle specific data engineering and consumption challenges, such as helping companies monitor data to ensure it’s consistent, make it discoverable for data scientists and business users, or model it so it’s usable for downstream consumers.
Until now, most data engineering tasks have been done manually, which isn’t scalable and once again runs up against the skills shortage: companies need to hire data scientists and engineers who can build in-house tools. And even when companies have large data teams, they often don’t have enough bandwidth to build complex platforms. Many data engineering teams spend far too much time serving as “help desks”: when downstream data consumers find an issue with data reliability, or can’t find the dataset they need, they reach out to data engineering.
Beyond data engineering, companies also lack tools to power data consumption in a way that delivers their ultimate goal: driving business value. In most companies, only a small subset of employees are able to create dashboards, let alone build (or even understand) the machine learning models that have the biggest impact on business growth. The data bottleneck has moved from storage and computation, which Snowflake and Databricks solved, to engineering and consumption, which startups are stepping in to solve.
A Greenfield for Data Startups
We believe there are two major categories of opportunity for emerging data startups: unbundling data engineering and enabling better (and broader) downstream data consumption.
Data engineers are stretched thin, creating opportunities for startups to provide tools that unbundle and productize some of their routine tasks, freeing them up to focus on higher-value work. One startup in this space is Monte Carlo. Founded by Israeli technologists Barr Moses and Lior Gavish in 2019, Monte Carlo focuses on data observability, monitoring companies’ data to ensure there is no “data downtime” and reducing engineers’ time spent on this task by 30% or more. Two other startups offering point solutions for data engineering are Fivetran and BigID, which offer off-the-shelf data pipelines and tools for data governance and privacy, respectively.
There are also opportunities for startups on the data consumption side to provide business users with no-code or low-code tools for analyzing data, and to give data scientists and analysts easy-to-use tools for sharing data across the organization. Today, most data consumption is still done on an ad hoc basis by data analysts and data scientists with specialized skill sets, and findings remain hard to interpret.
This means companies are leveraging only a fraction of their data in meaningful ways. One startup delving into the data consumption tools space is Pecan, an Israeli company that provides an easy-to-use platform for data analysts that helps them build predictive models without machine learning expertise. Sisense and Metabase are also active in this area, offering embedded analytics for end customers and visualizations that non-technical users can drill into without needing to know the database language SQL.
Israel in particular is fertile ground for innovation, with serial entrepreneurs building their next set of companies, data experts with deep technical expertise honed during military training, and a whole new class of founders emerging from data-driven companies such as Google, Airbnb, and Facebook. Israel is truly ground zero for data innovation.
Companies find themselves at a unique moment in time: they have committed to becoming data-driven, have invested in data infrastructure and talent, and yet are beginning to see the shortcomings of their data stack. Startups that help non-technical users better understand and extract value from data, and equip engineering teams with the tools they need to manage data observability, compliance, security, and more, will come out on top.