Data Stack Deep Dive Season 1: Data Connectors

With Chloe, Elise, and Olivier from Idinvest, we’re beginning a series of deep dives into the B2B software data stack, with a focus on a key component of this ecosystem: Data Connector startups.

Table of contents:

Data Connectors: The piping system of B2B software
Horizontal Data Connectors
Vertical Data Connectors
The Data Warehouse is the new Platform
Data Connectors for Enterprises
Data Connectors for SMBs
The future of data connectors

The Videos

1. Data Connectors: The piping system of B2B software

Defining Data Connector startups

By data connectors, we mean the startups that enable businesses to connect the various software, SaaS products, and databases they use, as well as external sources, and let data flow easily between them.

Our definition is broad on purpose because there are many flavors of this concept. But one major aspect distinguishes data connector startups from the majority of software, which simply connects to other software through native integrations: Data Connectors are neither the source of data (they are not where data is collected) nor its final destination. They sit in the middle as data pipes.

The data stack theory of evolution

What’s interesting to notice with data connector startups is that they tend to emerge once the software stack of a category or an industry reaches a certain critical mass and managing data flow between all of them becomes a pain for businesses.

Zapier, one of the OGs of the space, emerged once the explosion of horizontal SaaS was underway.

It’s the same with Segment which made a ton of sense once the Martech landscape started to become too complex.

And more recently we saw several data connectors emerging in the Fintech space where a Cambrian explosion of apps is happening.

It’s almost like the theory of evolution: every time a software stack becomes too crowded, a data connector startup will organically emerge.

Are data connectors defensible businesses?

A central point of our investment thesis around data connectors is that being a pure connector is not enough to build a defensible business. It’s a great starting point, but in the long term, only enabling the flow of data is not defensible because piping becomes a commodity and it has less value than a System of Record.

Coming back to our theory of evolution, to become defensible, once data connectors emerge in a space, they need to evolve quickly and add more features to lock users in.

2. Horizontal Data Connectors

By horizontal Data Connectors, we mean the products which are not specific to a particular industry or type of data. They connect plenty of different applications and data sources regardless of where they come from. This includes companies such as Zapier, which is mainly for small businesses.

It also includes more “enterprise” types of products, such as ETL software from Talend, Stitch, Fivetran, Panoply, and others.

Value propositions of Horizontal Data Connectors

The core value of these connectors is to collect and aggregate data that you have stored in various places. On top of that, they also provide additional sets of features that are key for their defensibility.

Automation and workflow. The first important category of additional features is “Automation & workflow”, which enables users not only to connect apps with each other but also to create their own personalized workflows with triggers. For example: when someone subscribes to my Mailchimp newsletter on my blog, automatically add the lead to Salesforce and send them a thank-you tweet. It’s what Zapier focuses on: they connect hundreds of apps and users can easily create their own data workflows.
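To make the trigger idea concrete, here is a minimal sketch of a trigger-and-actions workflow in Python. It is not how Zapier or any other product is implemented: the `Connector` and `Workflow` classes, the trigger name, and the two action functions are all hypothetical, and the actions only return strings where a real connector would call the destination apps' APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, str]
Action = Callable[[Event], str]


@dataclass
class Workflow:
    """A trigger name plus the ordered actions to run when it fires."""
    trigger: str
    actions: List[Action] = field(default_factory=list)


class Connector:
    """Hypothetical in-memory connector that routes events to workflows."""

    def __init__(self) -> None:
        self._workflows: List[Workflow] = []

    def register(self, workflow: Workflow) -> None:
        self._workflows.append(workflow)

    def emit(self, trigger: str, event: Event) -> List[str]:
        """Run every action of every workflow listening to this trigger."""
        results: List[str] = []
        for workflow in self._workflows:
            if workflow.trigger == trigger:
                results.extend(action(event) for action in workflow.actions)
        return results


# Hypothetical actions: stand-ins for real API calls to a CRM or social app.
def add_lead_to_crm(event: Event) -> str:
    return f"crm: created lead {event['email']}"


def send_thank_you(event: Event) -> str:
    return f"social: thanked {event['name']}"


connector = Connector()
connector.register(
    Workflow("newsletter.subscribed", [add_lead_to_crm, send_thank_you])
)
log = connector.emit(
    "newsletter.subscribed", {"email": "ada@example.com", "name": "Ada"}
)
```

The user only declares which trigger leads to which actions. The connector owns the routing, which is the part that would otherwise have to be rebuilt for every pair of apps.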

Another common set of value-added features is around data orchestration. For instance, Stitch lets businesses make sure that their data pipeline works properly from scheduling to error monitoring and handling.
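As a rough illustration of the error monitoring and handling side of orchestration, the sketch below retries a failing pipeline step and keeps a record of each error. It is a generic pattern, not Stitch's implementation. The function names and the simulated flaky step are assumptions, and scheduling is left out.

```python
from typing import Callable, Dict, List, Union

Report = Dict[str, Union[str, int, List[str]]]


def run_with_retries(
    name: str, step: Callable[[], None], max_attempts: int = 3
) -> Report:
    """Run a pipeline step, retrying on failure and recording every error."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            step()
            return {"step": name, "status": "ok",
                    "attempts": attempt, "errors": errors}
        except Exception as exc:  # a sketch: real code would catch narrower errors
            errors.append(f"attempt {attempt}: {exc}")
    return {"step": name, "status": "failed",
            "attempts": max_attempts, "errors": errors}


def make_flaky_step(failures_before_success: int) -> Callable[[], None]:
    """Hypothetical step that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def step() -> None:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("source API timed out")

    return step


report = run_with_retries("extract_crm", make_flaky_step(2))
```

Here the step succeeds on the third attempt, and the report keeps the two earlier errors so that someone monitoring the pipeline can see it was unstable even though it eventually worked.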

The third category of features is linked to data hosting and backup. In that case, the value of an ETL is not only to collect the data that you have stored in all your external and internal applications but also to normalize it and store it in a single place where you know that it is safe.

The last set of features we’ll cover is around data quality and integrity. In that case, the aim is to provide a kind of “Quality Assurance” for your data and make sure that the data that flows is clean and reliable.
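A simple way to picture this kind of “Quality Assurance” is a validation step that separates clean records from rejected ones before they flow any further. The sketch below is only an illustration. The required fields, the email pattern, and the function names are assumptions, and real data quality products go much further.

```python
import re
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Assumed rules for this sketch: every record needs an id and a plausible email.
REQUIRED_FIELDS = ("id", "email")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate(record: Record) -> List[str]:
    """Return the list of problems found in one record (empty if clean)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if not record.get(name)]
    email = record.get("email")
    if email and not EMAIL_RE.match(email):
        problems.append("invalid email")
    return problems


def split_clean_and_rejected(
    records: List[Record],
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Let clean records through and keep rejected ones with their problems."""
    clean: List[Record] = []
    rejected: List[Tuple[Record, List[str]]] = []
    for record in records:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected


incoming = [
    {"id": "1", "email": "ada@example.com"},
    {"id": "2", "email": "not-an-email"},
    {"id": "", "email": "bob@example.com"},
]
clean, rejected = split_clean_and_rejected(incoming)
```

Keeping the rejected records together with the reason they failed, rather than silently dropping them, is what lets a team fix the problem at the source.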

3. Vertical Data Connectors

In parallel with this well-established category of Horizontal Data Connectors, there’s a growing trend of Vertical Data Connectors that are specialized in specific industries or specific types of data.

Specialized/vertical data connectors

The rise of vertical data connectors is driven by the fact that software is eating more and more industries, leading to an increasingly fragmented data stack in every business category.

It’s why we see more and more data connectors that specialize in a specific type of data, like Segment or mParticle for customer data.

Others focus on specific industries, for example TrueLayer in the Fintech space or Impala and Duffel in the travel tech space.

Value propositions of Vertical Data Connectors

The value propositions of vertical data connectors are usually slightly different from those of horizontal ones, and they tend to be used for different reasons. Most of the time, the two are not competing head to head.

The first major differentiation is that they provide integrations with specialized apps. Horizontal Data Connectors generally offer integrations with the most common software out there. Vertical Data Connectors, on the other hand, can provide integrations with more specific apps. For example, if you’re in the Fintech space and need to build integrations with financial services, it’s a better move to use data connectors such as Plaid or Tink.

The second differentiation is around integration depth. Building and maintaining a catalog of integrations is already a challenging tech problem, but integration depth adds another dimension to this complexity. Horizontal Data Connectors often provide basic integrations that enable data to flow in and out of apps, but as soon as you need to do more complex operations you often need to turn to vertical data connectors. An example is Duffel, a data connector in the travel tech space that provides deeper integrations with airline booking software.

The last major differentiation is that, obviously, vertical Data Connectors can provide additional features that are more specific to the data type or industry they focus on. For example, Segment not only enables you to collect and centralize your customer data but also provides a range of features specific to customer data such as GDPR compliance or a consolidated overview of your customers’ interactions. Features that you won’t find on most horizontal data connectors.

4. The Data Warehouse is the new Platform

Now that we have described in broad strokes the data connector landscape, let’s cover a couple of interesting trends happening.

The data warehouse is the new platform

In the enterprise space, a key function of Data Connectors is to collect and store data in what is called a data warehouse.

It’s the main role of ETL software that lets the data team connect to a myriad of applications to extract, normalize and load data into this central database/repository.
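The extract, normalize, and load steps can be sketched in a few lines of Python. This is a toy example under clear assumptions: the two “applications” are hard-coded lists, the field mappings are made up, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from typing import Dict, List

Record = Dict[str, str]

# Assumed mapping from each source's field names to one shared schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "crm": {"Email": "email", "FullName": "name"},
    "support": {"email_address": "email", "name": "name"},
}


def extract() -> Dict[str, List[Record]]:
    """Stand-in for API calls: each source returns rows in its own format."""
    return {
        "crm": [{"Email": "Ada@Example.com ", "FullName": "Ada Lovelace"}],
        "support": [{"email_address": "bob@example.com", "name": "Bob Stone"}],
    }


def normalize(source: str, rows: List[Record]) -> List[Record]:
    """Rename fields to the shared schema and clean up values."""
    mapping = FIELD_MAPS[source]
    normalized = []
    for row in rows:
        record = {target: row[src].strip() for src, target in mapping.items()}
        record["email"] = record["email"].lower()
        record["source"] = source
        normalized.append(record)
    return normalized


def load(conn: sqlite3.Connection, records: List[Record]) -> None:
    """Append normalized records to the central contacts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts (email TEXT, name TEXT, source TEXT)"
    )
    conn.executemany(
        "INSERT INTO contacts (email, name, source) "
        "VALUES (:email, :name, :source)",
        records,
    )
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
for source, rows in extract().items():
    load(warehouse, normalize(source, rows))

contacts = warehouse.execute(
    "SELECT email, name, source FROM contacts ORDER BY email"
).fetchall()
```

Once both sources land in the same table with the same column names, the data team can query them together, which is the whole point of the central repository.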

What’s interesting to notice is that in the past couple of years more and more startups have been building products on top of data warehouses (DWs). For example, while most ETLs are good at collecting and centralizing data, Census focuses on the “data outbound” aspect to make sure that the third-party apps you use have the latest data synchronized with what is inside your data warehouse.

Iteratively is another example, as they focus on an earlier step in the data funnel: making sure that the data collected from your third-party apps is reliable before it’s sent to your DW.
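The “data outbound” idea can be sketched as a diff-and-push loop: compare what the warehouse says with what the third-party app currently holds, and write only the differences. This is a generic illustration, not Census's implementation. The dictionaries stand in for a warehouse query result and an app's API, and the function names are hypothetical.

```python
from typing import Dict

Row = Dict[str, str]


def plan_sync(warehouse: Dict[str, Row], app: Dict[str, Row]) -> Dict[str, Row]:
    """Return the warehouse rows that are missing or different in the app."""
    return {key: row for key, row in warehouse.items() if app.get(key) != row}


def apply_sync(app: Dict[str, Row], changes: Dict[str, Row]) -> int:
    """Write the changes to the app (here, a dict) and return how many were written."""
    for key, row in changes.items():
        app[key] = dict(row)  # copy so the app never shares state with the warehouse
    return len(changes)


# Hypothetical data: the warehouse is treated as the source of truth.
warehouse_customers = {
    "ada@example.com": {"plan": "pro"},
    "bob@example.com": {"plan": "free"},
}
crm_customers = {
    "ada@example.com": {"plan": "free"},  # stale value in the third-party app
    "bob@example.com": {"plan": "free"},
}

changes = plan_sync(warehouse_customers, crm_customers)
written = apply_sync(crm_customers, changes)
```

Only the stale record is written, so running the sync again right away finds nothing left to do.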

These are two examples but an increasing number of startups are building products on top of the data warehouse.

Operationalizing the data warehouse

This growing trend of products built on top of the data warehouse seems to be driven, first, by the increasing importance of data in every department. Whether it’s in product, sales and marketing, or customer support, more and more decisions are fueled by data, and more and more specialized analytics products are available.

But this trend is also driven by the increasing number of data engineers in these departments. Whereas the data team used to be mostly centralized and dealt with analytics at large, you can now find data engineers in marketing or product teams, and these employees are craving these new data connector products.

So the market is getting deeper not only in terms of use cases but also in terms of users.

5. Data Connectors for Enterprises

As we’ve seen above, the core promise of data connectors is based on a problem felt by many businesses: they use more and more software and have more and more data flying around, so Data Connectors help them collect, centralize, and move data around in this fragmented stack.

That being said, while this problem is common to all, it is solved quite differently in the enterprise and SMB segments.

Data Connectors in the enterprise space

In the enterprise space, what’s interesting to notice is that many of the current trends started with the data analyst, data engineering, and data science teams.

These data teams in enterprise businesses need to collect and centralize as much data as possible to run their analytics software. For them, the goal of setting up a data warehouse and ETL software is first and foremost to provide insights into how the business is performing whether it’s financial, customer support, or marketing performance.

This is why data warehouse and ETL software are first and foremost tools built for data teams and why they are not used “operationally” by the marketing, product, and other departments.

But now that this infrastructure is in place, these other departments want to leverage it for their daily operations, not only for business intelligence and analytics. Hence the rise of products like reverse ETL and the data warehouse becoming a platform, two trends I covered in a previous video.

Consequences

Two major consequences of that are:

First, setting up a data warehouse requires a proper data team and costs quite some money. This is why the products in that space are mostly built for the enterprise segment or for data-driven companies. SMBs simply don’t have the resources to do it. For instance, when you dig into Snowflake’s S-1 filing, the average revenue per customer is in the one hundred thousand dollar range.
Second, it’s also why the “operationalization” of the data warehouse by the other departments is likely to accelerate. It seems to be the next logical step in this tech adoption cycle, and the market timing looks right.

6. Data Connectors for SMBs

Setting up a dedicated data warehouse and the stack of tools on top of it is costly and requires a dedicated team of data engineers, both of which prevent most small and medium businesses from adopting such technology.

This is why, for businesses in this segment, their data warehouse is their Intercom, their Salesforce, or their Zendesk when it comes to customer data. Or it’s their Mixpanel for product-related data or their Google Analytics for traffic data.

The bottom line is that instead of having a dedicated place where they centralize data, they will use their existing systems of record (SORs) as their “data warehouse”. And they will use data connectors to enrich data directly in these SORs. For example, you can connect your Intercom account to your Hubspot account in order to enrich your Intercom customer data with the data you have in Hubspot.
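Enriching one system of record with data from another can be sketched as a keyed merge that only fills in what is missing. The example below is an assumption-laden illustration: the two lists stand in for contacts exported from two apps, the field names are made up, and no real Intercom or Hubspot API is involved.

```python
from typing import Dict, List

Record = Dict[str, str]


def enrich(target: Record, source: Record, fields: List[str]) -> Record:
    """Copy the given fields from source into target when target lacks them."""
    enriched = dict(target)
    for name in fields:
        if not enriched.get(name) and source.get(name):
            enriched[name] = source[name]
    return enriched


def enrich_all(
    targets: List[Record], sources: List[Record], key: str, fields: List[str]
) -> List[Record]:
    """Match records on a shared key (for example, email) and enrich each one."""
    by_key = {record[key]: record for record in sources}
    return [enrich(t, by_key.get(t[key], {}), fields) for t in targets]


# Hypothetical exports from two systems of record.
support_contacts = [
    {"email": "ada@example.com", "name": "Ada", "company": ""},
    {"email": "bob@example.com", "name": "Bob", "company": "Initech"},
]
crm_contacts = [
    {"email": "ada@example.com", "company": "Analytical Engines", "plan": "pro"},
    {"email": "bob@example.com", "company": "Globex", "plan": "free"},
]

enriched_contacts = enrich_all(
    support_contacts, crm_contacts, "email", ["company", "plan"]
)
```

In this sketch the target system wins whenever both sides have a value, so Bob keeps his existing company. Which system should win a conflict is a design choice that each connector has to make.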

Consequences

The two main consequences I want to highlight are:

First, while the data fragmentation problem is fundamentally the same for both enterprise and small businesses, the way it’s currently solved is very different. For the enterprise segment, the solution comes from data engineers who build and maintain a central repository for their business data. For small to medium businesses the paradigm is the opposite: it’s not about centralizing data but about spreading it across their various SORs. It is a centralized approach versus a decentralized one.

Second, since most small to medium businesses do not have a full data engineering team, the tools made for this segment must be easier to set up and to use. This is why this segment relies much more on self-serve plugins and connectors that you install in a few clicks and that synchronize data between apps. Obviously, it’s easier to use, but it’s also far less customizable.

7. The future of data connectors

Now that we’ve covered the difference between data connectors in the enterprise and SMB segments, I want to look at where this space could be heading in the short term and ask a couple of questions.

Enterprise segment

It’s probably in this segment that the short-term trends are clearest, as they are driven by the operationalization of the data warehouse. Now that the data team has set up a data warehouse, the other departments want a piece of it. With that in mind, there are two questions I’m thinking about.

First, what are the areas and functions at the data warehouse level that will be “operationalized”? The first category of software emerging here is reverse ETL, but which other interesting approaches, if any, will emerge next?

My second question is around the potential tensions that could appear between the data analyst team that sets up the data warehouse and the other departments (marketing, product, or customer support) that now want to benefit from it. At the end of the day, who is going to own the data warehouse and be its gatekeeper?

SMB segment

On the small to medium-sized business side, I believe that we might be slightly earlier in a new tech adoption cycle. Here, I see two potential scenarios.

The first scenario is the democratization of data warehouse and ETL tech for SMBs. Maybe a new breed of products similar to enterprise ETLs, but built specifically for SMBs, will emerge. They’ll be easier to set up and use and still based on a centralized model where all data is aggregated in one place.

The second scenario is the decentralized one. Maybe the combo of a single data warehouse with a stack of software on top of it is not suited to SMB needs, and we’ll continue to see what we see today: data connectors whose value is to synchronize data between a company’s various systems of record.

In any case, if you have any thoughts about these questions and scenarios, don’t hesitate to share them with me. I’m super happy to take input to form my opinion on these topics.