The Future of Data Infrastructure: Why the Modern Data Stack Hit Its Ceiling

Data Infrastructure

Leon Kozlowski

Nov 26, 2025

For the last decade, the Modern Data Stack (“MDS”) has been treated as gospel. Fivetran, Snowflake, dbt, Looker - the sacred lineup. Billions were poured into pipelines, dashboards, and contracts.

Beneath the hype, the MDS didn’t win because it was optimal. It emerged from an industry that had devolved into patching flaws with point solutions, each marketed as a new “best practice.” This isn’t an exhaustive history, but these are the components that mattered and the impact they had.

Today, the cracks are showing, and something new is quietly shifting the paradigm: ontologies.

How we got here: Row Stores → Column Stores

The first big shift wasn’t cloud. It wasn’t ETL, it wasn’t dbt - it was a storage layout.

Row-oriented databases (MySQL, Postgres) store every field of a record together, one row after another.

Perfect for applications but terrible for analytics.

Column stores flipped it: all the values of a single column are stored together.

Suddenly scanning 3 columns out of 100 became cheap, compression ratios shot up because similar values sit side by side, and this truly paved the way for the analytics boom.
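To make the layout difference concrete, here is a minimal Python sketch. The `orders` records, field names, and helper functions are invented for illustration, and in-memory lists only approximate how real engines lay out pages on disk.

```python
# Three toy records from a hypothetical "orders" table.
rows = [
    {"id": 1, "customer": "acme", "amount": 120.0},
    {"id": 2, "customer": "globex", "amount": 75.5},
    {"id": 3, "customer": "acme", "amount": 30.0},
]

# Row layout: every field of a record sits next to the others.
row_store = [tuple(r.values()) for r in rows]

# Column layout: every value of a field sits next to the others.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def total_amount_row_store(store):
    # Walks whole records even though only one field is needed.
    return sum(record[2] for record in store)


def total_amount_column_store(store):
    # Reads a single list and never looks at the other columns.
    return sum(store["amount"])


def run_length_encode(values):
    # Columns group similar values, which is what makes simple
    # encodings such as run-length encoding effective.
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded
```

Both totals agree, but the column version only reads the `amount` list. On a table with 100 columns, that is the difference between reading everything and reading a few percent of it.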

When MySQL/Postgres/Oracle were the warehouse

Before Redshift and BigQuery, analysts just ran reports on production app databases. Cron jobs would lock tables and aggregates would take down the app, so the rule was effectively “don’t run that during business hours”.

This taught us the first important lesson: Operational and analytical workloads must be isolated.

This realization fueled the MDS revolution. Don’t get me wrong, data warehouses have been around for decades, but the MDS didn’t explode until the early 2010s.

Cloud Data Warehouses: The Separation of Storage & Compute

BigQuery, Redshift, and Snowflake changed the game, not only because of column-oriented storage, but also because they separated storage from compute. S3 and GCS provided virtually infinite storage, the execution layer became elastic, and the “petabyte scale” moniker became ubiquitous… without sharding. The decoupling was lightning in a bottle. There are tradeoffs, though: data must be moved into the warehouse, everything must be modeled in the warehouse, and the warehouse becomes a single chokepoint. And once a “space” is created, vendors carve it into verticals - get ready to spend.

The Vendorization of the Pipeline: “Business Intelligence”

It’s funny: BI was rebranded as Analytics Engineering. Somewhere along the way, someone realized BI doesn’t yield intelligence.

Ingestion

Fivetran and Stitch solved one problem - “give me Salesforce data” - but the business model is legal extortion: charging per row and ingesting by brute force, without semantics or intelligence.

Transformation

dbt - actually a revolution and truly useful. It is software engineering applied to data modeling, which is something beautiful: you get versioning, Jinja macros, DAGs, and testing. But you end up with thousands of staging models, hundreds of intermediate joins, and tables that only exist to support other tables. Still no meaning, semantics, or intelligence. Your team of business intelligence engineers became plumbers who are constantly playing catch-up. This becomes a never-ending cycle of transformation logic.
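The DAG idea, and the way supporting tables pile up, can be sketched in a few lines of Python. This is not dbt's API. The model names and the `supporting_only` helper are hypothetical, and the ordering comes from the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical models mapped to the models they depend on.
dependencies = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_orders_joined": {"stg_orders", "stg_customers"},
    "fct_revenue": {"int_orders_joined"},
}


def build_order(deps):
    # Every model is built only after all of its dependencies.
    return list(TopologicalSorter(deps).static_order())


def supporting_only(deps, final_models):
    # Models nobody queries directly: they exist to feed other models.
    return sorted(set(deps) - set(final_models))
```

Here one model that the business actually reads (`fct_revenue`) needs three supporting models. Scale that ratio up and you get the thousands of staging and intermediate tables described above.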

Dashboarding

We learned about charts and plots in grade school - a great abstraction, but an even better contract for the vendor. They sell you visualization, governance, RBAC, and semantic layers that don’t work. Again, you are never sold actual understanding.

Each of these point solutions has its own market in the MDS ecosystem. While decoupling storage and compute was good, you’re still left with a massive centralization problem and dependency hell. Ingestion, modeling, and metadata are all centralized and run on a schedule (don’t sell me materialized views, please and thank you). This doesn’t scale with distributed systems, it doesn’t scale with LLMs, and it doesn’t work with real-time events. The world is too connected and complicated for a DAG-based single source of truth to keep up.

The Robin Hood of the Data Stack: Iceberg

Misunderstood is an understatement. Iceberg didn’t just fix table formats, it turned object storage into a fully versioned, branched, transactional system without the warehouse. It democratizes storage, breaks vendor lock-in, puts you, not the warehouse vendor, in charge of your metadata, and enables distributed SQL to be the real compute layer. Metadata is the control plane: it is what provides ACID transactions, snapshots, schema evolution, and reproducibility. You now have virtually infinite storage and horizontally scalable compute. Instead of moving all the data to the warehouse, just bring the query engine to the data. You can query across multiple storage systems and avoid vendor lock-in, at low cost and with high parallelism.
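The “metadata is the control plane” idea can be sketched as a toy model. To be clear, this is not the Iceberg API or its file format. `ToyTable`, `Snapshot`, and the `s3://bucket/...` paths are all invented here, and the sketch only shows the principle: data files are never rewritten, and each commit adds a snapshot that lists the files making up the table.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # immutable set of file paths in object storage


@dataclass
class ToyTable:
    snapshots: list = field(default_factory=list)

    def commit(self, added_files):
        # A commit never rewrites old files. It records a new snapshot
        # pointing at the previous files plus the newly added ones.
        previous = self.snapshots[-1].data_files if self.snapshots else ()
        snapshot = Snapshot(len(self.snapshots) + 1, previous + tuple(added_files))
        self.snapshots.append(snapshot)
        return snapshot.snapshot_id

    def files_as_of(self, snapshot_id=None):
        # Readers choose a snapshot; the default is the latest one.
        if snapshot_id is None:
            return self.snapshots[-1].data_files
        return next(s.data_files for s in self.snapshots if s.snapshot_id == snapshot_id)


table = ToyTable()
first = table.commit(["s3://bucket/orders/a.parquet"])
second = table.commit(["s3://bucket/orders/b.parquet"])
```

Calling `table.files_as_of(first)` still returns only the first file after the second commit. That is the reproducibility argument in miniature: any engine that can read the metadata can read the table as of any snapshot.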

The Missing Link: Ontology

Warehouses model tables, dbt models transformations, dashboards model visuals. None of them model meaning.

Ontologies capture entities, relationships, hierarchies, events, semantics, and most importantly provenance. Without this, dashboards can lie; this is why analysts disagree and governance never works. It’s the Texas Sharpshooter Fallacy in action: we draw the target around whatever the data happens to show, because we are missing meaning.

And more importantly, this is why “AI for Analytics” tools fail: they’re powered by some half-baked RAG pipeline and embeddings, which is not the same thing as true understanding. With an ontology, dbt workloads shrink, BI simplifies, context becomes queryable, and RAG becomes retrieval with reasoning and provenance rather than vector-search roulette.
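As a rough illustration of “queryable context with provenance”, here is a minimal Python sketch. It is not a real ontology language or graph database. The `Fact` type, the predicates `is_a`, `instance_of`, and `placed_by`, and every source label are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: where this statement came from


facts = [
    Fact("Customer", "is_a", "Party", "hypothetical: data dictionary"),
    Fact("acme", "instance_of", "Customer", "hypothetical: crm.accounts"),
    Fact("order-1", "instance_of", "Order", "hypothetical: shop.orders"),
    Fact("order-1", "placed_by", "acme", "hypothetical: shop.orders"),
]


def query(facts, subject=None, predicate=None, obj=None):
    # Return every fact matching the fields that were specified.
    return [
        f for f in facts
        if (subject is None or f.subject == subject)
        and (predicate is None or f.predicate == predicate)
        and (obj is None or f.obj == obj)
    ]


def types_of(facts, entity):
    # Follow instance_of, then walk is_a upward to collect broader types.
    found = []
    frontier = [f.obj for f in query(facts, subject=entity, predicate="instance_of")]
    while frontier:
        current = frontier.pop()
        if current not in found:
            found.append(current)
            frontier.extend(f.obj for f in query(facts, subject=current, predicate="is_a"))
    return found
```

Asking `types_of(facts, "acme")` walks the hierarchy instead of guessing from similarity, and every answer from `query` carries the `source` it was derived from. That is the gap between retrieval with provenance and a nearest-neighbor lookup.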

Ontologies are what RAG wants to be.

The Post-Warehouse Era

The Modern Data Stack was a necessary stepping stone, as it solved yesterday’s pressing problems, but the future won’t be built on copying SaaS tables into Snowflake and building dashboards. Making data “AI ready” is the current objective for many startups, and the path forward is less of a rip and replace and more of a consolidation. The work isn’t building new infrastructure, it’s wiring semantic context into what you already have.

You may ask, “If ontologies are the answer, why haven’t they won already?”

Because they’re hard. Building an ontology requires upfront thinking about what things mean, and less about how to “join them”. The MDS deferred that work: just dump it into Snowflake, model it, and figure out semantics later (if at all). The winners won’t be the ones with the most tables in their warehouse, they’ll be the ones whose data can explain itself. The next stack won’t be a stack at all, it will be a graph with strong provenance that is interoperable across storage and retrieval systems.

The future is distributed, semantic, versioned, AI-native, and query-anywhere… in other words, ontology-driven. Data warehouses are filing cabinets (simply storage), RAG is Google Search (whatever happens to be indexed), and an ontology is the map (understanding).

And understanding will win.