The context graph conversation has exploded over the past two weeks. Foundation Capital's "Context Graphs: AI's Trillion-Dollar Opportunity" kicked it off. Animesh Koratana's thread on coordinate systems deepened the technical framing. Daniel Davis published "The Context Graph Manifesto" connecting it to the semantic web's heritage. I've written two responses myself—on the operational context layer and building the event clock.
The conversation is valuable. The questions being asked—how do we give agents organizational memory? how do we capture decision traces? how do we build systems that learn from how work actually happens?—are the right questions.
But I've been watching a framing emerge that I think misses the point. And since I've been building context infrastructure for over three years, I want to push back.
My argument is simple: entity ontologies are largely solved by existing standards. The real unsolved work is temporal validity, decision traces, and fact resolution. Learning helps there—not at the entity layer. The dichotomy isn't prescribed vs learned. It's adopt foundations, then learn what's novel.
The Debate as It's Being Framed
The current conversation has crystallized around a dichotomy: prescribed ontologies vs learned ontologies.
On one side: Palantir. The argument goes that Palantir built a $400B+ company on prescribed ontologies—define the schema upfront, map messy enterprise data into it, send forward deployed engineers to customize it for each customer. It works, but it's expensive, slow, and doesn't scale.
On the other side: the future. Learned ontologies. Structure that emerges from how work actually happens rather than how someone designed it to happen. Agent trajectories as training data. Decision traces that compile into organizational world models. The ontology isn't declared—it's discovered.
Jaya Gupta put it directly in a recent post:
"The next $50B company will be built on learned ontologies. Structure that emerges from how work actually happens, not how you designed it to happen... Memory assumes you know what to store and how to retrieve it. But the most valuable context is structure you didn't know existed until agents discovered it through use."
Animesh Koratana extended this with a technical framing—five coordinate systems (timeline, events, semantics, attribution, outcomes) that don't share keys. The argument is that agent trajectories are the training signal for learning embeddings that encode "join-compatibility" across these coordinate systems. The ontology emerges from the walks.
This framing is intellectually elegant. It positions learned ontologies as the democratizing future against Palantir's expensive bespoke past.
There's just one problem: it ignores twenty years of ontology work that already exists.
The False Dichotomy
The prescribed vs learned framing presents two options:
- Palantir's way: Spend months or years with forward deployed engineers building custom ontologies for each enterprise. Expensive, slow, but it works for organizations that can afford it.
- The learned way: Let structure emerge from agent trajectories. Don't prescribe—discover. The ontology compiles itself from how work actually happens.
What's missing from this framing is a third option that's been hiding in plain sight:
Adopt what exists. Extend where needed. Focus learning on what's genuinely novel.
The conversation is proceeding as if ontology is either an expensive custom artifact (Palantir) or an emergent property of sufficient agent runs (the learned vision). But there's twenty years of ontology work that neither camp is acknowledging.
The Ontologies That Already Exist
Before we talk about learning ontology, it's worth acknowledging how much of this work already exists in production systems. Let me introduce some prior art that seems to be missing from this conversation.
Schema.org
Schema.org is a collaborative vocabulary for structured data on the web. It was founded in 2011 by Google, Microsoft, Yahoo, and Yandex. It defines canonical types for the entities that matter: Person, Organization, Place, Event, Product, CreativeWork, and hundreds of others.
Schema.org isn't theoretical. It's in production across billions of web pages. When you see rich snippets in Google search results—event dates, product prices, recipe ratings—that's Schema.org markup being parsed and displayed.
The types are well-defined. A Person has a name, a jobTitle, a worksFor relationship to an Organization. An Organization has a name, a location, employees. An Event has a startDate, endDate, location, organizer.
This isn't a custom ontology someone has to build. It exists. It's maintained. It's designed for exactly the kind of entity modeling that context graphs require.
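For illustration, here's what a Schema.org Person looks like serialized as JSON-LD, written as a TypeScript literal to match the other sketches in this piece. The values are invented; the @type and property names (name, jobTitle, worksFor) come straight from the Schema.org vocabulary:

```typescript
// A Schema.org Person serialized as JSON-LD. Any LLM can read this
// shape directly; no RDF tooling required.
const person = {
  "@context": "https://schema.org",
  "@type": "Person",
  name: "Sarah Chen",
  jobTitle: "VP of Sales",
  worksFor: {
    "@type": "Organization",
    name: "Acme Corp",
  },
};
```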
WAND and the Microsoft Common Data Model
Here's something that surprised even me when I rediscovered it: Microsoft's Common Data Model—the schema that powers Dynamics 365, Power Platform, and much of Microsoft's enterprise stack—was licensed from WAND.
WAND has been building enterprise taxonomies and ontologies for decades. Their work predates the current AI wave by years. When Microsoft needed a canonical data model for enterprise entities—Account, Contact, Lead, Opportunity, Case, Product, Campaign—they didn't build it from scratch. They licensed it from people who had already done the work.
CDM defines the entities that enterprise software operates on. Not in theory—in production, across thousands of organizations running Microsoft's business applications.
This isn't widely known—but it's significant. The enterprise ontology problem has been worked on for a long time, by people who deeply understand how businesses model their operations. Microsoft didn't reinvent the wheel. They licensed it.
Industry-Specific Standards
Beyond Schema.org and CDM, there are industry-specific ontologies that have been refined over years of real-world use. Healthcare has FHIR for clinical data exchange and SNOMED for medical terminology. Finance has FIBO for financial instruments and business entities. Manufacturing, logistics, energy—each has domain-specific standards.
These aren't academic exercises. They're production infrastructure. FHIR, for instance, defines how to represent patients, conditions, medications, observations, and clinical workflows. When we work with biopharma companies building knowledge graphs from drug discovery data, we're not inventing entity types from scratch. We're adopting FHIR primitives, extending where the domain requires it, and focusing our effort on what's genuinely novel—like tracking how observations about a compound evolve through trial phases, or connecting findings across studies.
The point isn't that any single ontology solves every problem. The point is that the work of defining "what is a Patient, what is a Medication, what is an Account" has been done—repeatedly, carefully, over decades. The foundations exist. The question is whether you use them.
What Palantir Actually Does
Daniel Davis made an important observation in response to Ashu Garg's framing:
"I don't know why everyone thinks Palantir has ontology tech. They do not. They take months, if not years, building it with 'forward deployed engineers'. This is not the way to achieve context graphs at scale."
This is worth unpacking. Palantir's expense isn't because "prescribed ontologies are hard." It's because they build custom for every deployment. They're not leveraging existing standards—they're creating bespoke models for each customer's specific data landscape.
That's a business model choice, not an inherent property of prescribed ontologies. You can adopt Schema.org types, extend with CDM patterns, and add domain-specific entities without spending years on custom modeling.
The Palantir approach is expensive because it's bespoke consulting dressed up as a platform. The prescribed ontology itself isn't the bottleneck. The bottleneck is refusing to leverage what already exists.
Where Learned Ontologies Actually Help
I don't want to dismiss the learned ontology vision entirely. There's something real there. But I think it's being applied to the wrong layer of the problem.
The entities themselves—Person, Organization, Account, Contact, Event—don't need to be learned. We know what these are. The types are stable. The core relationships are well-understood. Waiting for agent trajectories to "discover" that Account and Contact are important entities is like waiting for a search engine to discover that nouns exist.
The CRM Lesson
There's an interesting parallel here that the context graph conversation hasn't acknowledged: CRM customization is essentially ontology modeling.
When companies set up modern CRMs like Attio, HubSpot, or Salesforce, much of the work is defining custom objects and relationships. A VC firm sets up objects for Funds, Investments, Portfolio Companies, LPs—and defines how they relate. A recruiting firm models Candidates, Positions, Clients, Placements. A real estate company tracks Properties, Listings, Buyers, Agents.
This is the ontology question in practical form. Every CRM implementation is a bet on "what are the entities that matter to this business, and how do they connect?"
We've worked with VCs setting up their data models to track investments across funds, and the exercise is indistinguishable from ontology design. They're defining entity types, attributes, and relationships—they just don't call it that.
A wave of "AI-native CRM" startups is now trying to shortcut this work—Lightfield, Clarify, Forge AI, Day.ai. The pitch: don't configure your schema upfront; let the system figure it out. It's the learned ontology thesis applied to CRM. (See the appendix for a deeper breakdown of these approaches.)
But even the most aggressive "self-building CRM" pitches still assume you need to know that a "deal" has stages, that "contacts" belong to "accounts," that "activities" connect people to opportunities. The core objects are stable across businesses—that's why Salesforce's standard objects look roughly the same as HubSpot's, which look roughly the same as Attio's.
Where inference genuinely helps: discovering which custom fields actually matter, which relationships get used, which objects are central vs peripheral to how this specific team works. That's organizational intelligence on top of a foundation—not the foundation itself.
Specifically, the organizational-specific layer:
- Which contacts at an account actually matter for this deal?
- What's the real decision-making process (vs. the org chart)?
- Which exceptions get approved and which don't?
- What precedents actually govern how decisions are made?
This is tacit knowledge. It's not in any ontology because it's specific to how this organization operates. It lives in people's heads, in Slack threads, in the patterns of how escalations actually get resolved.
This is where learning from trajectories makes sense. Not "discover that Account is an entity" but "discover that Sarah Chen is the actual decision-maker even though the CRM says it's her boss."
The learned ontology vision conflates two very different problems:
- Entity modeling: What types of things exist and how do they relate? (Solved—or at least solvable with existing standards)
- Organizational intelligence: How does this specific organization actually work? What patterns govern decisions? (Genuinely novel, genuinely requires learning)
The first is infrastructure. The second is insight. They're not the same thing, and they don't require the same approach.
The Bootstrap Problem
Here's where the learned ontology vision runs into practical trouble: you can't learn from agent trajectories until agents are running effectively. But agents can't run effectively without foundational context.
I made this point in my previous piece:
"You can't wait for thousands of agentic RAG runs to 'discover' that Sarah Chen is a person who works at Acme Corp. You need to know that before agents start reasoning. Otherwise every agent trajectory is fighting the identity resolution problem anew—and you're paying for that confusion in tokens, latency, and errors."
The node2vec analogy that Animesh uses actually illustrates this. Node2vec learns embeddings from walk patterns over existing edges. It doesn't discover nodes and edges from scratch. The graph must exist first.
Agentic workflows operate the same way. The agent harness orchestrates tool calls over a pre-built map. The map is operational context: resolved entities, established relationships, temporal state. Build the map first, then agents walk it effectively.
So how do you bootstrap? The learned ontology vision doesn't have a clean answer. "Run agents and let structure emerge" assumes you have something to run agents over.
The practical answer: adopt existing ontologies as the foundation, then let learning refine and extend.
Start with Schema.org types for entities. Use CDM patterns for enterprise objects. Extract entities from your content using these established types. Resolve identities—Sarah Chen in email is the same as @sarah in Slack. Build the graph.
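As a sketch of that identity-resolution step, here's a deliberately naive version that matches on normalized display name. All types and names are hypothetical; a real resolver would combine signals (email domain, co-occurrence, explicit links) and keep a confidence score per merge:

```typescript
// Illustrative identity resolution: merge mentions of the same person
// observed across different systems into one canonical entity.
interface Mention {
  source: "email" | "slack" | "crm";
  handle: string;         // e.g. "sarah.chen@acme.com" or "@sarah"
  displayName?: string;
}

interface CanonicalPerson {
  "@type": "Person";      // Schema.org type
  name: string;
  identifiers: Mention[]; // every handle resolved to this person
}

function resolve(mentions: Mention[]): CanonicalPerson[] {
  const byName = new Map<string, CanonicalPerson>();
  for (const m of mentions) {
    // Naive key: normalized display name, falling back to the raw handle.
    const key = (m.displayName ?? m.handle).toLowerCase().trim();
    const existing = byName.get(key);
    if (existing) {
      existing.identifiers.push(m);
    } else {
      byName.set(key, {
        "@type": "Person",
        name: m.displayName ?? m.handle,
        identifiers: [m],
      });
    }
  }
  return [...byName.values()];
}
```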
Then run agents. Then learn from trajectories. The trajectories teach you what the ontology couldn't: which relationships matter most, which patterns recur, which exceptions become precedent.
This isn't prescribed vs learned. It's adopt, then learn. Foundations first, intelligence on top.
What's Actually Unsolved
If entity modeling is largely solved by existing standards, what's actually novel? What's the hard problem that context graphs need to address?
It's not the nouns. It's the temporal and decision layer.
Temporal Validity
Most systems can tell you what's true now. Almost none can tell you what was true at a specific point in the past, or how truth evolved over time.
"Paula works at Microsoft" is a fact. But it's not a complete fact. When did she start? Does she still work there? What was true in 2022?
This is what Animesh Koratana called the "event clock"—the infrastructure for tracking not just current state, but how state changes over time. Facts with validAt and invalidAt timestamps. The ability to query "What did we believe about this account when we made that decision?"
Existing ontologies don't solve this. Schema.org defines Person and Organization, but it doesn't have native primitives for "this relationship was true from date X to date Y." CDM models Account and Contact, but doesn't track how those entities' attributes evolved.
Temporal validity is genuinely novel infrastructure that needs to be built.
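A minimal sketch of what that infrastructure implies, assuming the validAt/invalidAt shape described above. The field names mirror this piece's vocabulary, not a published schema:

```typescript
// A fact with a validity interval: true from validAt until invalidAt.
// An absent invalidAt means "still believed true".
interface TemporalFact {
  subject: string;    // "Paula"
  predicate: string;  // "worksFor"
  object: string;     // "Microsoft"
  validAt: Date;      // when the fact became true
  invalidAt?: Date;   // when it stopped being true, if known
}

// Query historical state: what did we believe was true at time t?
function factsAsOf(facts: TemporalFact[], t: Date): TemporalFact[] {
  return facts.filter(
    (f) =>
      f.validAt.getTime() <= t.getTime() &&
      (f.invalidAt === undefined || t.getTime() < f.invalidAt.getTime())
  );
}

// Example: Paula joined Microsoft in 2019 and left in 2023.
const facts: TemporalFact[] = [
  {
    subject: "Paula", predicate: "worksFor", object: "Microsoft",
    validAt: new Date("2019-03-01"), invalidAt: new Date("2023-06-30"),
  },
];
factsAsOf(facts, new Date("2022-01-01")); // includes the Microsoft fact
factsAsOf(facts, new Date("2024-01-01")); // empty
```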
Decision Traces
Foundation Capital's original piece got this right: the reasoning that connects data to action has never been captured as data.
When a renewal agent proposes a 20% discount despite a 10% policy cap, it pulls context from multiple systems. Finance approves. The CRM records "20% discount." Everything that made that decision legible—the inputs, the policy evaluation, the exception route, the approval chain—disappears.
Capturing decision traces isn't an ontology problem in the traditional sense. It's an instrumentation problem. You need to be in the execution path where decisions are committed, capturing not just what happened but why it was allowed to happen.
This is genuinely new territory. Existing ontologies don't have primitives for "this decision was made under policy v3.2 with an exception approved by VP Sarah Chen based on precedent from the Acme deal in Q2."
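There's no standard schema for decision traces yet, so here is one hedged sketch of what such a record might capture, using the renewal-discount example from above. Every field name is an assumption:

```typescript
// Illustrative decision trace: capture not just the outcome,
// but why it was allowed to happen.
interface DecisionTrace {
  decision: string;           // what was decided
  outcome: "approved" | "rejected";
  policy: { id: string; version: string; cap: string };
  exception?: {
    reason: string;           // why the policy cap was exceeded
    approvedBy: string;       // who signed off
    precedent?: string;       // prior decision cited as justification
  };
  inputs: string[];           // references to the context the agent pulled
  decidedAt: Date;
}

const trace: DecisionTrace = {
  decision: "approve 20% renewal discount",
  outcome: "approved",
  policy: { id: "discount-policy", version: "v3.2", cap: "10%" },
  exception: {
    reason: "strategic account, competitive pressure",
    approvedBy: "VP Sarah Chen",
    precedent: "Acme deal, Q2",
  },
  inputs: ["crm:opportunity/1234", "email:thread/5678"],
  decidedAt: new Date(),
};
```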
Fact Resolution
When you extract facts from content, you get assertions—not necessarily truth. Content can be wrong, outdated, or contradictory.
"The customer is renewing" might appear in a March email. "The customer churned" might appear in a September Slack thread. Both are facts extracted from content. But they can't both be currently true.
Fact resolution—determining what's canonical, what's superseded, what's corroborated—is genuinely hard. It requires judgment about temporal validity, source authority, and how later information updates earlier assertions.
This isn't ontology work. It's a different kind of problem: maintaining coherent beliefs as evidence accumulates.
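As one plausible sketch of the mechanics, here's a recency-based resolver using the status vocabulary that appears later in this piece (Canonical, Superseded, Corroborated). This is only one of many possible policies; a real system would also weigh source authority and use an LLM to judge whether two claims actually conflict:

```typescript
type Status = "Canonical" | "Superseded" | "Corroborated";

interface Assertion {
  claim: string;     // "customer is renewing" / "customer churned"
  observedAt: Date;  // when the source content was produced
  source: string;    // "email", "slack", ...
  status?: Status;
}

// Naive resolution over conflicting assertions about the same question:
// the most recent observation becomes Canonical, earlier differing claims
// are Superseded, and duplicates of the canonical claim are Corroborated.
function resolveConflict(assertions: Assertion[]): Assertion[] {
  const sorted = [...assertions].sort(
    (a, b) => b.observedAt.getTime() - a.observedAt.getTime()
  );
  const canonical = sorted[0];
  return sorted.map((a) => ({
    ...a,
    status:
      a === canonical
        ? "Canonical"
        : a.claim === canonical.claim
          ? "Corroborated"
          : "Superseded",
  }));
}
```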
Why This Matters for Builders
The demand side of this equation is clear. Aaron Levie of Box recently wrote that "in the 21st century, one of the most critical forms of competitive advantage will be a company's ability to capture, manage, and build processes around the right context." He's right. Enterprises are waking up to the fact that AI agents are only as good as the context they receive.
But if you're building context infrastructure, the framing matters because it determines where you spend your effort.
The "learned ontology" framing suggests you should focus on trajectory capture, embedding learning, emergent structure. Build the infrastructure for agents to discover ontology through use.
I'm suggesting something different: focus on what's actually unsolved.
Entity modeling? Adopt existing standards. Don't spend months defining what an Account is. Microsoft already licensed that from WAND.
Temporal validity? This is where the work is. Facts that change over time. The ability to query historical state. The event clock.
Decision traces? This is where the opportunity is. Being in the execution path. Capturing reasoning as data.
Fact resolution? This is where the hard AI problems live. Using LLMs to determine what's canonical, what's superseded, how to synthesize timeline facts from scattered observations.
Organizational learning? This is where trajectories genuinely help—but on top of a foundation, not instead of one.
The companies that win in context infrastructure won't be the ones who figure out how to learn ontology from scratch. They'll be the ones who adopt what exists and focus their innovation on what's genuinely new.
What We're Building
At Graphlit, this is the approach we've taken since 2021. What surprised us: the ontology question was never the bottleneck. We adopted Schema.org types early and never looked back. The hard problems were always elsewhere.
We use Schema.org types for entities—Person, Organization, Place, Event—serialized as JSON-LD. We're not religious about RDF or triplestores. We care about types and relationships. JSON with a @type field gives us Schema.org compatibility without requiring anyone to learn the semantic web stack.
What reality forced us to build was a hybrid storage system where entities live simultaneously in a vector store, a JSON metadata store, and a graph store. This lets us query across multiple axes at once: temporal (when was this true?), geospatial (where did this happen?), semantic (what's similar?), full-text (what mentions this term?), and graph (how does this connect?). Most systems pick one or two axes. Organizational knowledge exists across all of them simultaneously—a sales meeting happened at a specific time, in a specific city, about a specific account, with semantic content that relates to other conversations, connected to people and products in a relationship graph. You need to query all of that together.
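To make the multi-axis idea concrete, here's a hypothetical query shape (this is not Graphlit's actual API) where each field targets a different backend and the results are intersected:

```typescript
// Hypothetical multi-axis query: each field fans out to a different
// storage backend, and the results are intersected.
interface HybridQuery {
  temporal?: { from: Date; to: Date };           // event clock / metadata
  geospatial?: { lat: number; lon: number; radiusKm: number };
  semantic?: { text: string; topK: number };     // vector store
  fullText?: string;                             // keyword index
  graph?: { from: string; relation: string };    // graph traversal
}

// "Sales meetings near Chicago last quarter that discussed pricing,
//  connected to the Acme account."
const query: HybridQuery = {
  temporal: { from: new Date("2024-07-01"), to: new Date("2024-09-30") },
  geospatial: { lat: 41.88, lon: -87.63, radiusKm: 50 },
  semantic: { text: "pricing negotiation", topK: 10 },
  graph: { from: "org:acme", relation: "discusses" },
};
```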
We're building the temporal layer that existing ontologies lack. Facts with validAt and invalidAt. Status tracking: Canonical, Superseded, Corroborated, Synthesized. The ability to query "What was believed to be true at time T?"
We're extending into decision traces. Our MCP server sits in the context-retrieval path for agents. As agents execute workflows through Graphlit, we can capture not just the content they access but the reasoning patterns they exhibit.
And we're building fact resolution—LLM-powered infrastructure for determining what's currently true from accumulated assertions, synthesizing timeline facts from scattered observations, maintaining coherence as evidence accumulates.
The ontology foundation wasn't the hard part. Schema.org gave us that. The hard part is everything that comes after: time, decisions, resolution, learning. That's where we're focused.
The Semantic Web Finally Won
There's a through-line in this conversation that deserves acknowledgment.
The semantic web community—RDF, OWL, linked data—has been working on these problems for decades. Schema.org emerged from that tradition. The vision of machines reasoning over structured, interoperable data wasn't invented by the AI crowd in 2024.
Even the term "context graph" has academic lineage predating the current conversation. A June 2024 paper on arXiv formally introduces Context Graphs as an extension of knowledge graphs that incorporate temporal validity, geographic location, and source provenance—exactly the dimensions missing from triple-based representations. The researchers demonstrate that adding this contextual layer significantly improves reasoning performance on knowledge graph tasks. The concepts Foundation Capital popularized for enterprise AI were already being formalized in research.
What killed adoption wasn't the ideas. It was the syntax. RDF, SPARQL, OWL—the learning curve was brutal, and most developers never climbed it.
LLMs changed that. They can read JSON-LD natively. They understand Schema.org types from training data. They reason over structured data without needing custom query languages or formal inference engines.
The semantic web's concepts won. The syntax lost. And that's fine—JSON-LD gives you the benefits without the tax. The context graph conversation is the semantic web conversation resurfacing, with LLMs as the reasoning layer that finally makes it accessible.
Where We Stand
The prescribed vs learned dichotomy is too clean. The entities don't need to be learned—they need to be adopted. Schema.org, CDM, WAND. The work has been done.
What does need to be built is the temporal and decision layer. Facts that change over time. Reasoning captured as data. Resolution that maintains coherence. That's genuinely novel. That's where the opportunity lives.
Three principles:
1. Adopt foundations, don't reinvent them. The entity modeling problem is solved well enough. Don't spend innovation budget rediscovering it.
2. Build the temporal layer. Facts with validity periods. Queryable history. Time as a first-class dimension.
3. Focus learning on what's genuinely novel. Organizational patterns. Decision dynamics. The tacit knowledge specific to how this organization works. That's where trajectories help—on top of a foundation, not instead of one.
The context graph vision is achievable. The question is whether we build on what exists or waste years rediscovering it.
We know which approach we're taking.
This is the third in a series responding to the context graph conversation. Previous posts: "The Context Layer AI Agents Actually Need" and "Building the Event Clock".
Learn more at graphlit.com.
Appendix: AI-Native CRMs and the Learned Ontology Thesis
The "AI-native CRM" wave is worth examining as a real-world test of the learned ontology thesis. These startups are trying to shortcut traditional CRM configuration by inferring structure from usage. But when you look closely, they fall into two buckets:
Bucket A: Schema-evolving. Lightfield (from the Tome founders) is the most explicit about actually inferring and evolving data structures from unstructured inputs. They talk about "flexible, backfilled schemas" and "schematizing unstructured data according to the organization's needs." This is closest to "learned ontology" in the literal sense.
Bucket B: AI-assisted configuration. Most others are really "classic CRM model, but AI makes setup and hygiene feel zero-config":
- Clarify markets "the CRM that configures itself"—name a field, and AI selects the type, writes instructions, and autofills it.
- Forge AI says "describe your process and watch the CRM build itself."
- Day.ai captures communications and structures them into CRM records automatically.
You still operate within recognizable objects (accounts, contacts, deals, activities). The AI reduces friction; it doesn't eliminate the underlying schema.
This distinction matters for the broader ontology debate. Even the most aggressive "learned" approaches still rely on stable foundational objects. The learning happens at the organizational layer—which fields matter, which relationships get used—not at the entity-type layer.
