Knowledge Graphs for AI: Ontologies, Semantic Layers & GraphRAG Explained
Knowledge is of no value unless you put it into practice. — Anton Chekhov
In college, I took a course called IF15: Knowledge Engineering. That's when I heard the word ontology for the first time in my life. It would pop up occasionally in papers and articles I read, but I never took the time to dig deeper—I never felt the need, the necessity.
This term has come back in force over the past few months with the GenAI boom, and especially with the realization that classic RAG is failing and the rise of an alternative—or rather, a helper—GraphRAG. GraphRAG relies on the notion of Knowledge Graphs, which is deeply connected to the concepts of Ontologies.
Originally, this article was just going to be an introduction to ontologies. Then I realized how irrelevant it would be to stay narrowly scoped on that single concept. So I reoriented the article toward knowledge in general—the broader picture.
The thread running through this article: how do we structure what we know so that both machines and humans can understand it?
What is Knowledge?

The Triptych: Data → Information → Knowledge
Before talking about how to model knowledge, let's define what it is.
We generally distinguish three levels:
- Data: Raw facts, without context. `42, "Paris", 2024-01-15`.
- Information: Contextualized data. "The customer ordered 42 units in Paris on January 15, 2024."
- Knowledge: Information usable for action or decision-making. "Paris orders increase in January—we need to anticipate stock levels."
Knowledge is therefore information used in a given context to solve a problem or make a decision (thanks to my UTT course).
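The triptych can be sketched in a few lines of code. This is a toy illustration, not a formal model: the record, schema, and decision rule are all hypothetical.

```python
# Data: raw facts, no context
raw = (42, "Paris", "2024-01-15")

# Information: the same facts, contextualized with a schema
order = {"units": raw[0], "city": raw[1], "order_date": raw[2]}

# Knowledge: information interpreted to drive a decision
def stock_recommendation(order):
    """Hypothetical rule: January orders in Paris spike, so anticipate stock."""
    month = int(order["order_date"].split("-")[1])
    if order["city"] == "Paris" and month == 1:
        return "increase_stock"
    return "normal_stock"

print(stock_recommendation(order))  # increase_stock
```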
Knowledge Engineering
My IF15 course defined knowledge engineering as:
An approach that collects and structures reasoning. Its objective is to formalize problem-solving—the approach followed by one or more experts to solve a problem.
In other words: externalize the knowledge produced "in" and "for" a domain, and make it exploitable.
At the time, I found it very theoretical, almost boring. Today, with agents that need to "understand" our data to generate SQL queries or answer business questions, this discipline makes complete sense.
The Two Faces of Metadata
When we talk about knowledge in enterprises, we're essentially talking about metadata—data about our data. This metadata divides into two fundamental categories:

Domain Knowledge (Business Knowledge)
This is what the business knows about its domain:
- Business concepts and jargon: What is "churn"? "MRR"? A "qualified lead"?
- Glossaries and definitions: How do we calculate revenue? Gross or net?
- Acronyms and synonyms: WC = World Cup, ARR = Annual Recurring Revenue, CMR = Cameroon
Structural Knowledge (Technical Knowledge)
This is what the data knows about itself:
- Relationships between elements: Which tables can be joined? On which keys?
- Dependencies: If I modify this column, what breaks?
- Lineage: Where does this data come from? What transformations has it undergone?
These two types of knowledge are complementary. Domain knowledge says "the business talks about revenue", structural knowledge says "revenue is in fact_sales.amount". Without the mapping between the two, it's impossible to translate a business question into a technical query.
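The mapping between the two can be sketched as two small dictionaries, one per face of metadata. Names and definitions below are illustrative, not a real schema.

```python
# Domain knowledge: what the business means by each term.
glossary = {
    "revenue": "Total amount of completed sales",
    "churn": "Customers lost over a period",
}

# Structural knowledge: where each concept physically lives.
concept_to_column = {
    "revenue": "fact_sales.amount",
    "churn": "dim_customer.churn_flag",
}

def resolve(business_term):
    """Translate a business term into its physical column, with its definition."""
    return concept_to_column[business_term], glossary[business_term]

print(resolve("revenue"))  # ('fact_sales.amount', 'Total amount of completed sales')
```

Without both dictionaries, the translation from business question to technical query has nowhere to happen.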
Garbage In, Garbage Out
We all know this principle in Machine Learning: if your training data is bad, your model will be bad.
This principle applies exactly to GenAI and Agents, but with an important nuance: for GenAI, the "garbage" we're talking about is primarily the metadata.
When you want to do a text-to-SQL project for example, the heart lies in the metadata—the description of the data you have in your possession.
Very often, companies rush directly into the AI layer, GenAI—either to follow the trend or because they think that's where the difficulty lies. But not at all. The difficulty is upstream: in the quality and completeness of metadata (and obviously data, but this is normally already well known...).
Investing heavily in sophisticated models without investing in metadata is building a house on sand.
How to Model Knowledge?
There are several ways to structure knowledge, with different levels of sophistication. These are called Knowledge Management Structures.
The choice depends on the use case, the scale, and especially who will consume this knowledge. They can obviously be combined depending on the use cases.
List (Controlled Vocabulary)

The most basic form of structuring.
A simple enumeration of possible values, with no relationships between them.
It is a flat, non-hierarchical structure, with no semantics beyond belonging to the list.
Examples: a list of countries (France, Germany, Spain...), a list of genders (Male, Female, Non-binary), a list of order statuses (Pending, Shipped, Delivered, Cancelled).
This is useful for constraining values, but it captures no relationships or meaning.
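In code, a controlled vocabulary is nothing more than a membership check, which is exactly the point. A minimal sketch using the order-status example above:

```python
# A controlled vocabulary as code: a flat set of allowed values, nothing more.
ORDER_STATUSES = {"Pending", "Shipped", "Delivered", "Cancelled"}

def validate_status(value):
    """The only semantics a list gives you: belonging to the list."""
    if value not in ORDER_STATUSES:
        raise ValueError(f"Unknown status: {value!r}")
    return value

validate_status("Shipped")       # accepted
# validate_status("In transit")  # would raise ValueError
```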
Taxonomy

We step up by introducing hierarchy.
A taxonomy is a hierarchical classification based on parent-child relationships, with a single relationship type: IS-A. Taxonomies are tree-like, moving from the general to the specific.
Examples: a car IS-A vehicle, an SUV IS-A car, an SUV IS-A vehicle (by transitivity).
What is great here is the introduction of a conceptual hierarchy: you can navigate from the general to the particular and back. However, only one relationship type is possible. You can't say that a car belongs to someone or is manufactured by a brand.
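The IS-A hierarchy and its transitivity fit in a few lines. A sketch using the vehicle example above (the dictionary stores each term's parent):

```python
# A toy taxonomy: a single IS-A relationship, stored child -> parent.
IS_A = {
    "SUV": "car",
    "car": "vehicle",
    "truck": "vehicle",
}

def is_a(child, ancestor):
    """Walk up the IS-A chain; transitivity comes for free."""
    while child in IS_A:
        child = IS_A[child]
        if child == ancestor:
            return True
    return False

print(is_a("SUV", "vehicle"))  # True, by transitivity
print(is_a("truck", "car"))    # False, no IS-A path
```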
Thesaurus

The thesaurus enriches the taxonomy with synonymy and associative relationships.
A thesaurus is a taxonomy augmented with equivalence and association links.
Relationships:
- IS-A (inherited from taxonomy)
- SYNONYM-OF: Car ↔ Automobile ↔ Auto
- RELATED-TO: Car ↔ Road, Car ↔ Driver
So, they help handle linguistic ambiguity: when a user searches for "auto", we also find "car".
Typical usage: Search engines, indexing systems, navigation aids.
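The search-expansion behavior can be sketched as follows. The SYNONYM-OF links drive query expansion; the RELATED-TO links are shown for completeness (in a real system they would feed "see also" navigation).

```python
# A thesaurus sketch, using the examples from this section.
SYNONYMS = {
    "car": {"automobile", "auto"},   # SYNONYM-OF links
}
RELATED = {
    "car": {"road", "driver"},       # RELATED-TO links (navigation aids)
}

def expand_query(term):
    """Expand a search term with its synonyms, so 'auto' also finds 'car'."""
    terms = {term}
    for canonical, syns in SYNONYMS.items():
        if term == canonical or term in syns:
            terms |= {canonical} | syns
    return terms

print(expand_query("auto"))  # {'auto', 'car', 'automobile'}
```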
Semantic Layer
The semantic layer is a related concept that has had a lot of influence in the data ecosystem, with tools like DataHub, dbt Semantic Layer, or Tableau/PowerBI data models.
They are pre-calculated logical views on data, defining business metrics and concepts. By design, this is hard-coded, static information. Semantic layers are often scoped to a single tool (Tableau, Power BI, dbt) and are more like "semantic views" than true semantics.
Concrete example:

```yaml
metrics:
  - name: revenue
    description: 'Total revenue from completed orders'
    type: sum
    sql: amount
    filters:
      - status = 'completed'
```
Despite their theoretical importance, semantic layers remain marginal with clients. Very few companies actually have a mature semantic layer. And when it exists, it's often limited to a specific tool.
The semantic layer references recurring information but doesn't allow generating new knowledge. It's static—you define "revenue," but you can't dynamically ask "which metrics are related to revenue?"
Ontology

Ontology is the major qualitative leap. We move from static to dynamic.
It is a formal structure allowing rich, typed, semantic relationships that are unlimited and explicit (MARRIED-TO, WORKS-FOR, MANUFACTURED-BY, LOCATED-IN, PURCHASED, ...).
Structure:
- Classes: Abstract concepts (Person, Product, Company)
- Subclasses: Specializations (Employee IS-A Person)
- Instances: Concrete entities representing real facts (John Smith, iPhone 15)
- Axioms: Rules and constraints ("An employee can only work for one company at a time")
- Properties: Attributes of classes (Person has an age, a name...)
There are many standards out there: RDF, OWL, SPARQL (we used all three of them at UTT), and we worked with a tool named [Protégé](https://protege.stanford.edu/) (read it in French, please).
The ontology is by-design traversable. You can query it to infer new information that wasn't explicitly declared.
Example: If John WORKS-FOR Acme, and Acme LOCATED-IN Paris, then we can infer that John works in Paris—even if this fact isn't directly stored.
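This inference can be sketched as a set of typed facts plus one rule that derives a fact never explicitly stored. A minimal illustration (the rule and entities come from the example above):

```python
# Typed facts, as (subject, relation, object) triples.
facts = {
    ("John", "WORKS-FOR", "Acme"),
    ("Acme", "LOCATED-IN", "Paris"),
}

def infer_works_in(facts):
    """Rule: X WORKS-FOR Y and Y LOCATED-IN Z  =>  X WORKS-IN Z."""
    derived = set()
    for (x, r1, y) in facts:
        for (y2, r2, z) in facts:
            if r1 == "WORKS-FOR" and r2 == "LOCATED-IN" and y == y2:
                derived.add((x, "WORKS-IN", z))
    return derived

print(infer_works_in(facts))  # {('John', 'WORKS-IN', 'Paris')}
```

Real ontology engines (OWL reasoners, SPARQL with inference) generalize exactly this idea: rules applied over typed relationships to produce new triples.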
Knowledge Graph

The Knowledge Graph is, in my current understanding, mainly the concrete implementation of an ontology.
It is a graph of structured data where entities (nodes) are connected by typed relationships (edges). Simple, basic.
Nodes are entities (people, products, concepts...) and edges are labeled, directional relationships.
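At its simplest, a knowledge graph is an adjacency structure of typed, directed edges. A toy sketch (entity names are illustrative), showing how traversal filters by relationship type:

```python
# Nodes and typed, directed edges, stored as an adjacency list.
edges = {
    "John": [("WORKS-FOR", "Acme")],
    "Acme": [("LOCATED-IN", "Paris"), ("MANUFACTURES", "Widget")],
}

def neighbors(node, relation=None):
    """Follow outgoing edges, optionally filtered by relation type."""
    return [target for rel, target in edges.get(node, [])
            if relation is None or rel == relation]

print(neighbors("Acme"))                # ['Paris', 'Widget']
print(neighbors("Acme", "LOCATED-IN"))  # ['Paris']
```

Production systems store the same shape in a graph database (e.g. as property graphs or RDF triples), but the node/typed-edge model is the whole idea.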
Why and For Whom Should We Model Knowledge?
Structured knowledge has three types of consumers, each with specific needs.
For Humans
For humans, structured knowledge is invaluable across roles:
- Data analysts, analytics engineers, and data scientists benefit first: they gain the context to interpret fields like status_cd, understand how tables can be joined, and correctly discern whether a negative amount signals a refund or an error. In the absence of clear documentation, newcomers are forced to relearn what was already known.
- Stakeholders and business users rely on a common language to avoid ambiguity: a shared glossary ensures everyone understands terms like "churn" and calculates KPIs, such as "revenue", with consistent logic, enabling cross-team communication so that Marketing and Finance speak the same language.
- Operational and data engineers, along with new team members, need living documentation to grasp data processes: it accelerates onboarding (the information system becomes navigable in days rather than months) and facilitates traceability and audit, making it clear where numbers come from and how calculations happen.
Agents
This is where it gets really interesting. Let's directly take the very common Text-to-insights agent use case.
The Text-to-insights Challenge
Everyone wants to chat with their data, but not everyone is ready to do what it takes.
Whether the data is in a data lake, a data warehouse, or a simple relational database, the problem is the same: translating a business question into a technical query.
To achieve this, the agent must be able to:
- Map business concepts → "revenue" corresponds to which column?
- Understand values → Is "World Cup" the code `WC` or `WORLD_CUP`?
- Know the joins → How do you link `customers` to `orders`?
- Respect business rules → Is revenue calculated before or after tax?
What Agents Need
Concretely, a performant Text-to-insights agent needs:
| Element | Description | Example |
|---|---|---|
| Glossary | Concept → technical mapping | "revenue" = SUM(orders.amount) |
| Enriched schema | Tables + columns + descriptions | status_cd: Status code (A=Active, I=Inactive) |
| Joins | Relationships between tables | orders.customer_id → customers.id |
| Validated examples | Question/SQL pairs | "Top 10 customers" → SELECT... |
| Business rules | Constraints and calculations | Revenue = amount before tax, excluding cancellations |
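In practice, these elements are bundled into the agent's context. A minimal sketch of that bundle and its serialization into a prompt section (all names and values are illustrative, taken from the table above):

```python
# The metadata bundle a Text-to-insights agent needs in context.
metadata = {
    "glossary": {"revenue": "SUM(orders.amount)"},
    "schema": {"orders.status_cd": "Status code (A=Active, I=Inactive)"},
    "joins": ["orders.customer_id -> customers.id"],
    "examples": ["Q: Top 10 customers -> SELECT ... LIMIT 10"],
    "rules": ["Revenue = amount before tax, excluding cancellations"],
}

def build_context(metadata):
    """Serialize the metadata into a prompt section for the agent."""
    lines = []
    for section, content in metadata.items():
        lines.append(f"## {section}")
        if isinstance(content, dict):
            lines += [f"- {k}: {v}" for k, v in content.items()]
        else:
            lines += [f"- {item}" for item in content]
    return "\n".join(lines)

print(build_context(metadata))
```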
The Measured Impact
This isn't theory. Research (notably from LinkedIn and Snowflake on Cortex) has quantified the impact of metadata on the quality of generated queries.
The difference between an agent that hallucinates non-existent columns and an agent that produces correct queries? The quality of metadata provided in context.
Why Does This Matter Now?
The Return of Knowledge Engineering
The term "ontology" has been experiencing a resurgence over the past year. This is no coincidence: it's directly correlated with the rise of GenAI.
The first peak of interest in ontologies correlated with the big data boom; the second with the GenAI one.
It reminds me of my university courses, the ones I sometimes found boring. Those courses are getting their revenge.
The Failure of Classic RAG
Classic RAG (Retrieval-Augmented Generation) works like this:
- Split documents into chunks
- Vectorize these chunks
- Retrieve chunks similar to the question
- Inject them into the LLM prompt
We inject raw context—pieces of text without structure. It's sufficient for simple factual questions ("What is the refund policy?"), but insufficient for complex reasoning ("Which customers are at risk of churning next month?").
Classic RAG is a Raw Context Retriever. It retrieves text, not knowledge.
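The four steps of the classic pipeline can be sketched end to end. This toy version replaces the embedding model with a bag-of-words overlap so it runs standalone; the document, chunk size, and question are all illustrative.

```python
def chunk(document, size=6):
    """1. Split a document into fixed-size word chunks."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def vectorize(text):
    """2. Toy 'embedding': the set of lowercase words."""
    return set(text.lower().split())

def retrieve(question, chunks, k=1):
    """3. Rank chunks by word overlap with the question."""
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda c: len(q & vectorize(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    """4. Inject the retrieved chunks into the LLM prompt."""
    return "Context:\n" + "\n".join(context_chunks) + f"\n\nQuestion: {question}"

doc = "The refund policy is 14 days. Shipping is free above 50 euros."
top = retrieve("What is the refund policy?", chunk(doc))
print(build_prompt("What is the refund policy?", top))
```

Notice what is missing: no structure, no relationships, no inference. The pipeline hands the LLM text that *looks like* the question, which is exactly the limitation GraphRAG addresses.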
From Retrieval to Reasoning

| | RAG Today | RAG Tomorrow |
|---|---|---|
| Full name | Retrieval Augmented Generation | Reasoning Augmented Generation |
| Input | Raw text chunks | Structured knowledge |
| Method | Vector similarity | Vector similarity + Graph traversal |
| Capability | Finding facts | Inferring insights |
Raw context is interesting for facts, but it's even more impactful to be able to reason over existing knowledge in a domain.
This is where Knowledge Graphs and ontologies come into play. They allow agents to navigate through knowledge (not just retrieve it), infer non-explicit facts, and reason about relationships between concepts.
The Evidence for Enterprises
It has become obvious that agents need a structured way to understand reasoning processes.
- Investing in the AI layer without investing in metadata = predictable failure
- Output quality is determined by input quality (garbage in, garbage out)
- Knowledge Management is no longer a nice-to-have, it's a prerequisite
The good news: You don't need to do everything at once.
Where to Start?
- Start small: A CSV file with a glossary of business terms
- Document key tables: The most queried ones first
- Describe columns: Possible values, meaning, usage patterns
- Map joins: Relationships between main tables
- Collect examples: Question/SQL pairs validated by humans
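"Start small" can be taken literally: a glossary can be a CSV file loaded into a dictionary. A sketch with hypothetical terms and columns:

```python
import csv
import io

# A glossary as a plain CSV file (columns and rows are illustrative).
GLOSSARY_CSV = """\
term,definition,mapping
revenue,Total pre-tax amount of completed orders,SUM(orders.amount)
churn,Customer with no order in the last 90 days,dim_customer.churn_flag
"""

def load_glossary(text):
    """Load the glossary into a dict keyed by business term."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["term"]: row for row in reader}

glossary = load_glossary(GLOSSARY_CSV)
print(glossary["revenue"]["mapping"])  # SUM(orders.amount)
```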
Perfection is not required. Progress is.
This metadata can be AI-assisted: take samples from your tables, pass them to an LLM to generate descriptions, then manually validate and adjust. It's tedious work, but it's the work that makes the difference between a POC that impresses and an agent that delivers value in production.
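The AI-assisted part can be sketched as a prompt builder: sample rows go in, a description-drafting prompt comes out, and the actual LLM call plus human validation happen outside this snippet. Table and column names are hypothetical.

```python
def description_prompt(table, columns, sample_rows):
    """Assemble a prompt asking an LLM to draft column descriptions
    from a data sample; a human validates and adjusts the output."""
    header = " | ".join(columns)
    rows = "\n".join(" | ".join(str(v) for v in row) for row in sample_rows)
    return (
        f"Here is a sample from table `{table}`:\n"
        f"{header}\n{rows}\n"
        "Draft a one-line description for each column, including "
        "possible values and their meaning."
    )

prompt = description_prompt(
    "orders",
    ["order_id", "status_cd", "amount"],
    [(1, "A", 42.0), (2, "I", -10.0)],
)
print(prompt)
```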
Ontology is not (only) a dusty academic concept. It's the foundation on which tomorrow's agents will be able to reason—not just retrieve text.
Classic RAG has shown its limits. GraphRAG and Knowledge Graph-based approaches point toward the future: systems that understand the structure of knowledge, not just its textual content.
For enterprises, the message is clear: before investing in the latest trendy use cases/tools, invest in your metadata. Document your tables. Define your concepts. Map your relationships.
It's less sexy than a new tool, but it's what will make the difference between an agent that hallucinates and an agent that reasons.
PA,