Knowledge & RAG·January 4, 2026·8 min read

GraphRAG in One Flag: From Chunks to an Entity Graph

Plain vector RAG retrieves isolated chunks and misses facts in the relationships between them. Matrix builds an entity graph from one flag.

By Matrix Team

Vector RAG has a blind spot, and it's structural. You embed your documents, you embed the query, you pull back the top-k nearest chunks. That works beautifully when the answer lives inside a chunk. It fails the moment the answer lives in the relationship between two chunks that don't mention each other.

Ask "which suppliers are affected if the Hyderabad warehouse goes down?" and plain similarity search retrieves the chunk about the warehouse and, separately, whichever chunks happen to sit closest to your query in embedding space. The chunk that names a supplier three documents away, connected only because both reference the same shipping route, never surfaces. There's no cosine distance for "is connected to."

GraphRAG closes that gap by building an explicit entity-and-relation graph at ingestion time, then walking it at retrieval time. In Matrix, you turn it on with a single flag: Knowledge.graphragEnabled=true. This post walks through exactly what that flag does to your data and your retrieval path.

If you haven't seen how the base RAG pipeline works, start with "RAG You Set Up by Dragging a PDF Into a Browser". GraphRAG layers on top of that pipeline, not beside it.

The blind spot, concretely

A corpus is a pile of chunks. Standard retrieval treats each chunk as an island: the only way two chunks "relate" is by being independently similar to the same query. That's enough for single-hop questions ("what is X?") and useless for multi-hop ones ("what does X imply for Y?").

The information you're missing isn't in the chunks. It's in the edges between the entities the chunks mention. Vector search has no representation for those edges. So you have two options: stuff more chunks into context and hope the model connects them, or build the edges yourself.

GraphRAG builds those edges for you, automatically, at ingest.

What the flag actually does

Knowledge corpora in Matrix carry a graphragEnabled flag alongside the usual chunkSize / chunkOverlap knobs:

POST /api/orgs/{slug}/knowledge
{
  "properties": {
    "key": "ops-playbook",
    "name": "Operations Playbook",
    "kind": "FILES",
    "graphragEnabled": "true"
  }
}
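For scripting corpus creation, here is a minimal sketch using only Python's standard library that builds (but does not send) the request above. The base URL, the bearer-token `Authorization` header, and the helper name `build_create_corpus_request` are hypothetical; the post doesn't say how the API authenticates. The flag is sent as the string "true", mirroring the payload above, and is simply omitted when off, on the assumption that off is the opt-in default.

```python
import json
import urllib.parse
import urllib.request

def build_create_corpus_request(base_url: str, slug: str, key: str, name: str,
                                graphrag: bool, token: str) -> urllib.request.Request:
    """Build a POST /api/orgs/{slug}/knowledge request for a FILES corpus.

    Hypothetical helper: base_url and the Authorization scheme are assumptions.
    """
    url = f"{base_url.rstrip('/')}/api/orgs/{urllib.parse.quote(slug, safe='')}/knowledge"
    properties = {"key": key, "name": name, "kind": "FILES"}
    if graphrag:
        # The example payload carries the flag as a string, not a JSON boolean.
        properties["graphragEnabled"] = "true"
    return urllib.request.Request(
        url,
        data=json.dumps({"properties": properties}).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="POST",
    )
```

Sending it is then a matter of passing the request to `urllib.request.urlopen` against your real workspace URL.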

With the flag off, ingestion follows the standard path: parse the file, split it into chunks of ~2000 characters with a 200-character overlap, embed each chunk (text-embedding-004, 768 dimensions), and write a KnowledgeChunk row with its embedding onto the underlying :Entity node. Done.
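The chunking step can be pictured as a sliding window. This sketch assumes plain character windows of exactly 2000 characters that advance by 1800 (2000 minus the 200-character overlap). The post says ~2000, so Matrix's real chunker may respect sentence or token boundaries instead, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` chars, each sharing `overlap` chars with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    if not text:
        return []
    step = size - overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

The overlap means a sentence that straddles a boundary appears whole in at least one chunk, at the cost of embedding about 10% of the text twice.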

With the flag on, every chunk gets one extra pass. After embedding, GraphragExtractor runs an LLM extraction step that pulls entities and relations out of the chunk text. That's the whole difference at ingest — one more model call per chunk.
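A sketch of how the flag changes the per-chunk loop. `embed` and `extract` are hypothetical stand-ins for the embedding call and GraphragExtractor, and the record types are illustrative. What the sketch makes concrete is the call count: N embedding calls with the flag off, N embedding plus N extraction calls with it on.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ChunkRecord:
    text: str
    embedding: list[float]
    graph_fragment: Optional[dict]  # None when GraphRAG is off for the corpus

@dataclass
class IngestStats:
    embedding_calls: int = 0
    extraction_calls: int = 0

def ingest(chunks: list[str], graphrag_enabled: bool,
           embed: Callable[[str], list[float]],
           extract: Callable[[str], dict]) -> tuple[list[ChunkRecord], IngestStats]:
    """Embed every chunk; additionally run entity/relation extraction when the flag is on."""
    stats = IngestStats()
    records = []
    for text in chunks:
        vector = embed(text)              # standard path: one embedding call per chunk
        stats.embedding_calls += 1
        fragment = None
        if graphrag_enabled:
            fragment = extract(text)      # GraphRAG: one extra LLM call per chunk
            stats.extraction_calls += 1
        records.append(ChunkRecord(text, vector, fragment))
    return records, stats
```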

This is opt-in per corpus. A knowledge base you only ever ask single-hop questions of doesn't need it, and shouldn't pay for it. Flip it on for the corpora where the relationships carry the answers.

Extraction: strict JSON in, graph out

The extractor doesn't free-text its way to a graph. GraphragExtractor calls VertexTextClient.generateForStructuredOutput with a strict JSON-out prompt, so the model is constrained to return exactly this shape per chunk:

{
  "entities": [
    { "name": "Hyderabad Warehouse", "type": "Facility",
      "description": "Primary regional distribution hub" }
  ],
  "relations": [
    { "source": "Hyderabad Warehouse", "target": "Acme Logistics",
      "relation": "ships_via", "confidence": "EXTRACTED" }
  ]
}

Two things matter here. First, generateForStructuredOutput is the right call for this job: it pins maxOutputTokens and sets the thinking budget to zero, so Gemini spends its output tokens emitting the structure instead of burning them on reasoning, a trap the default generate() falls into. Second, the output is a typed graph fragment, not prose. Each entity has a name, a type, and a description; each relation has a source, a target, a relation verb, and a confidence.
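Even with a constrained prompt, it's cheap to validate what comes back before writing it to the graph. This is a minimal client-side parser for the shape above, written as a hypothetical illustration rather than Matrix's GraphragExtractor. It rejects non-JSON output, missing fields, and any confidence outside the three levels listed below. Treating `description` as optional is an assumption.

```python
import json
from dataclasses import dataclass
from enum import Enum

class Confidence(Enum):
    EXTRACTED = "EXTRACTED"   # stated in the text
    INFERRED = "INFERRED"     # deduced from context
    AMBIGUOUS = "AMBIGUOUS"   # plausible but uncertain

@dataclass(frozen=True)
class Entity:
    name: str
    type: str
    description: str

@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    relation: str
    confidence: Confidence

def parse_fragment(raw: str) -> tuple[list[Entity], list[Relation]]:
    """Parse one chunk's extraction output; raises ValueError on malformed input."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object with entities and relations")
    try:
        entities = [Entity(e["name"], e["type"], e.get("description", ""))
                    for e in data.get("entities", [])]
        relations = [Relation(r["source"], r["target"], r["relation"],
                              Confidence(r["confidence"]))  # unknown level -> ValueError
                     for r in data.get("relations", [])]
    except KeyError as missing:
        raise ValueError(f"missing field: {missing}") from None
    return entities, relations
```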

That confidence value is one of three levels:

  • EXTRACTED — the relation was stated in the text.
  • INFERRED — the model deduced it from context.
  • AMBIGUOUS — plausible but uncertain.

Keeping confidence as a first-class property means a downstream consumer can decide how much to trust an INFERRED edge versus an EXTRACTED one, instead of treating every edge as gospel.
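Here is one way "deciding how much to trust" might look downstream: a sketch that orders the three levels and drops edges below a chosen floor. The ordering (EXTRACTED above INFERRED above AMBIGUOUS) follows the definitions above. The function name and the default floor are illustrative assumptions, not a Matrix setting.

```python
# Illustrative trust ordering: higher means more directly grounded in the text.
TRUST_RANK = {"AMBIGUOUS": 0, "INFERRED": 1, "EXTRACTED": 2}

def filter_by_confidence(relations: list[dict], floor: str = "INFERRED") -> list[dict]:
    """Keep relations whose confidence is at or above `floor`; unknown levels are dropped."""
    if floor not in TRUST_RANK:
        raise ValueError(f"unknown confidence level: {floor}")
    minimum = TRUST_RANK[floor]
    return [r for r in relations if TRUST_RANK.get(r["confidence"], -1) >= minimum]
```

A cautious consumer might walk only EXTRACTED edges for compliance questions and admit INFERRED ones for exploratory search.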

Merging entities across chunks

The same entity shows up in chunk 4, chunk 19, and chunk 200, sometimes spelled identically and sometimes differing only in capitalisation. If each mention spawned its own node, you'd have a graph of duplicates and no real connectivity.

Matrix upserts entities as KgEntity rows keyed on (knowledgeId, lower-cased name). So "Hyderabad Warehouse", "hyderabad warehouse", and "Hyderabad warehouse" from three different chunks all collapse onto one node, scoped to that corpus. That key is what turns a bag of per-chunk extractions into a single connected graph — the merge is what makes multi-hop reachability possible at all.
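The merge key can be sketched as a dictionary keyed on (knowledgeId, lower-cased name). This is an in-memory illustration, not the KgEntity store. Which surface form and type the merged node keeps is an assumption here (first mention wins), and `EntityIndex` is a hypothetical name. Because only case is normalised, a variant like "Hyderabad WH" would still become its own node.

```python
from dataclasses import dataclass, field

@dataclass
class KgEntityRow:
    knowledge_id: str
    name: str                      # surface form from the first mention (assumption)
    type: str
    chunk_ids: set[str] = field(default_factory=set)  # targets of MENTIONED_IN edges

class EntityIndex:
    """In-memory stand-in for KgEntity upserts, scoped per corpus."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], KgEntityRow] = {}

    def upsert(self, knowledge_id: str, name: str, entity_type: str,
               chunk_id: str) -> KgEntityRow:
        key = (knowledge_id, name.lower())   # the merge key: corpus + lower-cased name
        row = self._rows.get(key)
        if row is None:
            row = KgEntityRow(knowledge_id, name, entity_type)
            self._rows[key] = row
        row.chunk_ids.add(chunk_id)          # record one more mention
        return row

    def __len__(self) -> int:
        return len(self._rows)
```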

Edges are native Neo4j relationships

Here's a deliberate design choice worth calling out. Matrix models most of its data as generic EntityType / EntityNode rows, but the GraphRAG edges are not modelled that way. They're native Neo4j relationships:

// an entity is mentioned in a source chunk
(:KgEntity)-[:MENTIONED_IN]->(:KnowledgeChunk)

// two entities relate, with the extracted verb + confidence on the edge
(:KgEntity)-[:RELATES_TO { relation: "ships_via",
                           confidence: "EXTRACTED" }]->(:KgEntity)

Why native edges instead of the entity-ref pattern used elsewhere? Because traversal is the whole point. RELATES_TO and MENTIONED_IN are graph edges you walk, not foreign keys you join — and Neo4j walks native relationships far faster than it resolves indirection through ref nodes. The MENTIONED_IN edges are the bridge back to retrievable text: an entity points at every chunk that mentioned it, so once you've reached an entity by traversal, you can pull the chunks it lives in.

Retrieval: seed, then walk one hop

At query time, GraphRAG doesn't replace vector search — it extends it. The flow:

  1. Seed with vectors. Run the normal corpus-scoped, exact-cosine search to get the top chunks for the query. These are your seed chunks.
  2. Walk the graph. KnowledgeSearchTool calls KgEntityStore.expandFromChunks(seedChunkIds). That walks one hop out from the entities mentioned in the seed chunks, following RELATES_TO edges to neighbouring entities, then back down their MENTIONED_IN edges to the chunks those entities live in.
  3. Surface the reachable chunks. Chunks you'd never have retrieved by similarity, because they look nothing like the query in embedding space, now surface because they're one relationship away from something that did.

That's the supplier chunk from the opening example. It never matched the query directly. It matched because the warehouse entity in a seed chunk has a ships_via edge to the supplier entity, and that supplier entity is mentioned in the chunk you needed.
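The seed-and-walk flow can be sketched in memory. `seed_chunks` does exact (brute-force) cosine over every chunk, and `expand_from_chunks` mirrors the one-hop walk using plain dictionaries for the MENTIONED_IN and RELATES_TO edges. The data layout is hypothetical, and treating RELATES_TO as walkable from either endpoint is an assumption. The real KgEntityStore.expandFromChunks runs against Neo4j.

```python
import math
from dataclasses import dataclass

def cosine(a: list[float], b: list[float]) -> float:
    """Exact cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def seed_chunks(query: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Step 1: brute-force top-k chunk ids by cosine similarity to the query."""
    ranked = sorted(chunk_vecs, key=lambda cid: cosine(query, chunk_vecs[cid]), reverse=True)
    return ranked[:k]

@dataclass(frozen=True)
class GraphHit:
    chunk_id: str
    via_entity: str   # entity in a seed chunk that bridged to this chunk
    relation: str     # RELATES_TO verb that was followed

def expand_from_chunks(seed_ids: list[str],
                       mentioned_in: dict[str, set[str]],
                       relates_to: list[tuple[str, str, str]]) -> list[GraphHit]:
    """Step 2: walk one hop out from entities mentioned in the seed chunks.

    mentioned_in maps entity name -> chunk ids (the MENTIONED_IN edges).
    relates_to holds (source, relation, target) triples (the RELATES_TO edges).
    """
    seeds = set(seed_ids)
    seed_entities = {e for e, chunks in mentioned_in.items() if chunks & seeds}
    hits: dict[str, GraphHit] = {}
    for source, relation, target in relates_to:
        # Assumption: an edge can be walked from either endpoint.
        for start, neighbour in ((source, target), (target, source)):
            if start not in seed_entities:
                continue
            # Back down the neighbour's MENTIONED_IN edges to its chunks.
            for cid in sorted(mentioned_in.get(neighbour, set())):
                if cid not in seeds and cid not in hits:
                    hits[cid] = GraphHit(cid, start, relation)
    return list(hits.values())
```

With the warehouse chunk as the only seed, following its ships_via edge returns the supplier's chunk even when that chunk's cosine score against the query is zero.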

Explainability is built in

A retrieval system that surfaces chunks "because the graph said so" is a debugging nightmare unless it tells you why. Each graph-hit chunk is tagged with its provenance:

graph: via Hyderabad Warehouse (ships_via)

So when a graph-reached chunk shows up in the agent's context, the via tag names the entity and the relation that bridged to it. You can read the retrieval trace and see the exact hop: this chunk came in via the Hyderabad Warehouse entity along its ships_via edge. No black box — the path that justified each chunk is right there in the result.
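Here is a sketch of attaching that provenance when graph hits are combined with vector hits. The tag format copies the example above. Listing vector hits first and dropping graph hits that duplicate a vector hit are assumptions, not documented Matrix behaviour, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RetrievedChunk:
    chunk_id: str
    via: Optional[str] = None   # None for a plain vector hit

def via_tag(entity: str, relation: str) -> str:
    """Render provenance in the 'graph: via <entity> (<relation>)' format."""
    return f"graph: via {entity} ({relation})"

def merge_results(vector_hits: list[str],
                  graph_hits: list[tuple[str, str, str]]) -> list[RetrievedChunk]:
    """Vector hits first, then graph-reached chunks not already present, each tagged.

    graph_hits holds (chunk_id, via_entity, relation) tuples.
    """
    results = [RetrievedChunk(cid) for cid in vector_hits]
    seen = set(vector_hits)
    for cid, entity, relation in graph_hits:
        if cid in seen:
            continue  # already retrieved by similarity; keep the untagged vector hit
        seen.add(cid)
        results.append(RetrievedChunk(cid, via_tag(entity, relation)))
    return results
```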

The cost trade-off — say it plainly

GraphRAG is not free, and the cost is concentrated at ingest. The flag adds one LLM extraction pass per chunk. A 500-chunk PDF means 500 extra structured-output calls before the corpus is queryable. That's latency on upload and tokens on your provider bill, paid once per ingest.
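To estimate that cost before uploading, here is a back-of-the-envelope sketch. It assumes an exact 2000/200 character sliding window (the post says ~2000, so real counts can differ) and exactly one extraction call per chunk with no retries. Under those assumptions, a 500-chunk PDF corresponds to roughly 900,000 characters of extracted text. Both function names are hypothetical.

```python
import math

def estimated_chunks(doc_chars: int, size: int = 2000, overlap: int = 200) -> int:
    """Chunks produced by a fixed sliding window of `size` chars overlapping by `overlap`."""
    if doc_chars <= 0:
        return 0
    if doc_chars <= size:
        return 1
    # The first window covers `size` chars; each later one adds (size - overlap) new chars,
    # so the count is 1 + ceil((L - size) / step) = ceil((L - overlap) / step).
    return math.ceil((doc_chars - overlap) / (size - overlap))

def extra_extraction_calls(doc_chars: int, graphrag_enabled: bool) -> int:
    """GraphRAG adds one structured-output call per chunk, and nothing when off."""
    return estimated_chunks(doc_chars) if graphrag_enabled else 0
```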

The trade is straightforward: you pay an ingest-time premium to make multi-hop retrieval possible at query time. For corpora where the answers genuinely live in the relationships, that's a bargain — the alternative is stuffing far more chunks into every prompt and hoping the model connects dots that were never linked. For corpora where every question is single-hop, leave the flag off and keep ingest cheap. That's exactly why it's per-corpus and opt-in rather than a platform-wide default.

How an agent uses it

Nothing about attaching the corpus to an agent changes. As with any Knowledge in Matrix, attaching a corpus auto-wires a search_knowledge tool onto the agent — no plumbing, no per-agent retrieval config. When the corpus has graphragEnabled=true, that same tool transparently runs the seed-and-walk path and returns the graph-reached chunks alongside the vector hits, each carrying its via tag.

The agent doesn't know or care that GraphRAG is on. It calls one tool and gets better context back. If you want the full picture of how retrieval wires itself onto agents with zero glue code, read "Auto-Wired Retrieval: Your Agent Shouldn't Need RAG Plumbing".

Takeaway

Plain vector RAG retrieves chunks; GraphRAG retrieves chunks plus the chunks they're connected to. The connection comes from an entity/relation graph that Matrix builds during ingestion — strict-JSON extraction per chunk, entities merged across chunks by lower-cased name, edges stored as native Neo4j relationships you can actually walk. Retrieval seeds with vectors, walks one hop, and tags every graph-hit with the entity and relation that reached it.

The cost is one extraction pass per chunk at ingest, paid once. The control is a single per-corpus flag. Turn it on where the answers live in the relationships; leave it off where they don't.

Build it

Spin up a workspace, create a Knowledge corpus with graphragEnabled=true, drag in a document where the facts span multiple sections, and ask it a multi-hop question. Watch the via tags show you the hops.
