Your Knowledge Base Is Rotting? AI Can Fix That.
Development | Denis Susac

Wednesday, Apr 22, 2026 • 10 min read
Why PKM systems fail and how the LLM Wiki pattern fixes the architectural root cause — with a layered retrieval design that goes beyond the RAG-vs-wiki false choice.

Every Personal Knowledge Management (PKM) system I’ve ever seen fails the same way, just not immediately. The first few months are great: you’re capturing notes, linking concepts, building something that feels like a second brain. Then life gets busy. The system starts accumulating debt. You open Obsidian three weeks later and there are 47 unprocessed inbox items, a folder called “misc” that’s become a dumping ground, and a web of notes that no longer connect to anything useful. You spend 45 minutes trying to fix it and give up.

Six months in, the system collapses. A year later, you’ve either started over or abandoned the whole thing.

This isn’t a discipline problem. It’s an architectural problem. Personal knowledge management tools were designed for humans to maintain, and maintaining a relational database in your spare time is not something humans actually do. We dump, we link, we forget, we move on.

Andrej Karpathy recently published a gist describing the “LLM Wiki” pattern, and when I read it I recognized what it was describing: a structural fix for this failure mode, one that treats the problem as architectural rather than behavioural.

What Karpathy Actually Proposed

The core idea is deceptively simple. Instead of searching raw documents at query time, you have an AI agent continuously synthesize your knowledge into a maintained wiki. When you add a new source (an article, a meeting note, a research paper), the agent reads it, extracts the meaningful claims, and integrates them into topic pages that already exist in the wiki. The wiki grows richer with every ingest. Search becomes a question you ask of synthesized, cross-linked knowledge, not a pile of raw files.

The contrast with standard RAG is real. RAG is stateless: every query starts from scratch, pulling chunks from raw documents and generating an answer that evaporates afterward. There’s no accumulation. The thousandth time you search a corpus, the system knows exactly as much as it did the first time. An LLM Wiki is stateful. Each new source extends and refines existing pages, and the value compounds over time.

I read the gist on a Friday and by Sunday evening had a working implementation inside my own Obsidian vault.

What “RAG Is Dead” Gets Wrong

Within a week of Karpathy’s post, the takes started appearing. “RAG is dead.” “The wiki pattern kills retrieval.” Articles framing this as a binary choice between two competing approaches.

That framing misses the point, and building this system made me certain of it.

The popular version of the contrast goes: “RAG retrieves at query time; wiki synthesizes upfront. RAG is fast to set up. Wiki is slower but better.” The second sentence is wrong in a way that matters.

RAG is not fast to set up. Building a production RAG pipeline at Dokko has taught me this directly. You need to chunk documents (a decision that dramatically affects retrieval quality, and for which there are no universal right answers), generate embeddings using a model you’ve chosen and configured, build and maintain a vector index, handle metadata filtering, solve multimodal challenges, manage re-indexing when documents change, and tune retrieval parameters to avoid the garbage-in-garbage-out problem on the retrieval end. A naive RAG system is fast to demo. A production RAG system has weeks of pipeline work before it answers its first question reliably.

The real distinction is what each approach pre-processes for. RAG pre-processes for mechanical retrieval: it converts text into vectors so similarity search can find relevant chunks quickly. It has no opinion about what those chunks mean or how they relate to each other. The LLM Wiki pre-processes for comprehension: it converts raw sources into synthesized pages that already contain the context, cross-links, and editorial curation that you would otherwise need to reconstruct at query time.

These are not competitors. They are layers, and the right architecture uses both.

Here’s the structure I ended up with:

Layer 1: Wiki pages. Pre-synthesized, cross-linked topic knowledge. If the wiki has coverage, the query stops here. Fastest path, highest signal-to-noise.

Layer 2: Raw vault files. The underlying source material. Fallback when the wiki doesn’t yet cover a topic. Every raw note is eventually a candidate for ingest.

Layer 3: LLM general knowledge. Fills gaps that exist neither in the wiki nor the vault. The model’s training data as a backstop.

Vector search sits between Layers 1 and 2 once the wiki grows past what a flat index can navigate efficiently. More on that shortly.

The wiki doesn’t replace retrieval. It sits in front of it, pre-answering the questions you ask repeatedly while raw files remain available for everything else. As coverage improves, you reach for Layer 2 less often.
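Sketched as code, the routing is just a cascade with early exit. Nothing below comes from the actual implementation; the function and the three layer hooks are illustrative:

```python
def answer(query, wiki_lookup, raw_search, llm_fallback):
    """Three-layer retrieval cascade: stop at the first layer with coverage.

    wiki_lookup  -- Layer 1: synthesized wiki pages (None if no coverage)
    raw_search   -- Layer 2: raw vault notes (None if nothing matches)
    llm_fallback -- Layer 3: the model's general knowledge as a backstop
    """
    pages = wiki_lookup(query)           # fastest path, highest signal-to-noise
    if pages is not None:
        return ("wiki", pages)
    notes = raw_search(query)            # fallback while wiki coverage is thin
    if notes is not None:
        return ("raw", notes)
    return ("llm", llm_fallback(query))  # gap exists in neither wiki nor vault
```

As coverage grows, more queries terminate at the first branch, which is the compounding-value claim in concrete form.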

What I Actually Built

The implementation lives in a _wiki/ directory at the root of a PARA-method vault managed by Claude Code agents. Here’s the structure:

_wiki/
  SCHEMA.md          # Rulebook: page formats, ingest workflow, evolution rules
  index.md           # Master catalog of all wiki pages (flat markdown table)
  log.md             # Append-only ingest history
  topics/            # Synthesized knowledge clusters (200-500 words each)
  concepts/          # Short definitions with cross-links (50-150 words)
  entities/          # Named things: tools, products, people (100-300 words)
  sources/           # Provenance manifests, one per ingested source
  analyses/          # Multi-source syntheses, created on request only

Every wiki page follows a consistent frontmatter schema:

---
title: "Contextual Retrieval"
type: concept
created: 2026-04-15
updated: 2026-04-17
sources:
  - "[[1-Projects/work/dokko/decisions/adr-007-contextual-retrieval-final-architecture]]"
related:
  - "[[_wiki/topics/backend-architecture]]"
  - "[[_wiki/concepts/rag-vs-wiki]]"
tags: [retrieval, ai, backend]
status: active
---

The body then follows a consistent structure: ## Overview, ## Key Points, ## Details, ## Sources, ## Open Questions. Both are enforced by a SCHEMA.md rulebook at the wiki root, and every ingest operation begins with the agent reading that file.
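To make the schema concrete: checking that a page carries the required frontmatter fields is a few lines of scripting. The field list below is taken from the example above; the validator itself is hypothetical, not part of SCHEMA.md:

```python
import re

# Required top-level keys, as they appear in the frontmatter example above
REQUIRED_FIELDS = {"title", "type", "created", "updated",
                   "sources", "related", "tags", "status"}

def frontmatter_keys(page_text):
    """Return the top-level YAML keys in a page's frontmatter block."""
    match = re.match(r"^---\n(.*?)\n---", page_text, re.DOTALL)
    if not match:
        return set()
    return {line.split(":", 1)[0].strip()
            for line in match.group(1).splitlines()
            if line and not line[0].isspace() and ":" in line}

def missing_fields(page_text):
    """Fields the schema requires but the page lacks."""
    return REQUIRED_FIELDS - frontmatter_keys(page_text)
```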

The index.md file at the wiki root is the agent’s map of everything that exists. It’s four tables: topics (with status, last-updated date, source count, and a wikilink), concepts (with related topics), entities (with type and related topics), and sources (with the date ingested and which pages each source fed). A topic with status stub has frontmatter but no real content yet. A topic with status active has been fed at least one ingest. The source count column tells you at a glance how well-evidenced a topic is: a topic with seven sources behind it is more reliable than one backed by a single source. When you open the vault in Obsidian, this file is also a navigable index you can click through directly, since all paths are wikilinks.

The ingest workflow runs nine steps on every source:

  1. Read the source file
  2. Read _wiki/index.md to understand which pages already exist
  3. Check _wiki/sources/ for re-ingest detection (same source processed before)
  4. Extract 5 to 15 key claims, decisions, or insights worth retaining
  5. Map claims to existing topics; propose a new topic only if 5+ claims share a theme not covered by any existing one
  6. Create or update topic pages: append new claims, update the sources list in frontmatter, update the date
  7. Create or update concept and entity pages for new terms or named things introduced by the source
  8. Create a source manifest in _wiki/sources/<slug>.md recording what claims went where
  9. Update _wiki/index.md and append a row to _wiki/log.md
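As a control-flow sketch, the nine steps reduce to the function below. Every helper name is invented for illustration; the real workflow is prose instructions executed by the agent, not Python:

```python
def ingest(source_path, wiki, dry_run=False):
    """Nine-step ingest workflow; `wiki` wraps all reads/writes under _wiki/."""
    text = wiki.read_source(source_path)              # 1. read the source
    index = wiki.read_index()                         # 2. learn which pages exist
    if wiki.already_ingested(source_path):            # 3. re-ingest detection
        return wiki.reingest(source_path, text)
    claims = wiki.extract_claims(text)                # 4. 5-15 key claims
    mapping = wiki.map_claims(claims, index)          # 5. claims -> topic pages
    if dry_run:                                       # preview: nothing written
        return mapping
    for topic, topic_claims in mapping.items():       # 6. create/update topics
        wiki.update_topic(topic, topic_claims, source_path)
    wiki.update_concepts_and_entities(claims)         # 7. new terms, named things
    wiki.write_source_manifest(source_path, mapping)  # 8. provenance record
    wiki.update_index_and_log(source_path)            # 9. bookkeeping
    return mapping
```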

The /ingest command in Claude Code handles this workflow. It accepts a vault path or a URL:

/ingest 2-Areas/health/reports/cardiovascular-strategy-2026-02.md
/ingest 1-Projects/work/dokko/decisions/adr-007.md --topic backend-architecture
/ingest https://martinfowler.com/articles/microservices.html
/ingest 3-Resources/tech/ai/karpathy-llm-wiki.md --dry-run

For URLs, it first saves the content to the vault’s 3-Resources/ folder with proper frontmatter (an archiving step), then runs the synthesis workflow on the saved file. --dry-run shows you the extracted claims and proposed topic changes without writing anything.
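The archiving step might look something like this. The slug scheme and the frontmatter fields are assumptions; only the 3-Resources/ destination comes from the vault layout above:

```python
import re
from datetime import date
from pathlib import Path

def archive_url(url, content, vault_root):
    """Save fetched web content under 3-Resources/ with minimal frontmatter,
    so the synthesis workflow can run on a vault file like any other source."""
    slug = re.sub(r"[^a-z0-9]+", "-", url.lower()).strip("-")[:60]
    page = (f"---\n"
            f'title: "{url}"\n'
            f"source_url: {url}\n"   # assumed field, not from SCHEMA.md
            f"archived: {date.today().isoformat()}\n"
            f"---\n\n{content}\n")
    target = Path(vault_root) / "3-Resources" / f"{slug}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(page, encoding="utf-8")
    return target
```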

The final piece is wiki-first routing. Every Claude Code agent that touches the vault checks _wiki/index.md before searching raw files. If the wiki has coverage on a topic, raw search is skipped. Without this routing, you’ve built a synthesis layer that nobody queries. The routing is what makes the accumulation valuable.

What Karpathy Proposed That I Deliberately Didn’t Build

Karpathy’s gist describes the pattern at a conceptual level. Several people have since built fully automated implementations on top of it, including background daemons that watch for file changes and trigger ingest automatically.

I didn’t do that, and the reason is worth understanding.

The preview step (exposed as --dry-run, where the agent shows you the extracted claims and proposed topic mappings before writing anything) is the most important part of the system. Without it, low-quality sources produce low-quality wiki pages that are hard to detect and remove. An automated ingest daemon that processes everything it sees would accumulate noise at the same rate it accumulates signal, and the wiki would degrade in exactly the same way that unmanaged note-taking systems degrade: gradually, invisibly, until you realize nothing in it can be trusted.

Making ingest a deliberate command rather than a background process is a quality gate. You decide that a source is worth synthesizing. That decision takes a few seconds and eliminates an entire category of failure.

Full automation is appealing in theory; in practice, supervised automation is more useful because the human review step is where quality actually gets enforced.

I also skipped a dedicated lint command. Several people building on this pattern have implemented /wiki-lint commands that check for stale pages, orphaned concepts, broken wikilinks, topics that have drifted from their stated scope. The problem with encoding this into a command is that “lint” means something different every time you run it. Sometimes you want to find topics that haven’t been updated in months. Sometimes you want to check whether two concepts have silently merged in practice but still have separate pages. Sometimes you want to flag topics whose source count is one and whose single source is over a year old. A fixed command can only check fixed things. Asking Claude directly lets you describe what you’re actually looking for. The wiki’s transparency (every page lists its sources and dates in frontmatter, the log records every ingest) means those ad-hoc queries work well without any dedicated tooling.
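That transparency is doing real work: any one-off check is a few lines of throwaway scripting, whether you write it or Claude does. For example, flagging pages whose updated: date is older than six months (the function is a hypothetical one-off, not vault tooling):

```python
import re
from datetime import date, timedelta
from pathlib import Path

def stale_pages(wiki_dir, max_age_days=180):
    """Flag wiki pages whose `updated:` frontmatter date is past the cutoff."""
    cutoff = date.today() - timedelta(days=max_age_days)
    stale = []
    for page in Path(wiki_dir).rglob("*.md"):
        m = re.search(r"^updated:\s*(\d{4}-\d{2}-\d{2})",
                      page.read_text(encoding="utf-8"), re.MULTILINE)
        if m and date.fromisoformat(m.group(1)) < cutoff:
            stale.append(page)
    return stale
```

Next month the question will be different, and so will the script.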

The other thing I didn’t implement is a separate archive of raw sources alongside the wiki. Karpathy’s design separates raw/ (immutable, AI never writes) from wiki/ (fully AI-managed). My vault already has this separation baked in through PARA: raw sources live in 1-Projects/, 2-Areas/, and 3-Resources/; the wiki is a compiled layer on top. The structural principle is the same, mapped onto a pre-existing organizational system rather than added alongside it.

What Broke, and What Surprised Me

The flat index degraded faster than expected. The _wiki/index.md file is a markdown table listing every page, its type, and its topic tags. That works at 30 pages. At 80 pages it’s manageable. Past 150 pages, scanning the full index on every query becomes slow and relevance degrades: the agent reads too much before identifying which pages are actually useful.

This is where vector search enters the picture. The plan is to sit a hybrid BM25/vector index between Layers 1 and 2, using something like QMD (Tobi Lütke’s local markdown search engine with a built-in MCP server) wired into the Claude Code session. Instead of scanning the full index, the agent embeds the query and retrieves the top-k most relevant wiki pages by combined keyword and semantic score. The synthesis layer remains unchanged. You’re adding smarter navigation of it as it grows.
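One standard way to combine a keyword ranking with a semantic one is reciprocal rank fusion. Whether QMD fuses scores this way I don't know, so treat this as the generic shape of the idea rather than its internals:

```python
def hybrid_rank(keyword_ranking, vector_ranking, k=60):
    """Fuse two rankings of wiki pages with reciprocal rank fusion (RRF).

    Each page scores sum(1 / (k + rank)) over the rankings it appears in;
    k=60 is the conventional damping constant from the RRF literature.
    """
    scores = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, page in enumerate(ranking, start=1):
            scores[page] = scores.get(page, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The agent then reads only the top few fused pages instead of scanning the whole index.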

The other surprise was how quickly the wiki became useful. I expected it to take weeks before coverage was broad enough to matter. It was useful within days, because the topics it covered first were the ones I cared about most: domains I’d been writing notes on for months that had never been synthesized into anything queryable. The cold-start problem is smaller than it looks if you ingest selectively rather than dumping everything in at once.

From Personal Experiment to Product Intuition

I work with a team that builds Dokko, an enterprise AI knowledge assistant. When I built this in my personal vault, I wasn’t thinking about product work. I was annoyed at my own decaying PKM system and wanted to fix it.

But the failure modes I hit at small scale map directly to what breaks at enterprise scale, with worse consequences and harder rollbacks.

A flat index that degrades at 150 pages is a manageable annoyance in a personal vault. In an enterprise system with 50,000 documents across 200 teams, naive retrieval isn’t just slow. It’s wrong: answers that are confident and plausible, assembled from chunks that share a keyword but not a meaning. The solution isn’t faster search. It’s pre-synthesizing knowledge so the context is already assembled before the query arrives. Dokko’s contextual retrieval pipeline exists because of exactly this insight: chunks without context are just text.

Multi-tenant isolation doesn’t exist as a problem in a single-user vault, but it’s central to any enterprise deployment. Keeping each team’s knowledge appropriately separate while allowing cross-team synthesis where it makes sense is a hard design problem. The underlying question is the same at both scales: what is the right granularity for a knowledge unit, and who is responsible for maintaining it?

The wiki approach makes the maintenance responsibility explicit. The agent maintains the wiki. The user maintains the source vault. Those are different jobs with different cadences. Conflating them is why most PKM systems fail: users are asked to do both and eventually do neither.

Where This Goes

The ecosystem around this pattern is growing quickly. Karpathy’s gist landed and within weeks there were a dozen implementations, experiments, and adjacent projects. The MCP ecosystem is making it straightforward to wire local knowledge graphs into AI environments without building custom integrations from scratch. Vector search is becoming cheap enough that a personal wiki with a proper retrieval index is no longer an infrastructure project.

The gap that most implementations haven’t closed is quality gates on ingest at the claim level. Right now, feeding a mediocre source into the wiki produces mediocre topic updates that are difficult to detect. What’s needed is a scoring pass that evaluates whether extracted claims are genuinely novel and relevant before they’re committed to topic pages. That would shift the system from “whatever the agent extracts” to one that actively improves its own signal-to-noise ratio over time. Nobody has built this cleanly yet.
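One plausible shape for that gate, with every detail here an assumption: embed each extracted claim, compare it against the claims already recorded for the topic, and drop near-duplicates before they're committed:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def novel_claims(candidates, existing, embed, threshold=0.85):
    """Keep candidate claims that aren't near-duplicates of claims already
    on the topic page. `embed` maps a claim string to a vector; the 0.85
    threshold is a guess that would need tuning against real claims."""
    existing_vecs = [embed(c) for c in existing]
    return [c for c in candidates
            if all(cosine(embed(c), v) < threshold for v in existing_vecs)]
```

A relevance score against the topic's stated scope would layer on the same way; the point is that the gate runs per claim, before the write, not as cleanup afterward.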

If you run a personal knowledge system and it’s in the six-month decay phase, the LLM Wiki pattern is worth trying. It’s not elegant and it doesn’t solve everything, but it moves the maintenance burden from you to a machine, and machines are far more consistent about running nine-step workflows at 11pm on a Tuesday.