Data & retrieval

Knowledge Base

A structured collection of documents an AI system can search and quote — the source-of-truth corpus that grounds RAG and many AI agents.

01 ——

In plain English

A knowledge base, in AI tooling, is the indexed body of documents an LLM is allowed to draw from when answering. It's the "ground truth" half of retrieval-augmented generation: the model is told to base its answer on what's in the knowledge base, not its training data.
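
As a rough illustration of "told to base its answer on the knowledge base", the instruction is usually just part of the prompt. The sketch below assembles such a prompt from passages that have already been retrieved. The function name `build_grounded_prompt`, the prompt wording, and the sample passage are all assumptions made for illustration, not the format of any particular product.

```python
def build_grounded_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Build a prompt that asks the model to answer only from the given passages.

    `passages` is a list of (doc_id, text) pairs, already retrieved from the
    knowledge base. The instruction wording here is illustrative only.
    """
    sources = "\n\n".join(
        f"[{i}] ({doc_id}) {text}"
        for i, (doc_id, text) in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number. "
        "If the sources do not contain the answer, say you do not know.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
    )


# Hypothetical passage, invented for this example.
prompt = build_grounded_prompt(
    "How long do refunds take?",
    [("refund-policy", "Refunds are issued within 14 days of purchase.")],
)
print(prompt)
```

The resulting string would then be sent to whichever LLM the system uses; that call is left out because it depends on the provider.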

What goes in a knowledge base:

Internal docs (Notion, Confluence, Google Drive, SharePoint)
Product documentation and help articles
Support tickets and resolved cases
Policy and procedure manuals
Public web pages a company controls

How it gets indexed:

Documents are chunked into passages (typically 200–800 words)
Each chunk is converted to an embedding vector and stored in a vector database
At query time, the query is embedded the same way and the system retrieves the top-K most similar chunks
Chunks are passed to the LLM as context for the answer
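
The first three steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how any specific product works: the `embed` function is a toy word-count stand-in for a real embedding model, the "vector database" is a plain in-memory list, and the helper names (`chunk_words`, `build_index`, `retrieve`) and sample documents are invented for the example.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into passages of up to `size` words, sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
        start += size - overlap
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: dict[str, str], size: int = 200, overlap: int = 20):
    """Chunk every document and store (doc_id, chunk, vector) entries in a list."""
    return [
        (doc_id, chunk, embed(chunk))
        for doc_id, text in documents.items()
        for chunk in chunk_words(text, size, overlap)
    ]


def retrieve(index, query: str, k: int = 3) -> list[tuple[str, str]]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[2]), reverse=True)
    return [(doc_id, chunk) for doc_id, chunk, _ in ranked[:k]]


# Hypothetical two-document knowledge base, invented for this example.
docs = {
    "refund-policy": "Refunds are issued within 14 days of purchase when the customer has a receipt.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
}
index = build_index(docs)
top = retrieve(index, "How long do refunds take?", k=1)
print(top[0][0])  # the refund chunk ranks first: it shares the word "refunds" with the query
```

A production system would swap `embed` for a learned embedding model and the list for a vector database with approximate nearest-neighbour search, but the shape of the pipeline stays the same: chunk, embed, store, then rank by similarity at query time.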

Why it matters: Knowledge bases are how organisations use AI on private data without retraining the model: the documents are never used for training, only read at query time. Tools like Glean, Notion AI, Microsoft Copilot, and many enterprise chatbots are knowledge-base-driven.

02 ——