Data & retrieval

Knowledge Base

A structured collection of documents an AI system can search and quote: the source-of-truth corpus that grounds retrieval-augmented generation (RAG) and many AI agents.

01 ——

In plain English

A knowledge base, in AI tooling, is the indexed body of documents an LLM is allowed to draw from when answering. It's the "ground truth" half of retrieval-augmented generation: the model is told to base its answer on what's in the knowledge base, not its training data.
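As a rough illustration of what "told to base its answer on the knowledge base" can look like, here is a minimal sketch of a grounding prompt. The function name `build_grounded_prompt`, the instruction wording, and the sample passages are all hypothetical. Real products word and structure their prompts differently.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that restricts the model to the supplied passages."""
    # Number each passage so the model can refer to it as [1], [2], ...
    numbered = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(passages, start=1)
    )
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say you don't know.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}"
    )


# Hypothetical passages standing in for retrieved knowledge-base content.
prompt = build_grounded_prompt(
    "How long do refunds take?",
    ["Refunds are issued within 5 business days.", "Support is open on weekdays."],
)
print(prompt)
```

The explicit "say you don't know" instruction is one common way to discourage the model from falling back on its training data when the passages are silent.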

What goes in a knowledge base:

  • Internal docs (Notion, Confluence, Google Drive, SharePoint)
  • Product documentation and help articles
  • Support tickets and resolved cases
  • Policy and procedure manuals
  • Public web pages a company controls

How it gets indexed:

  1. Documents are chunked into passages (typically 200–800 words)
  2. Each chunk is converted to an embedding vector and stored in a vector database
  3. At query time, the system retrieves the top-K most relevant chunks
  4. Chunks are passed to the LLM as context for the answer
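
The four steps above can be sketched in a few lines of Python. This is a toy under stated assumptions: `embed` here is a bag-of-words counter standing in for a learned embedding model, a plain list stands in for the vector database, and every function name and sample document is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 200) -> list[str]:
    """Step 1: split text into passages of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Step 2 (toy stand-in): word counts, not a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str], size: int = 200) -> list[tuple[str, Counter]]:
    """Steps 1-2: chunk every document and keep each chunk with its vector."""
    return [(c, embed(c)) for doc in documents for c in chunk_words(doc, size)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 3) -> list[str]:
    """Step 3: return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


docs = [
    "Refunds are issued within five business days of approval.",
    "The office is closed on public holidays.",
    "Password resets are handled through the account settings page.",
]
index = build_index(docs)
top = retrieve(index, "When are refunds issued?", k=1)
context = "\n\n".join(top)  # Step 4: this text is passed to the LLM as context
print(context)
```

In a production system the embedding call and the similarity search would typically be handled by an embedding model and a vector database, but the flow of data is the same.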

Why it matters: Knowledge bases are how organisations use AI on private data without training a model on it. The model is not trained on the documents; it only reads the retrieved chunks at query time. Tools like Glean, Notion AI, Microsoft Copilot, and most enterprise chatbots are knowledge-base-driven.


Vol. 4 · Issue 19 · Last reviewed 2026-05-30
