Modalities

OCR

Optical Character Recognition — AI that converts text inside images, scanned documents, or PDFs into editable, searchable text.

01 ——

In plain English

OCR (Optical Character Recognition) is the technology that reads text from images and turns it into actual characters a computer can search, copy, or edit. It's how scanned PDFs, photos of receipts, and screenshots become useful data.

Common uses:

  • Document digitisation — scan books, contracts, archives
  • Invoice and receipt processing — extract amounts, dates, vendors
  • ID and form capture — fill forms automatically from a photo
  • Accessibility — read images aloud for visually impaired users
  • Translation — translate signs or menus from a phone photo
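
To make the invoice and receipt case concrete, here is a minimal sketch of the step that usually follows OCR: pulling a vendor, a date, and a total out of the recognised text. The sample text, the `extract_fields` helper, and the regular expressions are illustrative assumptions, not part of any particular OCR product. Real receipts vary far more than this.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical OCR output for a receipt; real output is usually noisier.
SAMPLE_TEXT = """\
CORNER CAFE
Date: 2024-03-15
Coffee        3.50
Sandwich      7.25
TOTAL        10.75
"""

# ISO-style dates only (YYYY-MM-DD); other layouts would need more patterns.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
# A line that starts with TOTAL, so "SUBTOTAL" lines are not picked up.
TOTAL_RE = re.compile(r"^\s*TOTAL\s+(\d+\.\d{2})\s*$", re.IGNORECASE | re.MULTILINE)


def extract_fields(text: str) -> dict:
    """Return vendor, date, and total found in OCR text (None when missing)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    vendor = lines[0] if lines else None  # assumption: vendor is the first line

    found_date = None
    m = DATE_RE.search(text)
    if m:
        try:
            found_date = date(int(m[1]), int(m[2]), int(m[3]))
        except ValueError:
            found_date = None  # digits matched but were not a real calendar date

    m = TOTAL_RE.search(text)
    total = Decimal(m[1]) if m else None  # Decimal avoids float rounding on money

    return {"vendor": vendor, "date": found_date, "total": total}


fields = extract_fields(SAMPLE_TEXT)
print(fields)
```

The OCR engine only produces the text. Deciding which characters are the total or the date is a separate parsing step, and it is often where most of the effort goes.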

Modern OCR: Traditional OCR matched characters against hand-crafted templates. Modern OCR uses deep learning and copes with messy, multilingual, handwritten, or low-quality images. Multimodal LLMs such as GPT-4o and Claude can also read text from images directly, and are often used in place of dedicated OCR tools for complex documents.
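
The template approach mentioned above can be illustrated with a toy example. This is a minimal sketch, assuming tiny 3×5 black-and-white glyphs and a made-up three-character alphabet. The `TEMPLATES` table and the `recognise` function are hypothetical and only stand in for the general idea. Each unknown glyph is compared pixel by pixel with every template, and the closest template wins.

```python
# Hand-crafted 3x5 templates: '#' is ink, '.' is background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def distance(a, b):
    """Count pixels that differ between two same-sized glyphs (Hamming distance)."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognise(glyph):
    """Return the template character whose pixels are closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], glyph))


# A noisy "T": the top-right pixel is missing.
noisy_t = ["##.", ".#.", ".#.", ".#.", ".#."]
print(recognise(noisy_t))  # prints "T"
```

One flipped pixel is tolerated here, but a new font, a rotated page, or handwriting quickly defeats fixed templates, which is the gap that learned models fill.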


Last reviewed May 2026
Vol. 4 · Issue 19 · Last reviewed 2026-05-30
