Modalities

OCR

Optical Character Recognition — AI that converts text inside images, scanned documents, or PDFs into editable, searchable text.

01 ——

In plain English

OCR (Optical Character Recognition) is the technology that reads text from images and turns it into actual characters a computer can search, copy, or edit. It's how scanned PDFs, photos of receipts, and screenshots become useful data.

Common uses:

Document digitisation — scan books, contracts, archives
Invoice and receipt processing — extract amounts, dates, vendors
ID and form capture — fill forms automatically from a photo
Accessibility — read images aloud for visually impaired users
Translation — translate signs or menus from a phone photo
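
In uses like invoice and receipt processing, the OCR step only produces raw text; the amounts, dates, and vendors still have to be picked out of it. A minimal sketch of that second step follows, using only the Python standard library. The sample text, the `parse_receipt` helper, and the layout assumptions (vendor on the first line, a DD/MM/YYYY date, a line starting with "TOTAL") are hypothetical and chosen for illustration; real receipts vary widely and usually need more tolerant rules.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical text as an OCR engine might return it for a simple receipt.
OCR_TEXT = """\
ACME STORES LTD
Date: 14/03/2024
Coffee beans      12.50
Oat milk           3.20
TOTAL             15.70
"""

def parse_receipt(text: str) -> dict:
    """Pull vendor, date, and total out of OCR'd receipt text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: the vendor name is the first non-empty line.
    vendor = lines[0] if lines else None

    # Assumption: dates are printed as DD/MM/YYYY.
    date = None
    match = re.search(r"\b\d{2}/\d{2}/\d{4}\b", text)
    if match:
        date = datetime.strptime(match.group(0), "%d/%m/%Y").date()

    # Assumption: the total sits on its own line that starts with "TOTAL".
    total = None
    match = re.search(r"(?im)^total\s+(\d+\.\d{2})\s*$", text)
    if match:
        total = Decimal(match.group(1))

    return {"vendor": vendor, "date": date, "total": total}

result = parse_receipt(OCR_TEXT)
print(result)
```

Amounts are kept as `Decimal` rather than `float` so that currency values are not subject to binary rounding.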

Modern OCR: Traditional OCR used hand-crafted character templates. Modern OCR uses deep learning and copes with messy, multilingual, handwritten, or low-quality images. Multimodal LLMs such as GPT-4o and Claude can also read images directly, and for complex documents they can often stand in for dedicated OCR tools.
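
To make the template idea concrete, here is a toy sketch of template matching: each known character is stored as a small hand-drawn bitmap, and an unknown glyph is assigned to whichever template differs from it in the fewest pixels. The 3x5 bitmaps and the `hamming` and `recognise` helpers are invented for illustration and are not taken from any real OCR engine. The sketch also hints at why the approach is brittle: it assumes the glyph is already isolated, aligned, and the same size as the templates.

```python
# Hand-drawn 3x5 glyph templates (hypothetical, for illustration only).
# "#" is an inked pixel, "." is background.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}

def hamming(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognise(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))

# A "7" with one stray inked pixel in the fourth row, as scan noise might add.
noisy_seven = ["###", "..#", ".#.", "##.", ".#."]

print(recognise(noisy_seven))
```

Deep-learning OCR replaces the fixed templates with features learned from many examples, which is what lets it handle varied fonts, handwriting, and poor image quality.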

02 ——