Modalities

Computer Vision

AI that can interpret images and video — recognising objects, reading text, detecting faces, or describing scenes.

01 ——

In plain English

Computer vision is the branch of AI focused on understanding visual information. It's how AI tools recognise faces, detect objects around self-driving cars, read documents, help diagnose conditions from medical images, and analyse satellite imagery.

Common tasks:

Image classification — what's in this picture?
Object detection — where are the objects, and what are they?
Optical character recognition (OCR) — extract text from images
Image segmentation — outline each object precisely
Image generation — create new images from text descriptions
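
To make the difference between the first, second, and fourth tasks concrete, here is a minimal Python sketch of what each one returns for the same toy image. The mask, the "cat" label, and the names Detection and mask_to_box are all made up for illustration; real models predict these outputs rather than deriving them by hand.

```python
from dataclasses import dataclass

# Hypothetical 4x5 segmentation mask: 1 marks pixels that belong to the object.
MASK = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


@dataclass
class Detection:
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max), inclusive pixel coordinates


def mask_to_box(mask):
    """Return the tightest axis-aligned box around the nonzero pixels."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None  # nothing in the mask, so there is no box
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


classification = "cat"                            # one label for the whole image
detection = Detection("cat", mask_to_box(MASK))   # label plus a rough location
segmentation = MASK                               # exact per-pixel outline

print(classification)
print(detection)
```

Each step down the list carries more spatial detail: a label, then a label with a box, then a pixel-level outline.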

Modern computer vision mostly uses deep neural networks, chiefly convolutional neural networks (CNNs) and vision transformers. Multimodal LLMs like GPT-4o and Claude can also "see": you can paste in an image and ask questions about it.
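
For a rough sense of what the "convolutional" part of a CNN does, the sketch below slides a small filter over a toy image in plain Python. The image, the kernel values, and the function name conv2d are illustrative assumptions; in a real network the kernel values are learned during training and many such filters are stacked in layers. Like most deep learning libraries, this computes a cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """Slide the kernel over the image with no padding and a stride of 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# Toy image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds where brightness rises from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

print(conv2d(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0]]
```

The output is non-zero only in the middle column, where the dark-to-bright edge sits, which is the kind of low-level pattern early CNN layers tend to pick up.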

Computer vision powers many of the AI tools in this directory's image, video, and design categories.

02 ——