Computer Vision
AI that can interpret images and video — recognising objects, reading text, detecting faces, or describing scenes.
In plain English
Computer vision is the branch of AI focused on understanding visual information. It's how AI tools recognise faces, detect objects around self-driving cars, read documents, help diagnose conditions from medical images, and analyse satellite imagery.
Common tasks:
- Image classification — what's in this picture?
- Object detection — where are the objects, and what are they?
- Optical character recognition (OCR) — extract text from images
- Image segmentation — outline each object precisely
- Image generation — create new images from text descriptions
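One way to tell the first few tasks apart is by what they return. The sketch below is illustrative only: the `Detection` class, the helper `mask_to_box`, the labels, and the scores are all made-up examples, not the output of any real model or library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One detected object: what it is, where it is, and how confident the model is."""
    label: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel coordinates
    score: float


# Image classification: one answer for the whole picture (hypothetical values).
classification = {"label": "cat", "score": 0.93}

# Object detection: a list of labelled boxes (hypothetical values).
detections = [Detection(label="cat", box=(1, 1, 2, 2), score=0.91)]

# Image segmentation: one value per pixel. Here 1 means "cat" and 0 means "background"
# in a tiny 4x4 image.
segmentation_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]


def mask_to_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return the tightest box around the non-zero pixels, or None if the mask is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        return None
    return (cols[0], rows[0], cols[-1], rows[-1])


box_from_mask = mask_to_box(segmentation_mask)
```

A segmentation mask carries more detail than a box: the box around an object can always be recovered from its mask, as `mask_to_box` does here, but a mask cannot be recovered from a box.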
Modern computer vision mostly uses deep neural networks, chiefly convolutional neural networks (CNNs) and vision transformers. Multimodal LLMs like GPT-4o and Claude can also "see": you can paste in an image and ask questions about it.
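To give a feel for what the "convolutional" part of a CNN does, here is a minimal sketch in plain Python. It slides a small grid of weights (a kernel) over an image and sums the products at each position. The function name `convolve2d`, the toy image, and the hand-written kernel are assumptions for illustration; in a real CNN the kernel values are learned during training, and the operation is performed by an optimised library rather than Python loops.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image with no padding and a stride of 1.

    Deep learning libraries call this "convolution", although strictly
    it is cross-correlation because the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 greyscale image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written kernel that responds where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

edges = convolve2d(image, kernel)
# edges == [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is non-zero only in the middle column, where the image changes from dark to bright. Stacking many such filters, layer upon layer, is roughly how a CNN builds up from edges to shapes to whole objects.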
Computer vision powers many of the AI tools in this directory's image, video, and design categories.