
Modalities

Multi-modal

An AI model that can understand and work with multiple types of input — text, images, audio, or video — not just text.

01 ——

In plain English

A multi-modal AI model can process more than one type of data. Instead of only reading text, it might also understand images, hear audio, or watch video — and combine all of those to generate a response.

Examples:

Text + image: Upload a photo of a broken pipe and ask "what's wrong here?"
Text + audio: Speak a question and get a spoken answer back
Text + video: Describe what's happening in a video clip
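For readers curious how a "text + image" request looks in practice, here is a minimal sketch. It assumes a convention many model APIs share in some form: one message is an ordered list of typed parts, and the image travels as base64-encoded bytes alongside the question. The names `TextPart`, `ImagePart`, `build_message`, and `modalities` are hypothetical and chosen for illustration; they are not any vendor's actual API.

```python
import base64
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class TextPart:
    """A piece of plain text, such as the user's question."""
    text: str
    kind: Literal["text"] = "text"


@dataclass
class ImagePart:
    """An image carried as base64 text plus its media type (hypothetical shape)."""
    data_base64: str
    media_type: str
    kind: Literal["image"] = "image"


Part = Union[TextPart, ImagePart]


def build_message(question: str, image_bytes: bytes,
                  media_type: str = "image/jpeg") -> list[Part]:
    """Bundle an image and a question into one ordered list of parts."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        ImagePart(data_base64=encoded, media_type=media_type),
        TextPart(text=question),
    ]


def modalities(parts: list[Part]) -> set[str]:
    """Return the distinct kinds of input present in a message."""
    return {part.kind for part in parts}


# Placeholder bytes standing in for a real photo of the broken pipe.
fake_photo = b"\xff\xd8\xff\xe0 placeholder bytes, not a real JPEG"
message = build_message("What's wrong here?", fake_photo)
print(sorted(modalities(message)))
```

A model that accepts both kinds of part in a single message is what this entry calls multi-modal: it receives the photo and the question together instead of as two separate requests.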

Why it matters: Most real-world information isn't just text. Multi-modal models can work with screenshots, diagrams, voice messages, PDFs with charts, and more — making them far more useful for everyday tasks.

GPT-4o, Claude 3, and Gemini Ultra are all multi-modal models.

02 ——

Related terms

Large Language Model

The type of AI behind tools like ChatGPT and Claude, trained to understand and generate text.

Computer Vision

AI that can interpret images and video — recognising objects, reading text, detecting faces, or describing scenes.

Text-to-Image

AI that generates new images from a written description. It is the technology behind tools like Midjourney, DALL-E, and Stable Diffusion.

Text-to-Video

AI that generates video clips from a text description. It is the next frontier after text-to-image, and its quality is improving rapidly.

Speech-to-Text

AI that converts spoken audio into written text. It is the technology behind voice assistants, transcription tools, and meeting recorders.

Last reviewed June 2026

Vol. 4 · Issue 21 · Last reviewed 2026-06-27
