Llama 4

Description: Meta's fourth-generation open-weight LLM family with native multimodal capabilities

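Llama 4 was released by Meta on April 5, 2025 as the fourth generation of the Llama family. It is the first Llama generation built on a Mixture-of-Experts (MoE) architecture and the first to be natively multimodal: text and image tokens are fused early in a single model backbone instead of being handled by a separate vision adapter, as in the Llama 3.2 vision models.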
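In an MoE layer, a learned router sends each token to a small subset of expert networks, so only a fraction of the stored parameters is used per token. Meta describes Llama 4 Maverick's MoE layers as sending each token to one shared expert plus one of 128 routed experts. The toy sketch below mimics that top-1-plus-shared pattern. The sizes, random weights, softmax gating, and all function names are hypothetical choices for illustration, not Meta's implementation.

```python
import math
import random

# Toy sizes (hypothetical); real Llama 4 layers are far larger.
D_MODEL = 4     # hidden size of each token vector
N_EXPERTS = 8   # number of routed experts

random.seed(0)

def make_matrix(rows, cols):
    """Random weight matrix stored as a list of rows (toy initialization)."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(logits):
    """Numerically stable softmax (gating choice is an assumption)."""
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Each toy "expert" is a single linear map; real experts are MLPs.
experts = [make_matrix(D_MODEL, D_MODEL) for _ in range(N_EXPERTS)]
shared_expert = make_matrix(D_MODEL, D_MODEL)
router = make_matrix(N_EXPERTS, D_MODEL)  # one score per routed expert

def moe_layer(token):
    """Send the token to the shared expert plus its top-1 routed expert."""
    probs = softmax(matvec(router, token))
    chosen = max(range(N_EXPERTS), key=lambda i: probs[i])
    # Scale the routed expert's output by its gate probability.
    routed_out = [probs[chosen] * y for y in matvec(experts[chosen], token)]
    shared_out = matvec(shared_expert, token)
    return [a + b for a, b in zip(routed_out, shared_out)], chosen

def param_counts():
    """Parameters stored vs. parameters touched per token (router counted in both)."""
    per_expert = D_MODEL * D_MODEL
    router_params = N_EXPERTS * D_MODEL
    total = N_EXPERTS * per_expert + per_expert + router_params
    active = per_expert + per_expert + router_params  # one routed expert + shared expert
    return total, active

if __name__ == "__main__":
    tokens = [[random.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in range(5)]
    for t in tokens:
        out, expert_id = moe_layer(t)
        print(f"routed to expert {expert_id}: {[round(x, 3) for x in out]}")
    total, active = param_counts()
    print(f"total params: {total}, active per token: {active}")
```

The same gap between stored and active parameters, scaled up, is why a Llama 4 model can hold hundreds of billions of parameters while computing with only 17B of them per token.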

Model Variants

Llama 4 Scout: 17B active parameters (16 experts, 109B total), described by Meta as the best multimodal model in its class; fits on a single H100 GPU with Int4 quantization; supports a 10M-token context window, which Meta calls industry-leading
Llama 4 Maverick: 17B active parameters (128 experts, 400B total), 1M-token context window; Meta reports that it beats GPT-4o and Gemini 2.0 Flash on many widely reported benchmarks
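Llama 4 Behemoth Preview: 288B active parameters (16 experts, nearly 2T total), still in training at launch and not released; Meta reports that it outperforms GPT-4.5, Claude 3.7 Sonnet and Gemini 2.0 Pro on several STEM benchmarks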
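The single-H100 claim for Scout depends on quantization: all 16 experts must stay in GPU memory even though only 17B parameters are active per token. A back-of-the-envelope estimate of weight memory, assuming Scout's 109B total parameters (per Meta) and an 80 GB H100, and ignoring KV cache, activations and runtime overhead:

```python
# Weight memory only; KV cache, activations, and runtime overhead are ignored.
H100_MEMORY_GB = 80          # assumed 80 GB H100 variant
SCOUT_TOTAL_PARAMS = 109e9   # total parameters across all 16 experts (per Meta)

def weight_memory_gb(params, bits_per_param):
    """Decimal gigabytes needed to store `params` weights at the given precision."""
    return params * bits_per_param / 8 / 1e9

if __name__ == "__main__":
    for name, bits in [("BF16", 16), ("FP8", 8), ("Int4", 4)]:
        gb = weight_memory_gb(SCOUT_TOTAL_PARAMS, bits)
        verdict = "fits" if gb < H100_MEMORY_GB else "does not fit"
        print(f"{name}: {gb:.1f} GB -> {verdict} in {H100_MEMORY_GB} GB")
```

At Int4 the weights take about 54.5 GB, leaving roughly 25 GB for the KV cache and activations, so long contexts quickly require more memory than a single GPU provides.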

Features

Native multimodality: text and image input through early fusion, with joint pretraining on text, image and video data
Mixture-of-Experts architecture
Long context windows: 10M tokens (Scout) and 1M tokens (Maverick)
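Open weights under the Llama 4 Community License, which allows commercial use with restrictions (for example, services with more than 700 million monthly active users need a separate license from Meta)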

Download

Scout and Maverick weights are available on llama.com and Hugging Face. Both can be run locally with llama.cpp, Ollama, or LM Studio.
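Once a model is pulled into Ollama, it can be queried over Ollama's local REST API. The minimal sketch below uses only the Python standard library and Ollama's default `/api/generate` endpoint on port 11434 with streaming disabled. The model tag `llama4` is an assumption; use whatever tag `ollama list` shows on your machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "llama4"  # assumed tag; check `ollama list` for the name on your machine

def build_request(prompt, model=MODEL_TAG):
    """Build a non-streaming /api/generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt, model=MODEL_TAG, timeout=300):
    """Send the prompt and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]

if __name__ == "__main__":
    try:
        print(generate("Summarize Mixture-of-Experts in one sentence."))
    except OSError as err:  # URLError subclasses OSError; raised if no server is reachable
        print(f"Could not reach Ollama at {OLLAMA_URL}: {err}")
```

Disabling streaming keeps the example short because the server returns a single JSON object; interactive clients usually leave streaming on and read the response line by line.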