Kimi K2.5
Description: 1-trillion-parameter MoE model with visual capabilities and Agent Swarm
Website: https://www.kimi.com
Kimi K2.5 is a state-of-the-art multimodal AI model from Moonshot AI built on a Mixture-of-Experts (MoE) architecture. With 1 trillion parameters (32B activated per token) and a 256K-token context window, it offers native visual capabilities, code generation, and parallel agent execution.
Technical Specifications
- Architecture: Mixture-of-Experts (MoE) with 384 experts, 8 activated per token (see the routing sketch after this list)
- Parameters: 1 trillion total, 32 billion activated per token
- Context Window: 256,000 tokens
- Vision Encoder: MoonViT with 400M parameters
- Training: ~15 trillion mixed visual and text tokens
- Quantization: Native INT4 support
- Layers: 61 (including 1 dense layer)
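For intuition, the sketch below shows top-k expert routing matching the figures above (384 experts, 8 activated per token). The hidden size, router weights, and gating details are toy assumptions for illustration only, not Moonshot AI's implementation.

```python
# Illustrative sketch of top-k expert routing in an MoE layer.
# Shapes mirror the published specs (384 experts, 8 active per token);
# the router itself is a toy stand-in, not Kimi K2.5's actual code.
import numpy as np

NUM_EXPERTS = 384   # total routed experts per MoE layer
TOP_K = 8           # experts activated per token
HIDDEN_DIM = 1024   # toy hidden size for the sketch (the real model is larger)

rng = np.random.default_rng(0)
router_weights = rng.standard_normal((HIDDEN_DIM, NUM_EXPERTS)) * 0.02

def route_token(hidden_state: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the indices and normalized gate weights of the top-k experts for one token."""
    logits = hidden_state @ router_weights                  # (NUM_EXPERTS,)
    top_idx = np.argsort(logits)[-TOP_K:]                   # the 8 highest-scoring experts
    gate = np.exp(logits[top_idx] - logits[top_idx].max())  # softmax over the selected experts only
    gate /= gate.sum()
    return top_idx, gate

token = rng.standard_normal(HIDDEN_DIM)
experts, weights = route_token(token)
print("activated experts:", experts)
print("gate weights:", np.round(weights, 3))
```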
Special Capabilities
Native Multimodality: Pre-trained on mixed visual and text data for true cross-modal understanding. Processes text, images and videos seamlessly.
Visual Coding: Generates production-ready frontend code directly from text, image and video inputs. Supports interactive layouts and animations.
Agent Swarm (Beta): Coordinates up to 100 parallel sub-agents executing up to 1,500 tool calls simultaneously, reducing execution time for complex tasks by up to 4.5x (a sketch of the underlying fan-out pattern follows below).
Multiple Modes: Available in Instant, Thinking, Agent, and Agent Swarm (Beta) modes for different use cases.
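Agent Swarm internals are not public; the following is a minimal, hypothetical sketch of the fan-out/fan-in pattern that such parallel execution implies: a coordinator splits a task, dispatches sub-agents concurrently, and merges their results. The function names and asyncio-based orchestration are assumptions, not Kimi's actual agent API.

```python
# Hypothetical sketch of a parallel agent swarm's fan-out/fan-in pattern.
# The agent and tool functions are stand-ins, not Moonshot AI's implementation.
import asyncio

async def run_sub_agent(agent_id: int, subtask: str) -> str:
    """Placeholder sub-agent: a real system would call the model with tools here."""
    await asyncio.sleep(0.1)  # simulate model inference + tool-call latency
    return f"agent {agent_id} finished: {subtask}"

async def run_swarm(task: str, num_agents: int = 8) -> list[str]:
    subtasks = [f"{task} (part {i + 1}/{num_agents})" for i in range(num_agents)]
    # Launch all sub-agents concurrently; total latency approaches that of the
    # slowest agent rather than the sum of all agents, which is where the
    # claimed speed-up comes from.
    return await asyncio.gather(
        *(run_sub_agent(i, s) for i, s in enumerate(subtasks))
    )

if __name__ == "__main__":
    results = asyncio.run(run_swarm("summarize repository modules"))
    for line in results:
        print(line)
```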
Performance Benchmarks
- AIME 2025: 96.1 (math and logical reasoning)
- SWE-Bench Verified: 76.8 (software engineering)
- MathVista (mini): 90.1 (visual mathematical understanding)
- OCRBench: 92.3 (optical character recognition)
Local Execution
K2.5 can be run locally with inference engines such as vLLM, SGLang, and KTransformers; transformers ≥ 4.57.1 is required. Native INT4 quantization enables more efficient use on consumer hardware.
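As a rough starting point, here is a minimal offline-inference sketch using vLLM's Python API. The Hugging Face repo id, tensor-parallel degree, and sampling settings are assumptions; consult the official model card and the vLLM documentation for the supported configuration.

```python
# Minimal local-inference sketch using vLLM's offline API.
# The repo id "moonshotai/Kimi-K2.5" is a placeholder (assumption); check the
# official Hugging Face model card for the exact name and hardware requirements.
from vllm import LLM, SamplingParams

llm = LLM(
    model="moonshotai/Kimi-K2.5",  # placeholder repo id (assumption)
    trust_remote_code=True,        # custom model code ships with the checkpoint
    tensor_parallel_size=8,        # adjust to the number of available GPUs
)

params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(
    ["Write a Python function that parses an RFC 3339 timestamp."],
    params,
)
print(outputs[0].outputs[0].text)
```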
Use Cases
- Visual code generation: Frontend development from screenshots and mockups
- Complex task automation: Parallel agents for multi-step workflows
- Multimodal analysis: Process documents, images and videos simultaneously
- Deep research: Comprehensive research with automatic tool use
Available via kimi.com as a cloud service and as an open-source model on Hugging Face.
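For the cloud route, the sketch below shows a visual-coding request assuming an OpenAI-compatible chat endpoint: a mockup screenshot goes in, frontend code comes out. The base URL, model id, and environment variable name are placeholders, not confirmed values; check Moonshot AI's platform documentation for the real ones.

```python
# Hypothetical sketch of a visual-coding request through an OpenAI-compatible
# chat endpoint. Base URL, model id, and env var name are assumptions.
import base64
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],  # assumed environment variable name
    base_url="https://api.moonshot.ai/v1",   # assumed OpenAI-compatible endpoint
)

with open("mockup.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="kimi-k2.5",  # placeholder model id (assumption)
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text",
                 "text": "Generate a responsive HTML/CSS page matching this mockup."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```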