DeepSeek V3 & R1
Description: Chinese open-source MoE models with excellent reasoning performance
Website: https://deepseeksr1.com
With V3 and R1, DeepSeek has released two impressive open-source models under the MIT license that compete with leading proprietary models.
DeepSeek-V3
- Parameters: 671 billion (MoE), ~37B active per token
- Context: Up to 128K tokens
- Training: ~14.8 trillion diverse tokens
- Efficiency: Only 2.788M H800 GPU-hours for training
Also available in the newer V3.1 (Dual-Mode Thinking) and V3.2-Exp (DeepSeek Sparse Attention) variants; a minimal API sketch follows below.
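Below is a minimal sketch of querying DeepSeek-V3 through the developer API, which uses an OpenAI-compatible chat-completions format; the base URL https://api.deepseek.com and the model name deepseek-chat follow DeepSeek's public API documentation, while the DEEPSEEK_API_KEY environment variable and the example prompt are assumptions of this sketch, not part of the release.

```python
# Minimal sketch: querying DeepSeek-V3 via the OpenAI-compatible API.
# Assumes the `openai` Python package is installed and DEEPSEEK_API_KEY is set.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # your DeepSeek API key (assumed env var)
    base_url="https://api.deepseek.com",     # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek-V3 is served under this model name
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the mixture-of-experts idea in two sentences."},
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
```

The same client also reaches the reasoning model by switching the model name to deepseek-reasoner.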
DeepSeek-R1
Specialized reasoning model trained with reinforcement learning:
- Sizes: Full 671B MoE model plus distilled versions from 1.5B to 70B parameters
- Application: Step-by-step reasoning for math, logic, coding
- Use cases: Tutors, research assistants, debugging (see the sketch after this list)
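As referenced above, here is a sketch of running one of the distilled R1 checkpoints locally for step-by-step reasoning with Hugging Face transformers; the repository id deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B matches the official release on Hugging Face, while the prompt and generation settings are illustrative assumptions.

```python
# Sketch: step-by-step reasoning with a distilled DeepSeek-R1 checkpoint.
# Assumes `transformers` and `torch` are installed; the 1.5B model runs on a single GPU or CPU.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # distilled from R1 onto a Qwen base
)

messages = [{
    "role": "user",
    "content": "Solve step by step: a train travels 180 km in 1.5 hours. What is its average speed?",
}]

result = generator(messages, max_new_tokens=512, do_sample=False)
# The pipeline returns the full chat; the last message is the model's reply,
# which for R1-style models includes its reasoning before the final answer.
print(result[0]["generated_text"][-1]["content"])
```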
Availability
Released under the MIT license for self-hosting; the weights are available on Hugging Face and GitHub. DeepSeek also provides a web interface, mobile apps, and a developer API.
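For self-hosting, a minimal sketch of pulling the weights from Hugging Face; the repository id is one of the officially released distilled checkpoints, while the local directory name is a hypothetical choice for this example.

```python
# Sketch: downloading DeepSeek weights from Hugging Face for self-hosting.
# Assumes `huggingface_hub` is installed; the target directory is illustrative.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # one of the released distilled checkpoints
    local_dir="./deepseek-r1-distill-qwen-7b",          # assumed local path
)
print(f"Model files downloaded to {local_dir}")
```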
Highlight
Performance comparable to leading proprietary models, combined with full openness.