Mac Mini M4 / M4 Pro
Description: Compact AI workstation with Unified Memory for local LLM inference
Website: https://www.apple.com/mac-mini/
The Mac Mini M4 is an excellent choice for local AI inference. Thanks to its Unified Memory Architecture and the M4 chip, it delivers strong LLM inference performance at low power consumption.
Technical Specifications
- Unified Memory: Up to 64GB shared between CPU and GPU (M4 Pro)
- Memory Bandwidth: 120 GB/s (M4) / 273 GB/s (M4 Pro) for fast token generation
- Metal Acceleration: GPU acceleration out of the box, no driver installation required
- Energy Efficiency: Draws significantly less power than NVIDIA GPU setups
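Metal acceleration working without driver installation can be seen in practice when building llama.cpp: on Apple Silicon, Metal support is enabled by default. A minimal build sketch (assumes Xcode command line tools, git, and cmake are installed):

```shell
# Sketch: building llama.cpp on macOS; Metal is enabled by default
# on Apple Silicon, so no extra flags or drivers are needed.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# Binaries such as build/bin/llama-cli now offload to the GPU via Metal.
```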
Performance
- Llama 2 7B / Llama 3 8B: ~12 tokens/second (32GB configuration)
- Llama 3.1 8B quantized: ~28 tokens/second
- Sweet spot: models up to ~10 billion parameters run efficiently
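To reproduce figures like those above on your own machine, llama.cpp ships a benchmarking tool. A sketch, assuming a llama.cpp build and a quantized GGUF model (the file name here is a placeholder):

```shell
# Sketch: measuring throughput with llama-bench from llama.cpp.
# The model path is an assumption; substitute your own GGUF file.
./build/bin/llama-bench -m models/llama-3.1-8b-instruct-q4_k_m.gguf
# Reports prompt-processing and token-generation speed in tokens/second.
```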
Benefits
- From $599 for the base model
- No separate VRAM limit: the GPU can address the full Unified Memory pool
- Works well with Ollama and llama.cpp
- Quiet, cool operation
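For the Ollama route mentioned above, getting a model running is a short workflow. A minimal sketch; the Homebrew install path and the model tag are assumptions, so check ollama.com for current options:

```shell
# Sketch: minimal Ollama setup on macOS.
brew install ollama        # or use the installer from ollama.com
ollama serve &             # start the local inference server
ollama run llama3.1:8b     # pulls the model on first run, then opens a chat
```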