llama.cpp
Description: C++ implementation of LLaMA for efficient local inference
Website: https://github.com/ggerganov/llama.cpp
llama.cpp is a C/C++ implementation of inference for Meta's LLaMA family of models. It enables running large language models locally with minimal dependencies.
Features
- Fast CPU inference
- Support for various integer quantization formats (e.g. 4-bit and 8-bit)
- Cross-platform compatibility
- Minimal memory footprint
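The memory-footprint benefit of quantization is easy to see with back-of-the-envelope arithmetic: on-disk size is roughly parameter count times bits per weight. The sketch below illustrates this for a hypothetical 7-billion-parameter model; real quantized files are somewhat larger because each block of weights also stores scale factors and metadata.

```shell
# Rough on-disk size of a 7B-parameter model at different bit-widths.
# Illustrative only: actual quantized files carry per-block scales,
# so real sizes are somewhat larger than these lower bounds.
params=7000000000
for bits in 16 8 4; do
  echo "${bits}-bit: $(( params * bits / 8 / 1000000 )) MB"
done
```

At 4 bits per weight, a model that needs roughly 14 GB in 16-bit form shrinks to about 3.5 GB, which is what makes CPU-only local inference practical on ordinary hardware.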
Installation
Clone or download the latest source from GitHub and compile it for your platform.
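A typical build sequence looks like the following. This is a sketch of one common approach using CMake; exact build targets and binary names vary between versions of the project.

```shell
# Clone the repository and build with CMake (one common approach;
# exact options and output paths may differ between versions).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# After building, inference can typically be run with something like
# (binary name varies by version; older releases used ./main):
#   ./build/bin/llama-cli -m path/to/model.gguf -p "Hello" -n 64
```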