🤏 Small Language Models (SLM) — 2,000+ Models Under 10B Parameters

Complete guide to small language models for edge deployment, mobile apps, and cost-efficient production. All data from AI Models Catalog — first-party data only.

2,002 Small Models
928 With Tool Calling
557 With Reasoning
48 Free SLMs
689 First-Party


What Are Small Language Models?

Small Language Models (SLMs) are AI models with fewer than ~10 billion parameters, designed for efficiency, low latency, and deployment on resource-constrained hardware — from smartphones to edge servers. They offer a practical alternative to large frontier models when cost, speed, or privacy matters.

Key advantages of SLMs:

Lower cost: often 10-100x cheaper per token than frontier models
Lower latency: faster inference for real-time applications
Edge deployment: can run on-device without cloud dependency
Privacy: when the model runs on-device, data never leaves the device
Fine-tuning: cheaper and faster to customize for specific domains

Cheapest Small Models with Tool Calling

Best value SLMs for AI agents and tool-use workflows (first-party providers only):

Model	Provider	Input $/M	Output $/M	Context
ling-2.6-flash	ling	$0.01	$0.03	262K
klusterai--Meta-Llama-3.1-8B-Instruct-Turbo	klusterai	$0.015	$0.02	131K
granite-4.0-h-micro	ibm	$0.017	$0.112	131K
llama-3.1-8b-instruct--fp-16	fireworks	$0.02	$0.03	131K
schematron-3b	fireworks	$0.02	$0.05	131K
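Input and output tokens are priced separately, so the cheapest model depends on the shape of the workload. The following is a minimal sketch, assuming the per-million-token prices in the table above are current. The function names and the example token volumes are hypothetical, chosen only for illustration.

```python
# Prices in USD per 1M tokens (input, output), transcribed from the table above.
PRICES = {
    "ling-2.6-flash": (0.01, 0.03),
    "klusterai--Meta-Llama-3.1-8B-Instruct-Turbo": (0.015, 0.02),
    "granite-4.0-h-micro": (0.017, 0.112),
    "llama-3.1-8b-instruct--fp-16": (0.02, 0.03),
    "schematron-3b": (0.02, 0.05),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Rank the listed models from cheapest to most expensive for one workload."""
    costs = [(m, workload_cost(m, input_tokens, output_tokens)) for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical monthly volume: 500M input tokens, 100M output tokens.
    for model, cost in rank_by_cost(500_000_000, 100_000_000):
        print(f"{model:45s} ${cost:,.2f}")
```

With that input-heavy mix, ling-2.6-flash comes out cheapest at $8.00, ahead of the klusterai Llama 3.1 8B endpoint at $9.50. Reversing the mix (100M input, 500M output) puts the klusterai endpoint first, because its output price is lower.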

Free Small Language Models

48 small models available at zero cost — perfect for prototyping and development:

Model	Provider	Context	Tool Calling	Reasoning
deepseek-r1-distill-llama-8b	cerebras	131K		✓
llama-4-scout-17b-16e-instruct	cerebras	131K	✓
qwen-2.5-32b	cerebras	131K	✓
gemma-4-26b-a4b-it	auriko	262K	✓
glm-4.5-flash	auriko	200K	✓
glm-4.6v-flash	auriko	128K	✓
baidu--ernie-4.5-0.3b	aimlapi	120K	✓
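The rows above can be shortlisted by capability and context size. This is a minimal sketch: the `FreeModel` record and the `shortlist` helper are hypothetical names, not part of any catalog API. It also assumes that an empty cell in the table means the capability is not listed for that model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeModel:
    name: str
    provider: str
    context_k: int       # context window, in thousands of tokens
    tool_calling: bool   # assumption: empty table cell means "not listed"
    reasoning: bool      # assumption: empty table cell means "not listed"


# Rows transcribed from the table above.
FREE_MODELS = [
    FreeModel("deepseek-r1-distill-llama-8b", "cerebras", 131, False, True),
    FreeModel("llama-4-scout-17b-16e-instruct", "cerebras", 131, True, False),
    FreeModel("qwen-2.5-32b", "cerebras", 131, True, False),
    FreeModel("gemma-4-26b-a4b-it", "auriko", 262, True, False),
    FreeModel("glm-4.5-flash", "auriko", 200, True, False),
    FreeModel("glm-4.6v-flash", "auriko", 128, True, False),
    FreeModel("baidu--ernie-4.5-0.3b", "aimlapi", 120, True, False),
]


def shortlist(min_context_k: int = 0,
              need_tools: bool = False,
              need_reasoning: bool = False) -> list[FreeModel]:
    """Return the free models that meet every requested requirement."""
    return [
        m for m in FREE_MODELS
        if m.context_k >= min_context_k
        and (m.tool_calling or not need_tools)
        and (m.reasoning or not need_reasoning)
    ]


if __name__ == "__main__":
    # Free tool-calling models with at least a 200K context window.
    for m in shortlist(min_context_k=200, need_tools=True):
        print(m.name, m.provider, f"{m.context_k}K")
```

Of the rows shown, only gemma-4-26b-a4b-it and glm-4.5-flash combine tool calling with a context window of 200K or more, and only deepseek-r1-distill-llama-8b is marked for reasoning.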

Small Models with Reasoning

557 small models with reasoning capabilities — ideal for math, logic, and step-by-step problem solving:

Model	Provider	Input $/M	Output $/M	Context
qwen3.5-0.8b	qwen	$0.01	$0.05	262K
qwen3.5-2b	qwen	$0.02	$0.10	262K
qwen--qwen3-4b-fp8	fireworks	$0.03	$0.03	128K
qwen3.5-4b	qwen	$0.03	$0.15	262K
deepseek-r1-distill-llama-8b	cerebras	Free	Free	131K

Best SLMs by Use Case

🤖 AI Agents on a Budget

ling-2.6-flash ($0.01/$0.03/M): lowest input price among the tool-calling models listed above, with a 262K context window. Well suited to high-volume agent workflows.

📱 On-Device / Edge Deployment

Qwen3.5 0.8B: ultra-compact reasoning model. Gemma 4 26B A4B IT: free, with vision and tool calling.

💻 Code Completion

bdc-coder ($0.01/$0.01/M): cheapest coding model. Qwen3.5 4B ($0.03/$0.15/M): open-source, with reasoning.

🧮 Math & Reasoning

DeepSeek R1 Distill Llama 8B: free reasoning model. Qwen3.5 0.8B ($0.01/$0.05/M): cheapest paid reasoning model.

💬 Chat & RAG

GPT-4.1-nano ($0.10/$0.40/M): fast, cheap, reliable. Qwen3.5 4B ($0.03/$0.15/M): open-source alternative.

SLM vs LLM: When to Choose Small

Factor	Small Model (SLM)	Large Model (LLM)
Cost per 1M tokens	$0.01 – $0.20	$1 – $40
Latency (first token)	50 – 200ms	200 – 2000ms
Deployment	On-device, edge, cloud	Cloud only
Privacy	Data stays on device	Data sent to cloud
Customization	Easy fine-tuning	Expensive fine-tuning
Complex reasoning	Good for simple tasks	Superior for complex tasks
Best for	High-volume, real-time, edge	Complex, nuanced, creative
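The cost row above can be turned into a rough budget estimate. This is a minimal sketch, assuming a single blended price per 1M tokens taken from the two ranges in the table. The helper names and the monthly token volume are hypothetical.

```python
# Price ranges in USD per 1M tokens, taken from the comparison table above.
SLM_RANGE = (0.01, 0.20)
LLM_RANGE = (1.00, 40.00)


def monthly_cost(tokens_per_month: int, price_per_million: float) -> float:
    """Return the USD cost of a monthly token volume at one blended price."""
    return tokens_per_month / 1_000_000 * price_per_million


def cost_ratio_bounds() -> tuple[float, float]:
    """Return the smallest and largest LLM-to-SLM price ratios the ranges allow."""
    narrowest = LLM_RANGE[0] / SLM_RANGE[1]  # cheapest LLM vs. priciest SLM
    widest = LLM_RANGE[1] / SLM_RANGE[0]     # priciest LLM vs. cheapest SLM
    return narrowest, widest


if __name__ == "__main__":
    volume = 1_000_000_000  # hypothetical: 1B tokens per month
    for label, (low, high) in (("SLM", SLM_RANGE), ("LLM", LLM_RANGE)):
        print(f"{label}: ${monthly_cost(volume, low):,.0f} to "
              f"${monthly_cost(volume, high):,.0f} per month")
    narrowest, widest = cost_ratio_bounds()
    print(f"LLM/SLM price ratio: about {narrowest:.0f}x to {widest:.0f}x")
```

At 1B tokens per month, the ranges work out to roughly $10 to $200 for a small model and $1,000 to $40,000 for a large one. The implied price ratio runs from about 5x to about 4,000x, which brackets the "often 10-100x" figure quoted earlier.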
