๐Ÿค Small Language Models (SLM) โ€” 2,000+ Models Under 10B Parameters

A complete guide to small language models for edge deployment, mobile apps, and cost-efficient production. All data comes from the AI Models Catalog (first-party data only).

2,002 Small Models
928 With Tool Calling
557 With Reasoning
48 Free SLMs
689 First-Party
๐Ÿ” Search All 4,587 Models โ†’

What Are Small Language Models?

Small Language Models (SLMs) are AI models with fewer than roughly 10 billion parameters, designed for efficiency, low latency, and deployment on resource-constrained hardware, from smartphones to edge servers. They offer a practical alternative to large frontier models when cost, speed, or privacy matters.
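To make "resource-constrained hardware" concrete, the sketch below estimates how much memory a model's weights alone occupy at a given numeric precision. It is a back-of-the-envelope calculation, not a catalog feature: the function name `weight_memory_gb` and the precision table are illustrative assumptions, and a real deployment also needs memory for activations, the KV cache, and runtime overhead.

```python
# Approximate bytes needed to store one parameter at each precision.
# Assumption: weights only; activations and KV cache are not counted.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Return the approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical model sizes, in billions of parameters.
    for size in (0.8, 4.0, 8.0):
        for precision in ("fp16", "int4"):
            gb = weight_memory_gb(size, precision)
            print(f"{size:>4}B @ {precision}: ~{gb:.1f} GB")
```

Under these assumptions, an 8B model needs about 16 GB for its weights at fp16 and about 4 GB at 4-bit precision.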

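Key advantages of SLMs: lower cost per token, lower first-token latency, the option to run on-device or at the edge so data stays local, and easier fine-tuning.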

Cheapest Small Models with Tool Calling

Best-value SLMs for AI agents and tool-use workflows (first-party providers only):

Model                                        Provider   Input $/M  Output $/M  Context  Reasoning
ling-2.6-flash                               ling       $0.01      $0.03       262K
klusterai--Meta-Llama-3.1-8B-Instruct-Turbo  klusterai  $0.015     $0.02       131K
granite-4.0-h-micro                          ibm        $0.017     $0.112      131K
llama-3.1-8b-instruct--fp-16                 fireworks  $0.02      $0.03       131K
schematron-3b                                fireworks  $0.02      $0.05       131K
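Because input and output tokens are priced separately, which model is cheapest depends on the shape of the workload. The sketch below uses the prices from the table above; the function names and the two example workloads are hypothetical and only illustrate the arithmetic.

```python
# (input $/1M tokens, output $/1M tokens), copied from the table above.
PRICES = {
    "ling-2.6-flash": (0.01, 0.03),
    "klusterai--Meta-Llama-3.1-8B-Instruct-Turbo": (0.015, 0.02),
    "granite-4.0-h-micro": (0.017, 0.112),
    "llama-3.1-8b-instruct--fp-16": (0.02, 0.03),
    "schematron-3b": (0.02, 0.05),
}


def cost_usd(model: str, calls: int, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of `calls` requests, each with the given token counts."""
    in_price, out_price = PRICES[model]
    per_call = input_tokens * in_price + output_tokens * out_price
    return calls * per_call / 1_000_000


def cheapest(calls: int, input_tokens: int, output_tokens: int) -> str:
    """Name of the listed model with the lowest cost for this workload."""
    return min(PRICES, key=lambda m: cost_usd(m, calls, input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical workloads: one million calls each.
    workloads = [("prompt-heavy", 2000, 300), ("output-heavy", 200, 800)]
    for label, tokens_in, tokens_out in workloads:
        best = cheapest(1_000_000, tokens_in, tokens_out)
        total = cost_usd(best, 1_000_000, tokens_in, tokens_out)
        print(f"{label}: {best} at ${total:.2f}")
```

With a 2,000-token prompt and a 300-token reply, ling-2.6-flash is the cheapest option (about $29 per million calls). With a 200-token prompt and an 800-token reply, the lower output price of klusterai--Meta-Llama-3.1-8B-Instruct-Turbo wins (about $19 per million calls).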

Free Small Language Models

48 small models are available at zero cost, which makes them well suited to prototyping and development:

Model Provider Context Tool Calling Reasoning
deepseek-r1-distill-llama-8b cerebras 131K ✓
llama-4-scout-17b-16e-instruct cerebras 131K ✓
qwen-2.5-32b cerebras 131K ✓
gemma-4-26b-a4b-it auriko 262K ✓
glm-4.5-flash auriko 200K ✓
glm-4.6v-flash auriko 128K ✓
baidu--ernie-4.5-0.3b aimlapi 120K ✓

Small Models with Reasoning

557 small models with reasoning capabilities, suited to math, logic, and step-by-step problem solving:

Model                         Provider   Input $/M  Output $/M  Context  Tool Calling
qwen3.5-0.8b                  qwen       $0.01      $0.05       262K
qwen3.5-2b                    qwen       $0.02      $0.10       262K
qwen--qwen3-4b-fp8            fireworks  $0.03      $0.03       128K
qwen3.5-4b                    qwen       $0.03      $0.15       262K
deepseek-r1-distill-llama-8b  cerebras   Free       Free        131K

Best SLMs by Use Case

🤖 AI Agents on a Budget

ling-2.6-flash ($0.01/$0.03/M): the cheapest tool-calling model, with 262K context. Well suited to high-volume agent workflows.

📱 On-Device / Edge Deployment

Qwen3.5 0.8B: an ultra-compact reasoning model. Gemma 4 27B IT: free, with vision and tool calling.

💻 Code Completion

bdc-coder ($0.01/$0.01/M): the cheapest coding model. Qwen3 4B ($0.03/$0.15/M): open-source, with reasoning.

🧮 Math & Reasoning

DeepSeek R1 Distill Llama 8B: a free reasoning model. Qwen3.5 0.8B ($0.01/$0.05/M): the cheapest paid reasoning model.

💬 Chat & RAG

GPT-4.1-nano ($0.10/$0.40/M): fast, cheap, reliable. Qwen3 4B ($0.03/$0.15/M): an open-source alternative.

SLM vs LLM: When to Choose Small

Factor                 Small Model (SLM)             Large Model (LLM)
Cost per 1M tokens     $0.01 – $0.20                 $1 – $40
Latency (first token)  50 – 200 ms                   200 – 2,000 ms
Deployment             On-device, edge, cloud        Cloud only
Privacy                Data stays on device          Data sent to cloud
Customization          Easy fine-tuning              Expensive fine-tuning
Complex reasoning      Good for simple tasks         Superior for complex tasks
Best for               High-volume, real-time, edge  Complex, nuanced, creative
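The table can be read as a rough decision procedure. The sketch below encodes one possible reading of it. The `Requirements` fields, the `choose_tier` function, and the 200 ms cutoff (the boundary between the two latency ranges in the table) are assumptions for illustration, not a rule from the catalog.

```python
from dataclasses import dataclass

# Assumed cutoff: the table lists 50-200 ms for SLMs and 200-2,000 ms for LLMs.
SLM_LATENCY_CEILING_MS = 200


@dataclass
class Requirements:
    must_run_on_device: bool
    data_must_stay_local: bool
    first_token_budget_ms: int
    complex_reasoning: bool


def choose_tier(req: Requirements) -> str:
    """Return "SLM" or "LLM" using the factors from the comparison table."""
    # Hard constraints first: the table lists LLMs as cloud only.
    if req.must_run_on_device or req.data_must_stay_local:
        return "SLM"
    # A first-token budget below the LLM range leaves only the small tier.
    if req.first_token_budget_ms < SLM_LATENCY_CEILING_MS:
        return "SLM"
    # With no hard constraint, task complexity decides.
    if req.complex_reasoning:
        return "LLM"
    # Default to the cheaper tier for simple, high-volume work.
    return "SLM"


if __name__ == "__main__":
    chat_widget = Requirements(False, False, 150, False)
    legal_review = Requirements(False, False, 2000, True)
    print(choose_tier(chat_widget), choose_tier(legal_review))
```

In this reading, hard constraints (on-device execution, data locality, a tight latency budget) are checked before task complexity, so a request that is both on-device and complex still resolves to the small tier. A real router would likely weigh these factors rather than apply them in a fixed order.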
