Compare the top AI models for image generation — DALL·E, Imagen, GPT-5 Image, Gemini, and more. Real pricing and capabilities from first-party data.
Purpose-built models for text-to-image generation. Best for art, design, and visual content creation.
| Model | Provider | Type | Key Feature |
|---|---|---|---|
| imagen-4.0-generate | | Text → Image | Latest Imagen, highest quality |
| imagen-4.0-fast-generate | | Text → Image | Fast generation, lower cost |
| imagen-3.0-generate | | Text → Image | Stable v3, production-ready |
| imagen-3.0-fast-generate | | Text → Image | Fast v3 variant |
| dall-e-3 | openai | Text → Image | Best prompt adherence, DALL·E quality |
| dall-e-2 | openai | Text → Image | Lower cost, good for simple images |
| step-2x-large | stepfun | Text → Image | High-quality Chinese + English |
| step-1x-medium | stepfun | Text → Image | Mid-tier, good balance |
| step-1x-edit | stepfun | Image Edit | Edit existing images |
| step-image-edit-2 | stepfun | Image Edit | Advanced editing v2 |
| image-01 | minimax | Text → Image | MiniMax image generation |
| image-01-live | minimax | Text → Image | Real-time generation |
Multimodal chat models that can generate images within a conversation. Best for agents and interactive applications.
| Model | Provider | Input $/1M | Output $/1M | Context | Tool Call | Reasoning |
|---|---|---|---|---|---|---|
| gpt-5-image-mini | openrouter | $2.50 | $2 | 400K | ✅ | |
| gemini-3.1-flash-image | fastrouter | $0.25 | $1.50 | 65K | ✅ | |
| gemini-2.5-flash-image | fastrouter | $0.30 | $2.50 | 32K | | |
| gemini-3.1-flash-image | auriko | $0.50 | $3 | 65K | ✅ | |
| gemini-2.5-flash-image | auriko | $0.30 | $0.04 | 32K | | |
| amazon-nova-2.0-omni | amazon | $0.20 | $1.30 | 64K | ✅ | ✅ |
| gpt-5-image | openrouter | $10 | $10 | 400K | ✅ | |
| gpt-5.4-image-2 | openrouter | $8 | $15 | 272K | ✅ | |
| gemini-3-pro-image | fastrouter | $2 | $12 | 65K | | |
| gemini-3-pro-image | auriko | $2 | $12 | 131K | ✅ | |
Most affordable options for high-volume image generation.
| Model | Provider | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|
| amazon-nova-2.0-omni | amazon | $0.20 | $1.30 | 64K |
| gemini-3.1-flash-image | fastrouter | $0.25 | $1.50 | 65K |
| gemini-2.5-flash-image | fastrouter | $0.30 | $2.50 | 32K |
| gemini-2.5-flash-image | auriko | $0.30 | $0.04 | 32K |
| gemini-3.1-flash-image | auriko | $0.50 | $3 | 65K |
| gpt-5-image-mini | openrouter | $2.50 | $2 | 400K |
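
To compare the rows above for a concrete workload, the per-million-token prices can be converted into a per-request estimate. The following is a minimal sketch, assuming that generated images are billed as output tokens at the listed rate (providers may bill images differently). The function names `estimate_cost_usd` and `rank_by_cost`, and the workload of 1,000 input and 2,000 output tokens, are hypothetical and not part of the source data.

```python
# Prices copied from the budget table above: (model, provider, input $/1M, output $/1M).
BUDGET_MODELS = [
    ("amazon-nova-2.0-omni", "amazon", 0.20, 1.30),
    ("gemini-3.1-flash-image", "fastrouter", 0.25, 1.50),
    ("gemini-2.5-flash-image", "fastrouter", 0.30, 2.50),
    ("gemini-2.5-flash-image", "auriko", 0.30, 0.04),
    ("gemini-3.1-flash-image", "auriko", 0.50, 3.00),
    ("gpt-5-image-mini", "openrouter", 2.50, 2.00),
]


def estimate_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate the cost of one request from per-million-token prices."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def rank_by_cost(models, input_tokens, output_tokens):
    """Return (cost, model, provider) tuples sorted from cheapest to most expensive."""
    costs = [
        (estimate_cost_usd(input_tokens, output_tokens, in_price, out_price), model, provider)
        for model, provider, in_price, out_price in models
    ]
    return sorted(costs)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 input tokens and 2,000 output tokens per request.
    for cost, model, provider in rank_by_cost(BUDGET_MODELS, 1_000, 2_000):
        print(f"{model:<26} {provider:<11} ${cost:.5f}")
```

The ordering depends on the input/output mix: output-heavy workloads favor rows with a low output price, so the ranking should be recomputed for the token counts you actually expect.
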
Models that support both image generation and function/tool calling — ideal for AI agents that create images.
| Model | Provider | Input $/1M | Output $/1M | Context | Reasoning |
|---|---|---|---|---|---|
| amazon-nova-2.0-omni | amazon | $0.20 | $1.30 | 64K | ✅ |
| gemini-3-pro-image | llmgateway | $2 | $12 | — | |
| gemini-3.1-flash-image | llmgateway | $0.25 | $1.50 | — | |
| gemini-2.5-flash-image | llmgateway | $0.30 | $30 | — | |
Models with 64K+ context for detailed image descriptions, multi-image generation, and long conversations.
| Model | Provider | Context | Input $/1M | Output $/1M |
|---|---|---|---|---|
| gpt-5-image | openrouter | 400K | $10 | $10 |
| gpt-5-image-mini | openrouter | 400K | $2.50 | $2 |
| gpt-5.4-image-2 | openrouter | 272K | $8 | $15 |
| gemini-3-pro-image | auriko | 131K | $2 | $12 |
| gemini-3.1-flash-image | fastrouter | 65K | $0.25 | $1.50 |
| gemini-3-pro-image | fastrouter | 65K | $2 | $12 |
| gemini-3.1-flash-image | auriko | 65K | $0.50 | $3 |
| amazon-nova-2.0-omni | amazon | 64K | $0.20 | $1.30 |
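
The 64K cutoff used for this table can be applied programmatically to context values written in the shorthand shown above. This is a minimal sketch, assuming `K` means 1,000 tokens (the source does not say whether it means 1,000 or 1,024) and that a missing context is written as `—`. The helper names `parse_context` and `meets_context` are hypothetical.

```python
def parse_context(value):
    """Convert shorthand like '400K' to a token count; return None for '—' or empty."""
    text = value.strip().upper()
    if not text or text == "—":
        return None
    if text.endswith("K"):
        return int(float(text[:-1]) * 1_000)  # assumption: K = 1,000 tokens
    return int(text)


def meets_context(value, minimum="64K"):
    """True when the context value is known and at least the minimum."""
    tokens = parse_context(value)
    return tokens is not None and tokens >= parse_context(minimum)


if __name__ == "__main__":
    for context in ["400K", "131K", "64K", "32K", "—"]:
        print(context, meets_context(context))
```

Rows with an unknown context are treated as not qualifying, which is a conservative choice: they are left out rather than assumed to meet the threshold.
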
| Use Case | Recommended Model | Why |
|---|---|---|
| Art & creative | imagen-4.0-generate | Highest quality, Google's latest |
| Product images | dall-e-3 | Best prompt adherence, consistent style |
| Chat + images | gpt-5-image-mini | Conversational image gen, 400K context |
| AI agents | amazon-nova-2.0-omni | Tool calling + reasoning + image output |
| High volume / cheap | gemini-2.5-flash-image | Lowest cost per image |
| Image editing | step-image-edit-2 | Purpose-built for editing |
| Chinese content | step-2x-large | Best Chinese + English generation |
All data is sourced from first-party APIs. Models are identified by having
`image` in their `modalities.output` field. Dedicated image models
(DALL·E, Imagen) have no chat context. Chat models with image output support both text and
image generation in conversation. Aggregator providers are excluded from ranking tables.
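
The selection rule described above can be sketched as a filter over catalog records. The only detail taken from the source is the check for `image` in `modalities.output`. The record layout, the sample entries, and the rule that a model counts as a chat model when its output also includes `text` are assumptions made for illustration.

```python
def has_image_output(model):
    """True when 'image' appears in the record's modalities.output list."""
    return "image" in model.get("modalities", {}).get("output", [])


def classify(model):
    """Label a record 'chat', 'dedicated', or None (no image output).

    Assumption: a model that also outputs text is treated as a chat model.
    """
    if not has_image_output(model):
        return None
    outputs = model["modalities"]["output"]
    return "chat" if "text" in outputs else "dedicated"


# Hypothetical records; the modality values are illustrative, not taken from any provider's API.
CATALOG = [
    {"id": "dall-e-3", "modalities": {"input": ["text"], "output": ["image"]}},
    {"id": "gpt-5-image-mini", "modalities": {"input": ["text", "image"], "output": ["text", "image"]}},
    {"id": "text-only-example", "modalities": {"input": ["text"], "output": ["text"]}},
]

# Keep only models with image output, mapped to their category.
image_models = {m["id"]: classify(m) for m in CATALOG if has_image_output(m)}

if __name__ == "__main__":
    print(image_models)
```
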