AI Models¶
Use the right tool for each job—fast models for quick tasks, smart ones for hard problems.
Most AI tools lock you into one model. Fabric lets you choose. Running a quick lint check? Use a fast, cheap model. Debugging a gnarly race condition? Switch to something powerful. You control the cost-quality tradeoff.
No More Overpaying
Stop using expensive models for simple tasks. Fabric's model selector puts you in control of your AI costs.
Model Tier System¶
Fabric automatically selects models based on task complexity using a three-tier system:
Tier Selection
Models are ranked by a formula that balances intelligence (SWE-bench score), speed (tokens/sec), and efficiency (latency and cost):
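Fabric doesn't publish the exact weighting, but a score of roughly this shape captures the trade-off. A minimal sketch in Python, where the weights, normalization, and speed/latency inputs are illustrative assumptions, not Fabric's internal formula:

```python
def tier_score(swe_bench: float, tokens_per_sec: float,
               cost_per_m: float, latency_s: float) -> float:
    """Illustrative ranking score: higher is better.

    Weights and normalization are placeholder assumptions,
    not Fabric's internal formula.
    """
    intelligence = swe_bench / 100.0            # SWE-bench score, 0-100
    speed = min(tokens_per_sec / 2500.0, 1.0)   # ~2500 tok/s is the fastest listed
    efficiency = 1.0 / (1.0 + cost_per_m + latency_s)
    return 0.6 * intelligence + 0.25 * speed + 0.15 * efficiency

# SWE-bench and cost taken from the tables below;
# speed and latency inputs here are hypothetical.
print(tier_score(72, 60, 30.00, 2.0))    # claude-opus-4-5: intelligence-heavy
print(tier_score(30, 2500, 1.20, 0.5))   # qwen-3-32b: speed-heavy
```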
Large Tier - Most Capable¶
Used for complex reasoning, detailed planning, and architecture decisions.
Trade-off: Intelligence > Speed/Cost
| Model | Provider | SWE-bench | Cost/M Tokens |
|---|---|---|---|
| claude-opus-4-5 | Anthropic | 72 | $30 |
| gpt-5-pro | OpenAI | 70 | $135 |
| claude-sonnet-4-5 | Anthropic | 68 | $18 |
| gemini-3-pro-preview | Google | 65 | $14 |
| gpt-5.1 | OpenAI | 62 | $11.25 |
| gpt-5.1-codex | OpenAI | 60 | $11.25 |
| deepseek-reasoner | Deepseek | 55 | $2.74 |
| qwen3-235b-thinking | OpenRouter | 52 | $0.71 |
Medium Tier - Balanced¶
Used for targeted code changes, merging, and code review.
Trade-off: Balance of speed, cost, and capability
| Model | Provider | SWE-bench | Tokens/sec | Cost/M Tokens |
|---|---|---|---|---|
| claude-haiku-4-5 | Anthropic | 48 | 200 | $6 |
| gpt-5.1-codex-mini | OpenAI | 45 | 150 | $2.25 |
| zai-glm-4.6 | Cerebras | 42 | 2000 | $5 |
| qwen-3-235b-instruct | Cerebras | 40 | 1500 | $1.80 |
| deepseek-chat | Deepseek | 38 | 100 | $1.37 |
| qwen3-coder-30b | OpenRouter | 35 | 120 | $0.31 |
| grok-code-fast-1 | OpenRouter | 33 | 300 | $1.70 |
Small Tier - Fast & Cheap¶
Used for summaries, tab naming, and high-volume background operations.
Trade-off: Speed/Cost > Intelligence
| Model | Provider | SWE-bench | Tokens/sec | Cost/M Tokens |
|---|---|---|---|---|
| qwen-3-32b | Cerebras | 30 | 2500 | $1.20 |
| gpt-oss-120b | Cerebras | 28 | 2200 | $1.10 |
| gemini-2.5-flash | Google | 32 | 500 | $0.75 |
| gpt-5-nano | OpenAI | 25 | 300 | $0.45 |
| deepseek-r1:free | OpenRouter | 35 | 80 | Free |
| deepseek-v3.2-speciale | OpenRouter | 55 | 100 | $0.68 |
| local models | Local | 20 | 50 | Free |
Supported Providers¶
Anthropic¶
Claude models are known for thoughtful, nuanced responses and excellent code generation.
| Model | Display Name | Context | Output | Vision | Reasoning |
|---|---|---|---|---|---|
| claude-opus-4-5 | opus 4.5 | 200k | 32k | Yes | On/Off |
| claude-sonnet-4-5 | sonnet 4.5 | 200k | 64k | Yes | On/Off |
| claude-haiku-4-5 | haiku 4.5 | 200k | 64k | Yes | On/Off |
Capabilities
- Vision support (images, PDFs)
- Extended thinking mode
- Tool use
- JSON mode
- Web search
- Streaming responses
OpenAI¶
GPT-5 series models with advanced reasoning capabilities.
| Model | Display Name | Context | Output | Vision | Reasoning |
|---|---|---|---|---|---|
| gpt-5-pro | gpt 5 pro | 400k | 128k | Yes | Effort Levels |
| gpt-5.1 | gpt 5.1 | 400k | 128k | Yes | Effort Levels |
| gpt-5.1-codex | gpt 5.1 codex | 400k | 128k | Yes | Effort Levels |
| gpt-5.1-codex-mini | gpt 5.1 codex mini | 400k | 128k | Yes | Effort Levels |
| gpt-5-nano | gpt 5 nano | 400k | 128k | Yes | No |
Reasoning Effort Levels
OpenAI models support configurable reasoning effort: low, medium, high
Intelligence ratings vary by effort level (e.g., gpt-5-pro: high=5★, medium=5★, low=4★)
Capabilities
- Vision support (images, PDFs)
- Extended thinking with effort control
- Tool use
- JSON mode
- Web search
- Streaming responses
Google¶
Gemini models with massive context windows and code execution.
| Model | Display Name | Context | Output | Vision | Reasoning |
|---|---|---|---|---|---|
| gemini-3-pro-preview | gemini 3.0 pro | 1M | 64k | Yes | On/Off |
| gemini-2.5-flash | gemini 2.5 flash | 1M | 65k | Yes | No |
Capabilities
- Vision support (images, PDFs)
- Extended thinking mode (gemini-3-pro only)
- Tool use
- JSON mode
- Web search
- Code execution (unique to Google)
- Streaming responses
Cerebras¶
Ultra-fast inference models optimized for code generation.
| Model | Display Name | Context | Output | Reasoning |
|---|---|---|---|---|
| zai-glm-4.6 | ZAI GLM 4.6 (preview) | 131k | 8k | No |
| gpt-oss-120b | GPT-OSS 120B | 131k | 8k | No |
| qwen-3-32b | Qwen 3 32B | 131k | 8k | On/Off |
| qwen-3-235b-instruct | Qwen 3 235B Instruct (preview) | 131k | 8k | On/Off |
Performance
Cerebras models deliver 2000-2500 tokens/sec - the fastest in Fabric's lineup.
Capabilities
- Tool use
- Streaming responses
- Extended thinking (select models)
- No vision support
- No PDF support
Deepseek¶
Specialized reasoning models with excellent code understanding.
| Model | Display Name | Context | Output | Reasoning |
|---|---|---|---|---|
| deepseek-reasoner | deepseek reasoner | 128k | 64k | On/Off |
| deepseek-chat | deepseek chat | 64k | 4k | On/Off |
Capabilities
- Extended thinking mode
- Tool use
- JSON mode
- Streaming responses
- No vision support
OpenRouter¶
Access to diverse open-source and commercial models through a single API.
| Model | Display Name | Context | Output | Reasoning |
|---|---|---|---|---|
| x-ai/grok-code-fast-1 | Grok Code Fast 1 | 256k | 128k | No |
| deepseek/deepseek-v3.2-speciale | DeepSeek V3.2 Speciale | 164k | 65k | On/Off |
| qwen/qwen3-235b-thinking | Qwen3 235B Thinking | 262k | 81k | On/Off |
| z-ai/glm-4.6 | GLM 4.6 | 202k | 100k | No |
| minimax/minimax-m2 | Minimax M2 | 204k | 100k | No |
| qwen/qwen3-coder-30b | Qwen3 Coder 30B | 262k | 128k | No |
| deepseek/deepseek-r1:free | DeepSeek R1 | 164k | 128k | On/Off |
Capabilities
- Tool use
- Streaming responses
- Extended thinking (select models)
- Free tier available (deepseek-r1)
Limitations
- No vision support (most models)
- No JSON mode
- No web search
Local Models¶
Run models locally using llama.cpp, Ollama, or custom endpoints.
| Feature | Status |
|---|---|
| Endpoint | http://localhost:8080/v1/ |
| Context | Model-dependent |
| Output | Model-dependent |
| Cost | Free |
Local Setup
Configure local providers through Settings → Models → Add Provider
Supported backends:
- llama.cpp - High-performance C++ inference
- Ollama - Easy model management
- Custom - Any OpenAI-compatible endpoint
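As a quick smoke test, any of these backends should answer a standard chat completion at the endpoint listed above. A minimal sketch, assuming your server is already running; the model name is a placeholder for whatever your backend serves:

```python
import requests

# Minimal chat request against a local OpenAI-compatible server;
# llama.cpp's server and Ollama both expose this API shape.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder: use the name your backend serves
        "messages": [{"role": "user", "content": "Name this terminal tab."}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```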
Limitations
- No vision support
- No tool use
- No JSON mode
- No reasoning mode

Streaming responses are supported.
Model Capabilities¶
Vision Support¶
Models that can process images and PDFs:
- All Anthropic models (Claude 4.5 series)
- All OpenAI models (GPT-5 series)
- All Google models (Gemini)
Supported Formats
- Images: PNG, JPEG, GIF, WebP
- Documents: PDF (automatic extraction)
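For reference, images are attached using the provider's multimodal message format. A minimal sketch in the OpenAI-compatible content shape (Anthropic and Google use analogous structures; the filename is illustrative):

```python
import base64

# Attach an image in the OpenAI-compatible multimodal content format.
with open("diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this architecture diagram show?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ],
}
```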
Extended Thinking¶
Models with chain-of-thought reasoning (illustrative request payloads follow the lists below):
Reasoning Types
- On/Off - Simple toggle (Anthropic, Google, Deepseek, select others)
- Effort Levels - Configurable low/medium/high (OpenAI)
Supported models:
- Anthropic: All Claude 4.5 models
- OpenAI: gpt-5-pro, gpt-5.1, gpt-5.1-codex, gpt-5.1-codex-mini
- Google: gemini-3-pro-preview
- Cerebras: qwen-3-32b, qwen-3-235b-instruct
- Deepseek: All models
- OpenRouter: deepseek-v3.2-speciale, qwen3-235b-thinking, deepseek-r1
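To make the two styles concrete, here are illustrative request payloads. The field names follow the public Anthropic and OpenAI APIs (`thinking` with a token budget vs. `reasoning_effort`); exactly how Fabric maps its toggle onto these parameters is an assumption:

```python
# On/Off style (Anthropic): extended thinking is a toggle with a token budget.
anthropic_payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 2048,
    "thinking": {"type": "enabled", "budget_tokens": 1024},
    "messages": [{"role": "user", "content": "Plan the refactor."}],
}

# Effort style (OpenAI): reasoning intensity is a level, not a toggle.
openai_payload = {
    "model": "gpt-5.1",
    "reasoning_effort": "high",  # "low" | "medium" | "high"
    "messages": [{"role": "user", "content": "Plan the refactor."}],
}
```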
Tool Use¶
All models except local models support tool/function calling (a schema sketch follows this list) for:
- Code execution
- File operations
- API integrations
- Custom tools
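A tool is declared to the model as a JSON schema. A minimal sketch in the widely used OpenAI function-calling format; the `read_file` tool here is illustrative, not one of Fabric's built-ins:

```python
# A minimal tool declaration in the OpenAI function-calling format.
# The read_file tool is a hypothetical example, not a Fabric built-in.
read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the current workspace",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Workspace-relative file path",
                },
            },
            "required": ["path"],
        },
    },
}
```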
Web Search¶
Models with integrated web search capabilities:
- All Anthropic models
- All OpenAI models
- All Google models
Code Execution¶
Only Google Gemini models natively support code execution in the model environment.
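In the Gemini REST API this is enabled by adding the built-in tool to the request. A minimal request-body sketch (the prompt is illustrative; how Fabric surfaces this setting is an assumption):

```python
# Request body enabling Gemini's built-in code-execution tool
# (REST generateContent shape).
payload = {
    "contents": [{
        "role": "user",
        "parts": [{"text": "Compute the 20th Fibonacci number and show your code."}],
    }],
    "tools": [{"code_execution": {}}],
}
```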
Provider Configuration¶
Adding API Keys¶
Quick Setup
- Go to Settings → Models
- Select your provider
- Enter your API key
- Choose default models for each tier
Custom Endpoints¶
For self-hosted or custom model endpoints:
Fabric supports any OpenAI-compatible API endpoint.
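Before adding a custom provider, it's worth sanity-checking compatibility: an OpenAI-compatible endpoint should answer `GET /v1/models` with the models it serves. A minimal sketch, assuming a hypothetical base URL and key:

```python
import requests

# An OpenAI-compatible endpoint lists its models at GET /v1/models.
BASE_URL = "https://models.example.com/v1"  # hypothetical endpoint
resp = requests.get(
    f"{BASE_URL}/models",
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # hypothetical key
    timeout=30,
)
for model in resp.json()["data"]:
    print(model["id"])
```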
Model Selection Strategy¶
Fabric automatically selects models based on the task:
| Task | Tier | Example Use Case |
|---|---|---|
| Planning | Large | Creating implementation plans, architecture decisions |
| Code Review | Large | Reviewing medium model output, complex refactoring |
| Implementation | Medium | Writing targeted code changes, merging edits |
| Quick Tasks | Small | Summaries, tab naming, background operations |
Manual Override
You can manually select any model regardless of tier from the model dropdown.
Cost Optimization¶
Budget-Friendly Options¶
Free models:
- OpenRouter: deepseek-r1:free
- Local: Any self-hosted model
Low-cost high-performers:
| Model | Provider | Cost/M | SWE-bench | Speed |
|---|---|---|---|---|
| qwen3-235b-thinking | OpenRouter | $0.71 | 52 | Moderate |
| gemini-2.5-flash | Google | $0.75 | 32 | Fast |
| deepseek-chat | Deepseek | $1.37 | 38 | Moderate |
| qwen-3-235b-instruct | Cerebras | $1.80 | 40 | Very Fast |
Performance Options¶
For maximum intelligence regardless of cost:
- claude-opus-4-5 - Best reasoning (SWE-bench: 72)
- gpt-5-pro - Strong all-around (SWE-bench: 70)
- claude-sonnet-4-5 - Excellent balance (SWE-bench: 68)
Frequently Asked Questions¶
Which model should I use?
For most coding tasks: Start with claude-sonnet-4-5 or gpt-5.1-codex (large tier)
For quick tasks: Use claude-haiku-4-5 (medium tier) or gemini-2.5-flash (small tier)
For complex architecture: Use claude-opus-4-5 or gpt-5-pro (large tier)
On a budget: Use deepseek-r1:free or local models
How does tier selection work?
Fabric ranks models within each tier by:
- Intelligence (SWE-bench score)
- Speed (tokens per second)
- Efficiency (latency and cost)
The highest-ranked available model is auto-selected for each task type.
Can I use multiple providers simultaneously?
Yes! Configure API keys for multiple providers and Fabric will use the best available model for each task based on tier rankings.
Do I need all API keys?
No - configure only the providers you want to use. At minimum, one provider with large, medium, and small tier models is recommended.
What's the difference between reasoning types?
- On/Off: Simple toggle for extended thinking (Anthropic, Google, Deepseek)
- Effort Levels: Adjustable low/medium/high reasoning intensity (OpenAI)
Higher effort = better quality but slower responses and higher costs.
Can I add custom models?
Yes! Use the custom provider option with any OpenAI-compatible endpoint. Local models (llama.cpp, Ollama) are also supported.