AI Providers
Connect to any OpenAI-compatible provider or run models fully offline.
Cloud providers
asyncat works with any OpenAI-compatible API. Configure in Settings → Models → Chat Provider inside the application.
| Provider | Base URL | Example model |
|---|---|---|
| OpenAI | https://api.openai.com/v1 | gpt-4o, gpt-4o-mini |
| Anthropic | https://api.anthropic.com/v1 | claude-sonnet-4-6, claude-opus-4-7 |
| Google Gemini | https://generativelanguage.googleapis.com/v1beta | gemini-1.5-pro |
| Groq | https://api.groq.com/openai/v1 | llama-3.1-70b-versatile |
| DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, deepseek-coder |
| Together AI | https://api.together.xyz/v1 | meta-llama/Llama-3-70b |
| Perplexity | https://api.perplexity.ai | llama-3.1-sonar-large |
| Mistral | https://api.mistral.ai/v1 | mistral-large-latest |
| Cohere | https://api.cohere.ai/v1 | command-r-plus |
| Fireworks | https://api.fireworks.ai/inference/v1 | accounts/fireworks/models/llama-v3 |
| Cerebras | https://api.cerebras.ai/v1 | llama3.1-70b |
| OpenRouter | https://openrouter.ai/api/v1 | any model on OpenRouter |
| NVIDIA NIM | https://integrate.api.nvidia.com/v1 | meta/llama-3.1-70b-instruct |
| Azure OpenAI | your deployment URL | gpt-4o (deployment name) |
| Amazon Bedrock | via OpenAI-compat proxy | anthropic.claude-v3 |
| Any OpenAI-compat | your endpoint | any model |
Local engines
Run models entirely on your hardware — no API key, no internet, no data leaving your machine.
llama.cpp
asyncat can manage llama.cpp for you, or you can point it at an existing llama-server binary. Hardware auto-detection picks the right build:
- Apple Silicon — Metal GPU acceleration (unified memory)
- NVIDIA — CUDA 12.x, VRAM-aware layer offloading
- AMD — ROCm GPU acceleration
- CPU-only — works anywhere, slower
Install via Python (easiest):
pip install "llama-cpp-python[server]" Ollama
Start Ollama and set the provider URL to http://localhost:11434/v1. Pull any model with ollama pull <model> — it appears automatically in asyncat's model list.
MLX (Apple Silicon)
MLX runs quantized models natively on Apple Silicon with excellent performance. Set the provider URL to your MLX server endpoint (typically http://localhost:8080/v1).
LM Studio
Enable the local server in LM Studio and point asyncat at http://localhost:1234/v1.
Configuration
All provider settings live in den/.env. The key variables:
| Variable | Description |
|---|---|
AI_BASE_URL | Provider API base URL |
AI_API_KEY | API key (leave empty for local) |
AI_MODEL | Model name (e.g. gpt-4o, claude-sonnet-4-6) |
LOCAL_MODEL_PATH | Path to a local GGUF model file |
LLAMA_SERVER_PORT | Port for llama-server (default: 8765) |
Alternatively, you can edit the raw environment variables in the .env file under the den/ directory in the application workspace.
Switching providers
You can have multiple providers configured and switch between them in Settings → Models. asyncat stores all configured providers and lets you set one as active per session.