AI Providers

Connect to any OpenAI-compatible provider or run models fully offline.

Cloud providers

asyncat works with any OpenAI-compatible API. Configure in Settings → Models → Chat Provider inside the application.

ProviderBase URLExample model
OpenAIhttps://api.openai.com/v1gpt-4o, gpt-4o-mini
Anthropichttps://api.anthropic.com/v1claude-sonnet-4-6, claude-opus-4-7
Google Geminihttps://generativelanguage.googleapis.com/v1betagemini-1.5-pro
Groqhttps://api.groq.com/openai/v1llama-3.1-70b-versatile
DeepSeekhttps://api.deepseek.com/v1deepseek-chat, deepseek-coder
Together AIhttps://api.together.xyz/v1meta-llama/Llama-3-70b
Perplexityhttps://api.perplexity.aillama-3.1-sonar-large
Mistralhttps://api.mistral.ai/v1mistral-large-latest
Coherehttps://api.cohere.ai/v1command-r-plus
Fireworkshttps://api.fireworks.ai/inference/v1accounts/fireworks/models/llama-v3
Cerebrashttps://api.cerebras.ai/v1llama3.1-70b
OpenRouterhttps://openrouter.ai/api/v1any model on OpenRouter
NVIDIA NIMhttps://integrate.api.nvidia.com/v1meta/llama-3.1-70b-instruct
Azure OpenAIyour deployment URLgpt-4o (deployment name)
Amazon Bedrockvia OpenAI-compat proxyanthropic.claude-v3
Any OpenAI-compatyour endpointany model

Local engines

Run models entirely on your hardware — no API key, no internet, no data leaving your machine.

llama.cpp

asyncat can manage llama.cpp for you, or you can point it at an existing llama-server binary. Hardware auto-detection picks the right build:

  • Apple Silicon — Metal GPU acceleration (unified memory)
  • NVIDIA — CUDA 12.x, VRAM-aware layer offloading
  • AMD — ROCm GPU acceleration
  • CPU-only — works anywhere, slower

Install via Python (easiest):

pip install "llama-cpp-python[server]"

Ollama

Start Ollama and set the provider URL to http://localhost:11434/v1. Pull any model with ollama pull <model> — it appears automatically in asyncat's model list.

MLX (Apple Silicon)

MLX runs quantized models natively on Apple Silicon with excellent performance. Set the provider URL to your MLX server endpoint (typically http://localhost:8080/v1).

LM Studio

Enable the local server in LM Studio and point asyncat at http://localhost:1234/v1.

Configuration

All provider settings live in den/.env. The key variables:

VariableDescription
AI_BASE_URLProvider API base URL
AI_API_KEYAPI key (leave empty for local)
AI_MODELModel name (e.g. gpt-4o, claude-sonnet-4-6)
LOCAL_MODEL_PATHPath to a local GGUF model file
LLAMA_SERVER_PORTPort for llama-server (default: 8765)

Alternatively, you can edit the raw environment variables in the .env file under the den/ directory in the application workspace.

Switching providers

You can have multiple providers configured and switch between them in Settings → Models. asyncat stores all configured providers and lets you set one as active per session.