AI Providers

Connect to any OpenAI-compatible provider or run models fully offline.

Cloud providers

asyncat works with any OpenAI-compatible API. Configure in Settings → Models → Chat Provider inside the application.

Provider	Base URL	Example model
OpenAI	`https://api.openai.com/v1`	gpt-4o, gpt-4o-mini
Anthropic	`https://api.anthropic.com/v1`	claude-sonnet-4-6, claude-opus-4-7
Google Gemini	`https://generativelanguage.googleapis.com/v1beta`	gemini-1.5-pro
Groq	`https://api.groq.com/openai/v1`	llama-3.1-70b-versatile
DeepSeek	`https://api.deepseek.com/v1`	deepseek-chat, deepseek-coder
Together AI	`https://api.together.xyz/v1`	meta-llama/Llama-3-70b
Perplexity	`https://api.perplexity.ai`	llama-3.1-sonar-large
Mistral	`https://api.mistral.ai/v1`	mistral-large-latest
Cohere	`https://api.cohere.ai/v1`	command-r-plus
Fireworks	`https://api.fireworks.ai/inference/v1`	accounts/fireworks/models/llama-v3
Cerebras	`https://api.cerebras.ai/v1`	llama3.1-70b
OpenRouter	`https://openrouter.ai/api/v1`	any model on OpenRouter
NVIDIA NIM	`https://integrate.api.nvidia.com/v1`	meta/llama-3.1-70b-instruct
Azure OpenAI	your deployment URL	gpt-4o (deployment name)
Amazon Bedrock	via OpenAI-compat proxy	anthropic.claude-v3
Any OpenAI-compat	your endpoint	any model

Local engines

Run models entirely on your hardware — no API key, no internet, no data leaving your machine.

llama.cpp

asyncat can manage llama.cpp for you, or you can point it at an existing llama-server binary. Hardware auto-detection picks the right build:

Apple Silicon — Metal GPU acceleration (unified memory)
NVIDIA — CUDA 12.x, VRAM-aware layer offloading
AMD — ROCm GPU acceleration
CPU-only — works anywhere, slower

Install via Python (easiest):

pip install "llama-cpp-python[server]"

Ollama

Start Ollama and set the provider URL to http://localhost:11434/v1. Pull any model with ollama pull <model> — it appears automatically in asyncat's model list.

MLX (Apple Silicon)

MLX runs quantized models natively on Apple Silicon with excellent performance. Set the provider URL to your MLX server endpoint (typically http://localhost:8080/v1).

LM Studio

Enable the local server in LM Studio and point asyncat at http://localhost:1234/v1.

Configuration

All provider settings live in den/.env. The key variables:

Variable	Description
`AI_BASE_URL`	Provider API base URL
`AI_API_KEY`	API key (leave empty for local)
`AI_MODEL`	Model name (e.g. `gpt-4o`, `claude-sonnet-4-6`)
`LOCAL_MODEL_PATH`	Path to a local GGUF model file
`LLAMA_SERVER_PORT`	Port for llama-server (default: 8765)

Alternatively, you can edit the raw environment variables in the .env file under the den/ directory in the application workspace.

Switching providers

You can have multiple providers configured and switch between them in Settings → Models. asyncat stores all configured providers and lets you set one as active per session.