aiproxy

Turn any LLM into a more capable one.

aiproxy is an OpenAI-compatible gateway that fuses a wrapped LLM with a reusable fabric of Model Context Protocol servers. Point any OpenAI client at it, pick an assistant as the model, and the gateway runs the whole agentic tool loop for you — returning a normal OpenAI response.


Overview

MCP gives you a growing ecosystem of tool servers — web fetch, filesystem, databases, search, your own APIs. But wiring those tools into every app and every model is repetitive. aiproxy makes that infrastructure reusable and model-agnostic: define your MCP servers and LLM backends once, compose them into named assistants, and every OpenAI-compatible app in your stack gets a tool-augmented model for free — no client changes, no SDK lock-in.

flowchart LR
    client(["Any OpenAI client"])
    client -- "POST /v1/chat/completions" --> gate
    subgraph proxy["aiproxy"]
        direction TB
        gate["auth chain<br/>static keys | Apiman"]
        loop(["agent loop"])
        backend["backend adapter<br/>OpenAI-compat | native Anthropic"]
        mcp["MCP servers<br/>fetch | filesystem | http ..."]
        gate --> loop
        loop -- "LLM turn" --> backend
        backend -. "assistant / tool_calls" .-> loop
        loop -- "tool calls" --> mcp
        mcp -. "results" .-> loop
    end
    backend -- "chat / messages API" --> llm(["Upstream LLM"])
    proxy -- "OpenAI response" --> client

OpenAI-compatible

Streaming & non-streaming /v1/chat/completions + /v1/models. Works with the OpenAI SDKs, LangChain, LlamaIndex, curl.

Wraps any LLM

OpenAI-compatible backends (OpenAI, Groq, vLLM, Ollama, …) and native Anthropic, behind one interface.

Reusable MCP fabric

stdio, sse, streamable-http. Persistent sessions, namespaced tools, concurrent execution.

Runtime admin API

Add / edit / remove assistants, backends and MCP servers without a restart.

Pluggable auth

Static API keys and Apiman validation (gateway round-trip or trusted-header) run in parallel — see Auth.

Quick start

Run the prebuilt multi-arch image from the GitHub Container Registry:

# configure secrets + assistants
cp .env.example .env
cp config.example.yaml config.yaml

docker run --rm -p 8000:8000 --env-file .env \
  -v "$PWD/config.yaml:/app/config.yaml:ro" \
  ghcr.io/sirmmo/aiproxy:latest

Then talk to it exactly like OpenAI:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "research-assistant",
    "messages": [{"role":"user","content":"Summarize https://modelcontextprotocol.io"}]
  }'

…or with the OpenAI Python SDK — no code changes beyond the base URL:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-or-PROXY_API_KEY")
resp = client.chat.completions.create(
    model="research-assistant",          # an assistant, not a raw model
    messages=[{"role": "user", "content": "What's on the MCP homepage?"}],
)
print(resp.choices[0].message.content)

Streaming works exactly as clients expect (stream=True): content tokens flow through while tool rounds run transparently between them.
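
A minimal sketch of consuming that stream on the client side. `collect_text` is a hypothetical helper, not part of aiproxy or the OpenAI SDK; it only assumes chunks shaped like the SDK's chat.completion.chunk objects, where text arrives in `choices[0].delta.content`.

```python
from typing import Iterable


def collect_text(chunks: Iterable) -> str:
    """Join the content deltas of a chat.completion.chunk stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
            continue
        delta = chunk.choices[0].delta
        text = getattr(delta, "content", None)
        if text:  # role-only or tool-related deltas carry no text
            parts.append(text)
    return "".join(parts)


# Usage against a running gateway (not executed here):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1",
#                   api_key="unused-or-PROXY_API_KEY")
#   stream = client.chat.completions.create(
#       model="research-assistant",
#       messages=[{"role": "user", "content": "What's on the MCP homepage?"}],
#       stream=True,
#   )
#   print(collect_text(stream))
```

Because the tool rounds happen inside the gateway, the client-side loop should not need any tool-handling code.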

Configuration

Everything is declared in config.yaml. ${VAR} / ${VAR:-default} are expanded from the environment, so keep secrets in .env.

mcp_servers:
  fetch:                                   # a reusable MCP server
    transport: stdio
    command: uvx
    args: ["mcp-server-fetch"]

backends:
  anthropic:                               # a wrapped LLM provider
    kind: anthropic                        # or "openai" for any compat endpoint
    base_url: https://api.anthropic.com/v1
    api_key: ${ANTHROPIC_API_KEY}

assistants:
  - name: research-assistant               # ← clients pass this as `model`
    backend: anthropic
    model: claude-sonnet-5
    system_prompt: "You are a meticulous research assistant. Cite your sources."
    mcp_servers: [fetch]
    max_tool_iterations: 8
    temperature: 0.2
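
The ${VAR} / ${VAR:-default} expansion mentioned above can be illustrated with a short Python sketch. This is not aiproxy's loader: `expand_env` is a hypothetical name, the shell-style rule that an empty value also falls back to the default is an assumption, and so is raising an error for an unset variable without a default.

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(text: str, env=None) -> str:
    """Expand ${VAR} and ${VAR:-default} placeholders in a config string."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if default is not None:
            # Assumption: like the shell, unset OR empty falls back to the default.
            return value if value else default
        if value is None:
            # Assumption: a missing variable without a default is an error.
            raise KeyError(f"environment variable {name} is not set")
        return value

    return _VAR.sub(replace, text)
```

For example, `expand_env("api_key: ${ANTHROPIC_API_KEY}", {"ANTHROPIC_API_KEY": "sk-test"})` yields `api_key: sk-test`.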

Backends

| kind | Talks to | Auth header |
| --- | --- | --- |
| openai | Any OpenAI-compatible /chat/completions endpoint: OpenAI, Groq, Together, Mistral, vLLM, Ollama (/v1), LM Studio, OpenRouter… | Authorization: Bearer |
| anthropic | Native Anthropic /messages | x-api-key |

The Anthropic backend translates the canonical chat messages ↔ the Messages API (system prompt, tool_use/tool_result blocks, streaming events, stop-reason mapping), so tool use works first-class with Claude.
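
To give a feel for what such a translation involves, here is a rough sketch of the request direction only. It is not aiproxy's adapter: `to_anthropic` and `STOP_REASONS` are hypothetical names, message content is assumed to be plain strings, and streaming events are left out.

```python
import json

# Anthropic stop_reason -> OpenAI finish_reason
STOP_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def to_anthropic(messages):
    """Split OpenAI-style chat messages into (system, anthropic_messages)."""
    system_parts, out = [], []
    for msg in messages:
        role = msg["role"]
        if role == "system":
            # The Messages API takes the system prompt as a top-level field.
            system_parts.append(msg["content"])
        elif role == "assistant":
            blocks = []
            if msg.get("content"):
                blocks.append({"type": "text", "text": msg["content"]})
            for call in msg.get("tool_calls") or []:
                blocks.append({
                    "type": "tool_use",
                    "id": call["id"],
                    "name": call["function"]["name"],
                    # OpenAI sends arguments as a JSON string.
                    "input": json.loads(call["function"]["arguments"] or "{}"),
                })
            out.append({"role": "assistant", "content": blocks})
        elif role == "tool":
            block = {
                "type": "tool_result",
                "tool_use_id": msg["tool_call_id"],
                "content": msg["content"],
            }
            last = out[-1] if out else None
            # Consecutive tool results share a single user turn.
            if last and last["role"] == "user" and isinstance(last["content"], list):
                last["content"].append(block)
            else:
                out.append({"role": "user", "content": [block]})
        else:
            out.append({"role": "user", "content": msg["content"]})
    return "\n\n".join(system_parts) or None, out
```

The response direction would apply the reverse mapping, turning tool_use blocks back into tool_calls and translating the stop reason through a table like `STOP_REASONS`.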

MCP servers

| transport | Fields |
| --- | --- |
| stdio | command, args, env, cwd |
| sse | url, headers |
| http / streamable-http | url, headers |

Tools are exposed to the model as <server>__<tool> and routed back to the right server on call. Sessions are persistent (one subprocess per stdio server, reused across requests) and started lazily on first use. Node (npx) and uvx are baked into the image, so most community MCP servers install on demand.
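
The <server>__<tool> naming scheme can be sketched in a few lines. `qualify` and `route` are hypothetical helpers for illustration; splitting on the first `__` is an assumption, and it implies that server names must not contain a double underscore themselves.

```python
SEP = "__"


def qualify(server: str, tool: str) -> str:
    """Name under which a server's tool is shown to the model."""
    return f"{server}{SEP}{tool}"


def route(qualified: str, servers) -> tuple:
    """Map a model-issued tool name back to (server, tool)."""
    # Assumption: split on the FIRST separator, so tool names may contain "__".
    server, sep, tool = qualified.partition(SEP)
    if not sep or not tool or server not in servers:
        raise KeyError(f"unknown tool: {qualified}")
    return server, tool
```

With servers `{"fetch", "filesystem"}`, `route("filesystem__read_file", ...)` returns `("filesystem", "read_file")`, and a name without a known server prefix raises `KeyError`.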

Assistants

An assistant is a virtual model exposed via the OpenAI model field. It binds one backend, a system prompt, and a set of MCP servers, plus a tool-loop budget.

| Field | Meaning |
| --- | --- |
| name | What clients pass as model |
| backend | Which configured backend to call |
| model | The upstream model id (e.g. gpt-4o, claude-sonnet-5) |
| system_prompt | Prepended if the request has no system message |
| mcp_servers | List of MCP servers whose tools are attached |
| max_tool_iterations | Tool-loop budget; the final turn drops tools to force an answer |
| temperature, top_p, max_tokens | Defaults; client-supplied params override them |
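
The two merging rules in the table (system prompt only when the request has none, client parameters over assistant defaults) can be sketched as follows. `build_request` is a hypothetical function; the dictionaries mirror the config fields above, not an actual aiproxy data model.

```python
SAMPLING_KEYS = ("temperature", "top_p", "max_tokens")


def build_request(assistant: dict, request: dict) -> dict:
    """Combine assistant defaults with a client request for the upstream call."""
    messages = list(request["messages"])  # do not mutate the caller's list
    has_system = any(m["role"] == "system" for m in messages)
    if assistant.get("system_prompt") and not has_system:
        messages.insert(0, {"role": "system", "content": assistant["system_prompt"]})

    params = {}
    for key in SAMPLING_KEYS:
        if request.get(key) is not None:      # client-supplied value wins
            params[key] = request[key]
        elif assistant.get(key) is not None:  # otherwise the assistant default
            params[key] = assistant[key]

    # The client sent the assistant name; upstream gets the real model id.
    return {"model": assistant["model"], "messages": messages, **params}
```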

OpenAI API

| Method & path | Purpose |
| --- | --- |
| GET /v1/models | List configured assistants as OpenAI models |
| POST /v1/chat/completions | Chat completion; runs the MCP tool loop. Supports stream |

How a request flows

  • Client posts to /v1/chat/completions with model: "<assistant>".
  • The gateway resolves the assistant → backend + MCP servers, and ensures those servers are connected.
  • It builds the OpenAI tool schema and enters the agent loop: call the LLM; if it requests tools, execute them concurrently against the MCP servers and feed results back; repeat until the model answers or max_tool_iterations is hit.
  • Returns a standard chat.completion (or streams chat.completion.chunks), with the assistant name as model and aggregated token usage.
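
The loop in the steps above can be sketched with asyncio. This is an illustration, not aiproxy's implementation: `call_llm` and `call_tool` are hypothetical callables standing in for the backend adapter and the MCP sessions, and treating the budget as "N tool rounds, then one tool-less final turn" is an assumption about how max_tool_iterations is counted.

```python
import asyncio
import json


async def agent_loop(call_llm, call_tool, messages, tools, max_tool_iterations):
    """Alternate LLM turns and tool rounds until the model answers."""
    messages = list(messages)
    for turn in range(max_tool_iterations + 1):
        final = turn == max_tool_iterations
        # On the final turn the tools are withheld to force a plain answer.
        reply = await call_llm(messages, None if final else tools)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls or final:
            return reply
        # Run every requested tool call concurrently.
        results = await asyncio.gather(*(
            call_tool(c["function"]["name"],
                      json.loads(c["function"]["arguments"] or "{}"))
            for c in calls
        ))
        # Feed each result back, matched to its call id.
        for call, result in zip(calls, results):
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": result,
            })
```

Aggregating token usage across turns and streaming partial content are omitted here to keep the control flow visible.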

Admin API

Mutate the live registry without restarting (set ADMIN_API_KEY to protect it):

# see what tools a server actually advertises
curl localhost:8000/admin/mcp/fetch/tools

# add / replace an assistant at runtime
curl -X PUT localhost:8000/admin/assistants/coder \
  -H "Content-Type: application/json" \
  -d '{"backend":"openai","model":"gpt-4o","mcp_servers":["filesystem"],
       "system_prompt":"You are a coding agent."}'

| Method & path | Purpose |
| --- | --- |
| GET /admin/config | Dump current registry (secrets redacted) |
| GET/PUT/DELETE /admin/assistants[/{name}] | Manage assistants |
| GET/PUT/DELETE /admin/backends[/{name}] | Manage LLM backends |
| GET/PUT/DELETE /admin/mcp[/{name}] | Manage MCP servers |
| GET /admin/mcp/{name}/tools | Introspect a server's tools |

Admin changes are in-memory. Use GET /admin/config to export current state and persist it into config.yaml yourself.
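
For scripting the admin API from Python, a small request builder using only the standard library might look like this. `admin_request` is a hypothetical helper; the paths and the Bearer header come from this page, while the response format is not specified here and is therefore not parsed.

```python
import json
import urllib.request


def admin_request(base_url, method, path, body=None, admin_key=None):
    """Build (but do not send) a request to an /admin endpoint."""
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if admin_key:  # only needed when ADMIN_API_KEY is set on the gateway
        headers["Authorization"] = f"Bearer {admin_key}"
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Sending it against a running gateway (not executed here):
#   req = admin_request("http://localhost:8000", "GET", "/admin/config",
#                       admin_key="...")
#   with urllib.request.urlopen(req) as resp:
#       exported = resp.read().decode("utf-8")
```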

Auth

/v1/* auth is a pluggable chain — a request is authorized if any enabled provider accepts it, so static keys and Apiman run in parallel. The caller's key is read from Authorization: Bearer, the X-API-Key header, or the ?apikey= query param.

| Provider | Enable with | Accepts when… |
| --- | --- | --- |
| Static keys | non-empty proxy_api_keys | the key matches an entry |
| Apiman gateway_probe | apiman.mode: gateway_probe | the key validates via a round-trip through the Apiman gateway (2xx), cached |
| Apiman trusted_header | apiman.mode: trusted_header | the request carries the shared secret the gateway injects |
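
The "any provider accepts" rule and the three places a key can come from can be sketched like this. Everything here is illustrative: `extract_key` and `authorize` are hypothetical names, headers are assumed to be a dict with lower-cased keys, the lookup precedence is an assumption, and the providers are checked one after another rather than concurrently.

```python
from urllib.parse import parse_qs, urlsplit


def extract_key(headers: dict, url: str):
    """Find the caller's key in the Bearer header, X-API-Key, or ?apikey=."""
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    if headers.get("x-api-key"):
        return headers["x-api-key"]
    values = parse_qs(urlsplit(url).query).get("apikey")
    return values[0] if values else None


def authorize(key, headers: dict, providers) -> bool:
    """Accept the request if ANY enabled provider accepts it."""
    return any(provider(key, headers) for provider in providers)


def static_keys(allowed):
    """Provider that accepts keys listed in proxy_api_keys."""
    allowed = set(allowed)
    return lambda key, headers: key is not None and key in allowed
```

A provider is just a callable taking the key and the headers, so an Apiman check could be added to the list next to `static_keys([...])` without changing `authorize`.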

Apiman — gateway_probe

aiproxy stays directly reachable and validates each caller's key against Apiman. Register a small "auth check" API whose backend is aiproxy's /health:

apiman:
  enabled: true
  mode: gateway_probe
  gateway_url: http://apiman-gateway:8080/apiman-gateway
  probe_api: aiproxy/authcheck/1.0    # {org}/{api}/{version}
  probe_path: health                  # backend path that returns 2xx
  cache_ttl: 60
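
The effect of cache_ttl can be pictured as a small time-based cache in front of the probe. This is a sketch under assumptions: `ProbeCache` is a hypothetical class, `probe` stands in for the round-trip through the Apiman gateway (returning True on a 2xx), and caching rejections as well as acceptances is a guess rather than documented behaviour.

```python
import time


class ProbeCache:
    """Remember probe verdicts per key for `ttl` seconds."""

    def __init__(self, probe, ttl=60.0, clock=time.monotonic):
        self._probe = probe      # callable: key -> bool (True means 2xx)
        self._ttl = ttl
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # key -> (expires_at, accepted)

    def check(self, key: str) -> bool:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: skip the round-trip
        accepted = bool(self._probe(key))
        # Assumption: both accepted and rejected verdicts are cached.
        self._entries[key] = (now + self._ttl, accepted)
        return accepted
```

A shorter TTL makes a revoked key stop working sooner, at the cost of more round-trips through the gateway.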

Apiman — trusted_header

Put the Apiman gateway in front of aiproxy; an "Add Header" policy injects a shared secret aiproxy trusts (it never sees raw keys):

apiman:
  enabled: true
  mode: trusted_header
  header: X-Apiman-Gateway-Token
  secret: ${APIMAN_SHARED_SECRET}
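
Checking the injected header comes down to a string comparison against the configured secret. A minimal sketch, assuming lower-cased header names and using a constant-time comparison; `trusted_header_ok` is a hypothetical function, not aiproxy's code.

```python
import hmac


def trusted_header_ok(headers: dict, header_name: str, secret: str) -> bool:
    """True when the request carries the shared secret the gateway injects."""
    if not secret:
        return False  # never accept when no secret is configured
    presented = headers.get(header_name.lower(), "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode("utf-8"), secret.encode("utf-8"))
```

This mode only makes sense when aiproxy is reachable solely through the gateway, since anyone who can reach it directly and knows the header value would be accepted.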

/admin/* is separate: if ADMIN_API_KEY is set, admin calls must send Authorization: Bearer <ADMIN_API_KEY>. Everything is open when unset (handy for local dev; lock it down in production).

Development

The recommended path is Docker, which gives you a clean Python 3.12 environment with Node and uvx available.

docker build -t aiproxy:latest .

# end-to-end check: spawns the demo MCP server and drives the full agent
# loop (streaming + non-streaming) with a scripted fake LLM. No API key needed:
docker run --rm aiproxy:latest python scripts/smoke_test.py

See the README for the full reference and a no-Docker (uv) workflow.