aiproxy — OpenAI-compatible MCP gateway

Overview

MCP gives you a growing ecosystem of tool servers — web fetch, filesystem, databases, search, your own APIs. But wiring those tools into every app and every model is repetitive. aiproxy makes that infrastructure reusable and model-agnostic: define your MCP servers and LLM backends once, compose them into named assistants, and every OpenAI-compatible app in your stack gets a tool-augmented model for free — no client changes, no SDK lock-in.

flowchart LR
    client(["Any OpenAI client"])
    client -- "POST /v1/chat/completions" --> gate
    subgraph proxy["aiproxy"]
        direction TB
        gate["auth chain<br/>static keys | Apiman"]
        loop(["agent loop"])
        backend["backend adapter<br/>OpenAI-compat | native Anthropic"]
        mcp["MCP servers<br/>fetch | filesystem | http ..."]
        gate --> loop
        loop -- "LLM turn" --> backend
        backend -. "assistant / tool_calls" .-> loop
        loop -- "tool calls" --> mcp
        mcp -. "results" .-> loop
    end
    backend -- "chat / messages API" --> llm(["Upstream LLM"])
    proxy -- "OpenAI response" --> client

OpenAI-compatible

Streaming & non-streaming /v1/chat/completions + /v1/models. Works with the OpenAI SDKs, LangChain, LlamaIndex, curl.

Wraps any LLM

OpenAI-compatible backends (OpenAI, Groq, vLLM, Ollama, …) and native Anthropic, behind one interface.

Reusable MCP fabric

stdio, sse, streamable-http. Persistent sessions, namespaced tools, concurrent execution.

Runtime admin API

Add / edit / remove assistants, backends and MCP servers without a restart.

Pluggable auth

Static API keys and Apiman validation (gateway round-trip or trusted-header) can be enabled side by side; a request passes if any enabled provider accepts it. See Auth.

Quick start

Run the prebuilt multi-arch image from the GitHub Container Registry:

# configure secrets + assistants
cp .env.example .env
cp config.example.yaml config.yaml

docker run --rm -p 8000:8000 --env-file .env \
  -v "$PWD/config.yaml:/app/config.yaml:ro" \
  ghcr.io/sirmmo/aiproxy:latest

Then talk to it exactly like OpenAI:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "research-assistant",
    "messages": [{"role":"user","content":"Summarize https://modelcontextprotocol.io"}]
  }'

…or with the OpenAI Python SDK — no code changes beyond the base URL:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-or-PROXY_API_KEY")
resp = client.chat.completions.create(
    model="research-assistant",          # an assistant, not a raw model
    messages=[{"role": "user", "content": "What's on the MCP homepage?"}],
)
print(resp.choices[0].message.content)

Streaming works exactly as clients expect (stream=True): content tokens flow through while tool rounds run transparently between them.
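
To make the streaming shape concrete, here is a minimal sketch of how a client could reassemble the assistant text from the server-sent-event lines of a streamed response. It assumes aiproxy emits standard OpenAI chat.completion.chunk events terminated by data: [DONE]; the function name collect_stream_text is hypothetical and the sample lines are hand-written, not captured output.

```python
import json
from typing import Iterable


def collect_stream_text(sse_lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style streaming chunks."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel of the OpenAI streaming format
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # role-only and finish chunks carry no content
                parts.append(text)
    return "".join(parts)


# Hand-written sample lines, for illustration only
sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":", world"}}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))
```

In practice the OpenAI SDKs do this parsing for you; the sketch only shows that nothing aiproxy-specific is needed on the client side.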

Configuration

Everything is declared in config.yaml. ${VAR} / ${VAR:-default} are expanded from the environment, so keep secrets in .env.

mcp_servers:
  fetch:                                   # a reusable MCP server
    transport: stdio
    command: uvx
    args: ["mcp-server-fetch"]

backends:
  anthropic:                               # a wrapped LLM provider
    kind: anthropic                        # or "openai" for any compat endpoint
    base_url: https://api.anthropic.com/v1
    api_key: ${ANTHROPIC_API_KEY}

assistants:
  - name: research-assistant               # ← clients pass this as `model`
    backend: anthropic
    model: claude-sonnet-5
    system_prompt: "You are a meticulous research assistant. Cite your sources."
    mcp_servers: [fetch]
    max_tool_iterations: 8
    temperature: 0.2
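
The ${VAR} / ${VAR:-default} expansion mentioned above can be pictured with the following sketch. It is not aiproxy's actual implementation: the function name expand_env is hypothetical, and two behaviors are assumptions borrowed from shell semantics, namely that a default applies when the variable is unset or empty, and that a missing variable without a default raises an error.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(text, env=None):
    """Expand ${VAR} and ${VAR:-default} in text using env (default: os.environ)."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if default is None:
            if value is None:
                # Assumption: fail loudly rather than substitute an empty string
                raise KeyError(f"environment variable {name} is not set")
            return value
        # Shell-style ':-' semantics: the default applies when unset or empty
        return value if value else default

    return _VAR.sub(replace, text)


print(expand_env("api_key: ${ANTHROPIC_API_KEY}", {"ANTHROPIC_API_KEY": "sk-demo"}))
print(expand_env("port: ${PORT:-8000}", {}))
```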

Backends

kind	Talks to	Auth header
`openai`	Any OpenAI-compatible `/chat/completions` — OpenAI, Groq, Together, Mistral, vLLM, Ollama (`/v1`), LM Studio, OpenRouter…	`Authorization: Bearer`
`anthropic`	Native Anthropic `/messages`	`x-api-key`

The Anthropic backend translates the canonical chat messages ↔ the Messages API (system prompt, tool_use/tool_result blocks, streaming events, stop-reason mapping), so tool use works first-class with Claude.
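
As a rough illustration of that translation, the sketch below converts OpenAI-style messages into a Messages API payload and maps stop reasons back. It is a simplified stand-in, not aiproxy's adapter: the names to_anthropic and STOP_REASON_TO_FINISH are hypothetical, message content is assumed to be plain strings, and streaming events are left out.

```python
import json

# Common mapping from Anthropic stop_reason to OpenAI finish_reason
STOP_REASON_TO_FINISH = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def to_anthropic(messages: list[dict]) -> dict:
    """Convert OpenAI-style chat messages into Messages API system + messages."""
    system_parts, out = [], []
    for msg in messages:
        role = msg["role"]
        if role == "system":
            # The Messages API takes the system prompt as a top-level field
            system_parts.append(msg["content"])
        elif role == "tool":
            block = {"type": "tool_result",
                     "tool_use_id": msg["tool_call_id"],
                     "content": msg["content"]}
            # Results of parallel tool calls are merged into one user turn
            if out and out[-1]["role"] == "user" and isinstance(out[-1]["content"], list):
                out[-1]["content"].append(block)
            else:
                out.append({"role": "user", "content": [block]})
        elif role == "assistant" and msg.get("tool_calls"):
            blocks = []
            if msg.get("content"):
                blocks.append({"type": "text", "text": msg["content"]})
            for call in msg["tool_calls"]:
                blocks.append({
                    "type": "tool_use",
                    "id": call["id"],
                    "name": call["function"]["name"],
                    # OpenAI carries arguments as a JSON string; Anthropic wants an object
                    "input": json.loads(call["function"]["arguments"] or "{}"),
                })
            out.append({"role": "assistant", "content": blocks})
        else:
            out.append({"role": role, "content": msg["content"]})
    return {"system": "\n\n".join(system_parts), "messages": out}
```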

MCP servers

transport	Fields
`stdio`	`command`, `args`, `env`, `cwd`
`sse`	`url`, `headers`
`http` / `streamable-http`	`url`, `headers`

Tools are exposed to the model as <server>__<tool> and routed back to the right server on call. Sessions are persistent (one subprocess per stdio server, reused across requests) and started lazily on first use. Node (npx) and uvx are baked into the image, so most community MCP servers install on demand.
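
The <server>__<tool> naming can be sketched as a pair of helpers. The names qualify and route are hypothetical, and the sketch assumes server names never contain a double underscore, so the first separator marks the boundary.

```python
SEPARATOR = "__"


def qualify(server: str, tool: str) -> str:
    """Name under which a server's tool is shown to the model."""
    return f"{server}{SEPARATOR}{tool}"


def route(qualified: str, servers: set[str]) -> tuple[str, str]:
    """Split a namespaced tool name back into (server, tool)."""
    # Assumption: server names never contain "__", so the first separator wins
    server, sep, tool = qualified.partition(SEPARATOR)
    if not sep or not tool or server not in servers:
        raise ValueError(f"cannot route tool call {qualified!r}")
    return server, tool


print(route(qualify("fetch", "fetch"), {"fetch", "filesystem"}))
```

Namespacing this way lets two servers advertise a tool with the same name without colliding in the schema the model sees.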

Assistants

An assistant is a virtual model exposed via the OpenAI model field. It binds one backend, a system prompt, and a set of MCP servers, plus a tool-loop budget.

Field	Meaning
`name`	What clients pass as `model`
`backend`	Which configured backend to call
`model`	The upstream model id (e.g. `gpt-4o`, `claude-sonnet-5`)
`system_prompt`	Prepended if the request has no system message
`mcp_servers`	List of MCP servers whose tools are attached
`max_tool_iterations`	Tool-loop budget; the final turn drops tools to force an answer
`temperature`, `top_p`, `max_tokens`	Defaults; client-supplied params override them
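
The precedence rules in the table can be sketched as follows. The helper name build_upstream_request is hypothetical and the dictionaries stand in for whatever internal types aiproxy uses; only the rules themselves (system prompt prepended when the request has none, client parameters overriding assistant defaults) come from the table.

```python
def build_upstream_request(assistant: dict, request: dict) -> dict:
    """Combine an assistant definition with an incoming chat request."""
    messages = list(request["messages"])
    has_system = any(m.get("role") == "system" for m in messages)
    if assistant.get("system_prompt") and not has_system:
        # Prepend the assistant's prompt only when the client sent no system message
        messages.insert(0, {"role": "system", "content": assistant["system_prompt"]})

    # The client names the assistant; upstream receives the real model id
    upstream = {"model": assistant["model"], "messages": messages}
    for key in ("temperature", "top_p", "max_tokens"):
        if request.get(key) is not None:
            upstream[key] = request[key]      # client-supplied value wins
        elif assistant.get(key) is not None:
            upstream[key] = assistant[key]    # otherwise use the assistant default
    return upstream
```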

OpenAI API

Method & path	Purpose
`GET /v1/models`	List configured assistants as OpenAI models
`POST /v1/chat/completions`	Chat completion; runs the MCP tool loop. Supports `stream`

How a request flows

1. The client posts to /v1/chat/completions with model: "<assistant>".
2. The gateway resolves the assistant → backend + MCP servers, and ensures those servers are connected.
3. It builds the OpenAI tool schema and enters the agent loop: call the LLM; if it requests tools, execute them concurrently against the MCP servers and feed the results back; repeat until the model answers or max_tool_iterations is hit.
4. It returns a standard chat.completion (or streams chat.completion.chunks), with the assistant name as model and aggregated token usage.
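
The loop described in these steps can be sketched with a scripted fake LLM and a fake tool, so it runs without any API key. This is an illustration, not aiproxy's code: agent_loop, fake_llm and fake_tool are hypothetical names, and the sketch assumes the budget means up to max_tool_iterations tool rounds followed by one final turn without tools.

```python
import asyncio
import json


async def agent_loop(call_llm, call_tool, messages, tools, max_tool_iterations):
    """Run LLM turns until the model answers or the tool budget is spent."""
    messages = list(messages)
    reply = None
    for iteration in range(max_tool_iterations + 1):
        final_turn = iteration == max_tool_iterations
        # On the last allowed turn, withhold tools so the model has to answer
        reply = await call_llm(messages, None if final_turn else tools)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply
        # Run all requested tools concurrently; gather preserves call order
        results = await asyncio.gather(*(
            call_tool(c["function"]["name"], json.loads(c["function"]["arguments"] or "{}"))
            for c in calls
        ))
        for call, result in zip(calls, results):
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return reply


async def fake_llm(messages, tools):
    # Scripted stand-in: request one tool call, then answer once a result is present
    if tools and not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": None, "tool_calls": [
            {"id": "call_1", "type": "function",
             "function": {"name": "fetch__fetch",
                          "arguments": '{"url": "https://example.com"}'}}]}
    return {"role": "assistant", "content": "Done."}


async def fake_tool(name, arguments):
    return f"{name} fetched {arguments['url']}"


answer = asyncio.run(agent_loop(
    fake_llm, fake_tool,
    [{"role": "user", "content": "Summarize example.com"}],
    tools=[{"type": "function"}], max_tool_iterations=8))
print(answer["content"])
```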

Admin API

Mutate the live registry without restarting (set ADMIN_API_KEY to protect it):

# see what tools a server actually advertises
# (if ADMIN_API_KEY is set, add: -H "Authorization: Bearer $ADMIN_API_KEY")
curl localhost:8000/admin/mcp/fetch/tools

# add / replace an assistant at runtime
curl -X PUT localhost:8000/admin/assistants/coder \
  -H "Content-Type: application/json" \
  -d '{"backend":"openai","model":"gpt-4o","mcp_servers":["filesystem"],
       "system_prompt":"You are a coding agent."}'

Method & path	Purpose
`GET /admin/config`	Dump current registry (secrets redacted)
`GET/PUT/DELETE /admin/assistants[/{name}]`	Manage assistants
`GET/PUT/DELETE /admin/backends[/{name}]`	Manage LLM backends
`GET/PUT/DELETE /admin/mcp[/{name}]`	Manage MCP servers
`GET /admin/mcp/{name}/tools`	Introspect a server's tools

Admin changes are held in memory only and are lost on restart. To keep them, export the current state with GET /admin/config and persist it into config.yaml yourself.
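
A small helper for scripting that export might look like the sketch below, using only the standard library. The function names and the output filename are hypothetical, and it assumes GET /admin/config returns JSON, which this page does not state. The Bearer header follows the ADMIN_API_KEY rule described under Auth.

```python
import json
import urllib.request


def admin_request(base_url, path, admin_key=None, method="GET", body=None):
    """Build (but do not send) a request for an /admin endpoint."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(f"{base_url.rstrip('/')}{path}", data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if admin_key:
        req.add_header("Authorization", f"Bearer {admin_key}")
    return req


def export_config(base_url, admin_key=None, dest="config.export.json"):
    """Fetch the live registry and save it; assumes the endpoint returns JSON."""
    with urllib.request.urlopen(admin_request(base_url, "/admin/config", admin_key)) as resp:
        snapshot = json.load(resp)
    with open(dest, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh, indent=2)
    return snapshot
```

Because the dump redacts secrets, the exported state would still need its ${VAR} references restored before it can replace config.yaml.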

Auth

/v1/* auth is a pluggable chain: a request is authorized if any enabled provider accepts it, so static keys and Apiman can be enabled side by side. The caller's key is read from Authorization: Bearer, the X-API-Key header, or the ?apikey= query param.

Provider	Enable with	Accepts when…
Static keys	non-empty `proxy_api_keys`	the key matches an entry
Apiman `gateway_probe`	`apiman.mode: gateway_probe`	the key validates via a round-trip through the Apiman gateway (2xx), cached
Apiman `trusted_header`	`apiman.mode: trusted_header`	the request carries the shared secret the gateway injects
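
The "any provider accepts" rule and the three key locations can be sketched like this. The function names are hypothetical, the lookup order (Bearer, then X-API-Key, then ?apikey=) is an assumption, and providers are modeled as simple callables on the key; a trusted-header provider would inspect the request headers instead.

```python
from typing import Callable, Iterable, Mapping, Optional
from urllib.parse import parse_qs, urlsplit


def extract_key(headers: Mapping[str, str], url: str) -> Optional[str]:
    """Read the caller's key from Authorization: Bearer, X-API-Key, or ?apikey=."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[len("bearer "):].strip()
    if lowered.get("x-api-key"):
        return lowered["x-api-key"]
    values = parse_qs(urlsplit(url).query).get("apikey")
    return values[0] if values else None


def static_keys(allowed: Iterable[str]) -> Callable[[Optional[str]], bool]:
    """Provider that accepts a key when it matches a configured entry."""
    allowed = set(allowed)
    return lambda key: key is not None and key in allowed


def authorize(key: Optional[str], providers) -> bool:
    """Accept the request if any enabled provider accepts the key."""
    return any(provider(key) for provider in providers)


providers = [static_keys(["local-dev-key"])]
print(authorize(extract_key({"Authorization": "Bearer local-dev-key"}, "/v1/models"), providers))
```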

Apiman — `gateway_probe`

aiproxy stays directly reachable and validates each caller's key against Apiman. Register a small "auth check" API whose backend is aiproxy's /health:

apiman:
  enabled: true
  mode: gateway_probe
  gateway_url: http://apiman-gateway:8080/apiman-gateway
  probe_api: aiproxy/authcheck/1.0    # {org}/{api}/{version}
  probe_path: health                  # backend path that returns 2xx
  cache_ttl: 60
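
To show how these settings could fit together, the sketch below builds the probe URL and caches verdicts per key. It is a guess at the mechanics, not aiproxy's implementation: probe_url and CachedProbe are hypothetical names, cache_ttl is assumed to be in seconds, negative verdicts are assumed to be cached too, and the HTTP round-trip itself is left to an injected callable.

```python
import time


def probe_url(gateway_url: str, probe_api: str, probe_path: str) -> str:
    """Join gateway_url, {org}/{api}/{version} and the backend path."""
    return "/".join(part.strip("/") for part in (gateway_url, probe_api, probe_path))


class CachedProbe:
    """Cache probe verdicts per key for cache_ttl seconds."""

    def __init__(self, probe, cache_ttl: float, clock=time.monotonic):
        self._probe = probe        # callable(key) -> bool, e.g. an HTTP round-trip
        self._ttl = cache_ttl
        self._clock = clock
        self._cache = {}           # key -> (verdict, expiry time)

    def is_valid(self, key: str) -> bool:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]          # still fresh: skip the gateway round-trip
        verdict = self._probe(key)
        self._cache[key] = (verdict, now + self._ttl)
        return verdict


print(probe_url("http://apiman-gateway:8080/apiman-gateway", "aiproxy/authcheck/1.0", "health"))
```

A real probe callable would send the caller's key to that URL and treat a 2xx response as valid.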

Apiman — `trusted_header`

Put the Apiman gateway in front of aiproxy; an "Add Header" policy injects a shared secret aiproxy trusts (it never sees raw keys):

apiman:
  enabled: true
  mode: trusted_header
  header: X-Apiman-Gateway-Token
  secret: ${APIMAN_SHARED_SECRET}
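
The check this mode implies can be sketched in a few lines. The function name gateway_secret_ok is hypothetical, and the constant-time comparison is a design choice of the sketch rather than something this page documents.

```python
import hmac
from typing import Mapping


def gateway_secret_ok(headers: Mapping[str, str], header_name: str, secret: str) -> bool:
    """True when the request carries the shared secret in the configured header."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    presented = lowered.get(header_name.lower())
    if not presented or not secret:
        return False  # never accept when either side is missing or empty
    # compare_digest avoids leaking the mismatch position through timing
    return hmac.compare_digest(presented.encode("utf-8"), secret.encode("utf-8"))


print(gateway_secret_ok({"X-Apiman-Gateway-Token": "s3cret"}, "X-Apiman-Gateway-Token", "s3cret"))
```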

/admin/* is protected separately: if ADMIN_API_KEY is set, admin calls must send Authorization: Bearer <ADMIN_API_KEY>. When it is unset, the admin endpoints are open to anyone who can reach them (handy for local dev; lock it down in production).

Development

The recommended path is Docker, which gives you a clean Python 3.12 environment with Node and uvx available.

docker build -t aiproxy:latest .

# end-to-end check: spawns the demo MCP server and drives the full agent
# loop (streaming + non-streaming) with a scripted fake LLM. No API key needed:
docker run --rm aiproxy:latest python scripts/smoke_test.py

See the README for the full reference and a no-Docker (uv) workflow.

Turn any LLM into a more capable one.