LLM chat and AI completions with multiple providers

The Pro and Enterprise tiers ship with a unified LLM layer that lets you talk to multiple AI providers through a single set of endpoints. You can send multi-turn chat messages, stream token-by-token responses via Server-Sent Events, or generate one-shot text completions, all with the same provider, model, and generation parameters. Switching providers means changing the provider field in your request body, along with a model name that is valid for the new provider. All AI endpoints live under /api/ai and are protected by authentication and rate limiting.

The AI and LLM features require the Pro or Enterprise tier. Requests from Basic tier accounts will be rejected with 403 Forbidden.
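A client can branch on the HTTP status to tell a tier restriction apart from other failures. The sketch below is illustrative only: describeAiError is a hypothetical helper, and the 401 and 429 cases are assumptions based on common HTTP conventions for authentication and rate limiting. Only the 403 case is stated in this guide.

```javascript
// Hypothetical helper: map an HTTP status from /api/ai/* to a readable message.
// Only 403 (Basic tier) is documented here; 401 and 429 are assumed conventions.
function describeAiError(status) {
  switch (status) {
    case 403:
      return "AI features require the Pro or Enterprise tier.";
    case 401: // assumption: missing or invalid access token
      return "Authentication failed; check the access token.";
    case 429: // assumption: rate limit exceeded
      return "Rate limit reached; retry later.";
    default:
      return `Unexpected response status ${status}.`;
  }
}
```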

Sending a chat message

Send a POST request to /api/ai/chat with a messages array following the OpenAI-style role format. Choose your provider, model, and generation parameters. Set stream: false (the default) to receive the full response at once.

Request

POST /api/ai/chat
Authorization: Bearer <access_token>
Content-Type: application/json

{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ],
  "provider": "openai",
  "model": "gpt-4o",
  "temperature": 0.7,
  "max_tokens": 500,
  "stream": false
}

Response — 200 OK

{
  "content": "The capital of France is Paris.",
  "model": "gpt-4o",
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 9,
    "total_tokens": 37
  }
}
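The request and response above could be wrapped in a small client function. This is a minimal sketch, not an official SDK: buildChatRequest and sendChat are hypothetical names, and baseUrl and accessToken are placeholders the caller must supply.

```javascript
// Build the fetch arguments for a non-streaming chat request.
// `baseUrl` and `accessToken` are placeholders supplied by the caller.
function buildChatRequest(baseUrl, accessToken, messages, options = {}) {
  return {
    url: `${baseUrl}/api/ai/chat`,
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${accessToken}`,
        "Content-Type": "application/json",
      },
      // stream is forced to false so the full response arrives at once
      body: JSON.stringify({ messages, ...options, stream: false }),
    },
  };
}

// Send the request and return the parsed body ({ content, model, usage }).
async function sendChat(baseUrl, accessToken, messages, options = {}) {
  const { url, init } = buildChatRequest(baseUrl, accessToken, messages, options);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Chat request failed with status ${response.status}`);
  }
  return response.json();
}
```

Inside an async function, a call such as sendChat(baseUrl, token, messages, { provider: "openai", model: "gpt-4o" }) would resolve to an object with the content, model, and usage fields shown in the response above.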

The full ChatRequest schema:

Field	Type	Default	Description
`messages`	`array`	required	List of `{ role, content }` message objects.
`provider`	`string`	`"openai"`	AI provider: `openai`, `anthropic`, or `gemini`.
`model`	`string`	provider default	Model name (e.g. `gpt-4o`, `claude-3-5-sonnet-20241022`).
`temperature`	`float`	`0.7`	Sampling temperature between `0.0` and `2.0`.
`max_tokens`	`int`	`1000`	Maximum tokens to generate (1–16384).
`stream`	`bool`	`false`	Set to `true` to receive a streaming response.
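The defaults and ranges in the table can be checked on the client before a request is sent. The sketch below assumes a hypothetical helper named normalizeChatRequest; the server remains the authority on validation, and this function only mirrors the values listed above.

```javascript
const PROVIDERS = ["openai", "anthropic", "gemini"];

// Fill in the documented defaults and check the documented ranges.
// Throws on invalid input; returns a new object otherwise.
// `model` is left unset when omitted so the provider default applies.
function normalizeChatRequest(request) {
  if (!Array.isArray(request.messages)) {
    throw new Error("messages is required and must be an array");
  }
  const normalized = {
    provider: "openai",
    temperature: 0.7,
    max_tokens: 1000,
    stream: false,
    ...request, // caller-supplied fields override the defaults
  };
  if (!PROVIDERS.includes(normalized.provider)) {
    throw new Error(`provider must be one of: ${PROVIDERS.join(", ")}`);
  }
  const t = normalized.temperature;
  if (typeof t !== "number" || t < 0 || t > 2) {
    throw new Error("temperature must be between 0.0 and 2.0");
  }
  const m = normalized.max_tokens;
  if (!Number.isInteger(m) || m < 1 || m > 16384) {
    throw new Error("max_tokens must be an integer from 1 to 16384");
  }
  return normalized;
}
```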

Streaming responses

Set stream: true in your request to receive a text/event-stream response. Each event carries a single token. The stream ends with a [DONE] sentinel.

Streaming request

{
  "messages": [{ "role": "user", "content": "Tell me a joke." }],
  "provider": "openai",
  "model": "gpt-4o",
  "stream": true
}

Each chunk arrives as a Server-Sent Event:

data: {"token": "Why"}

data: {"token": " don"}

data: {"token": "'t"}

data: {"token": " scientists"}

data: [DONE]

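Consume the stream in JavaScript using the fetch API with a ReadableStream. The browser's EventSource interface only issues GET requests and cannot set an Authorization header, so it is not suitable for this POST endpoint.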

Consuming SSE in JavaScript

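const response = await fetch("/api/ai/chat", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${accessToken}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ messages, provider: "openai", stream: true }),
});
if (!response.ok) throw new Error(`Request failed: ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";      // holds a partial line until the rest of it arrives
let output = "";      // accumulated response text
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep the incomplete trailing line for the next read
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice(6);
    if (payload === "[DONE]") {
      finished = true; // leave the outer read loop as well
      break;
    }
    const { token } = JSON.parse(payload);
    output += token; // append to the page or a log as needed
  }
}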
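Chunks returned by reader.read() do not have to align with event boundaries, so a single data: line may arrive in two pieces. One way to handle this is to move the line parsing into a small stateful parser that can be tested without a network connection. The sketch below assumes the event format shown in this section; createSseTokenParser is a hypothetical name.

```javascript
// Create a parser that accepts decoded text chunks and returns the tokens
// completed by each chunk. Partial lines wait in a buffer for their newline.
function createSseTokenParser() {
  let buffer = "";
  let done = false;
  return {
    push(chunk) {
      const tokens = [];
      if (done) return tokens; // ignore anything after [DONE]
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // last element is an incomplete line (or "")
      for (const rawLine of lines) {
        const line = rawLine.replace(/\r$/, ""); // tolerate CRLF line endings
        if (!line.startsWith("data: ")) continue;
        const payload = line.slice(6);
        if (payload === "[DONE]") {
          done = true;
          break;
        }
        tokens.push(JSON.parse(payload).token);
      }
      return tokens;
    },
    get done() {
      return done;
    },
  };
}
```

In a read loop, each decoded chunk would be passed to push, and the loop would stop once the parser's done property becomes true.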

Text completions

For single-turn generation from a plain-text prompt, use POST /api/ai/completions. You can optionally provide a system_prompt to set context.

Request

POST /api/ai/completions
Authorization: Bearer <access_token>
Content-Type: application/json

{
  "prompt": "Write a one-sentence summary of the Python programming language.",
  "system_prompt": "You write concise technical summaries.",
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022",
  "temperature": 0.3,
  "max_tokens": 100
}

Response — 200 OK

{
  "completion": "Python is a high-level, dynamically typed programming language known for its readable syntax and broad ecosystem of libraries.",
  "model": "claude-3-5-sonnet-20241022",
  "usage": {
    "input_tokens": 25,
    "output_tokens": 24
  }
}
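The usage object in this example reports input_tokens and output_tokens, while the chat example earlier reports prompt_tokens, completion_tokens, and total_tokens. A client that tracks token counts across both endpoints may want to handle either shape. The sketch below uses two hypothetical helpers, buildCompletionBody and totalTokens, and assumes only the field names that appear in the examples in this guide.

```javascript
// Build the JSON body for POST /api/ai/completions.
// system_prompt is left out entirely when the caller does not supply one.
function buildCompletionBody(prompt, { systemPrompt, ...options } = {}) {
  const body = { prompt, ...options };
  if (systemPrompt !== undefined) body.system_prompt = systemPrompt;
  return body;
}

// Sum token usage from either response shape shown in this guide.
function totalTokens(usage) {
  if (typeof usage.total_tokens === "number") return usage.total_tokens;
  const input = usage.input_tokens ?? usage.prompt_tokens ?? 0;
  const output = usage.output_tokens ?? usage.completion_tokens ?? 0;
  return input + output;
}
```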

Supported providers

Set the provider field in any request to switch between backends. The model name must be valid for the chosen provider.

OpenAI

{
  "provider": "openai",
  "model": "gpt-4o",
  "messages": [{ "role": "user", "content": "Hello!" }]
}

Requires OPENAI_API_KEY in your environment. Supported models include gpt-4o, gpt-4o-mini, gpt-4-turbo, and gpt-3.5-turbo.

Anthropic

{
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022",
  "messages": [{ "role": "user", "content": "Hello!" }]
}

Requires ANTHROPIC_API_KEY in your environment. Supported models include claude-3-5-sonnet-20241022, claude-3-opus-20240229, and claude-3-haiku-20240307.

Gemini

{
  "provider": "gemini",
  "model": "gemini-1.5-pro",
  "messages": [{ "role": "user", "content": "Hello!" }]
}

Requires GOOGLE_API_KEY in your environment. Supported models include gemini-1.5-pro and gemini-1.5-flash.
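Because the model name must match the provider, switching backends in client code usually means changing both fields together. The sketch below collects the environment variables and example models listed in this section into a lookup table. PROVIDER_INFO and switchProvider are hypothetical names, and the example model for each provider is simply the one used in the requests above, not a documented default.

```javascript
// Provider details as listed in this guide: the required environment
// variable and the model used in each example request.
const PROVIDER_INFO = {
  openai: { envVar: "OPENAI_API_KEY", exampleModel: "gpt-4o" },
  anthropic: { envVar: "ANTHROPIC_API_KEY", exampleModel: "claude-3-5-sonnet-20241022" },
  gemini: { envVar: "GOOGLE_API_KEY", exampleModel: "gemini-1.5-pro" },
};

// Return a copy of `request` that targets `provider`. The model is replaced
// as well, because a model name is only valid for its own provider.
function switchProvider(request, provider, model) {
  if (!Object.prototype.hasOwnProperty.call(PROVIDER_INFO, provider)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  const info = PROVIDER_INFO[provider];
  return { ...request, provider, model: model ?? info.exampleModel };
}
```

The original request object is left unchanged, so the same messages array can be sent to several providers for comparison.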
