The AI Chat API provides two endpoints for interacting with large language models: a multi-turn chat endpoint that supports streaming, and a single-prompt completions endpoint. Both are mounted under /api/ai/ and are subject to rate limiting.
These endpoints are available on Pro and Enterprise plans only. Requests from free-tier users are rejected with a 403 response.

POST /api/ai/chat

Send a conversation to the configured LLM provider and receive a response. You can choose the provider (openai, anthropic, or gemini) and optionally stream the response as server-sent events.
Headers:
Authorization
string
required
Bearer <access_token>
Request body:
messages
object[]
required
An ordered list of messages representing the conversation history. Each message must have a role (such as "system" or "user", as in the example below) and content.
provider
string
default:"openai"
The LLM provider to use. One of "openai", "anthropic", or "gemini". The provider must be configured with a valid API key in your backend environment.
model
string
The specific model to use (e.g., "gpt-4o", "claude-3-5-sonnet-20241022", "gemini-1.5-pro"). If omitted, the provider’s default model is used.
temperature
number
default:"0.7"
Sampling temperature between 0.0 and 2.0. Lower values produce more deterministic output; higher values produce more varied output.
max_tokens
number
default:"1000"
Maximum number of tokens to generate. Must be between 1 and 16384.
stream
boolean
default:"false"
When true, the response is streamed as server-sent events (SSE). Each event contains a token field with the next piece of text. The stream ends with data: [DONE].
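Because out-of-range values are only caught once the request reaches the server, it can help to check them on the client first. The sketch below builds a request body using the defaults and limits listed above. `build_chat_payload` and `PROVIDERS` are hypothetical names, not part of the API, and the sketch assumes the documented ranges are inclusive.

```python
from typing import Any, Optional

# Provider names accepted by POST /api/ai/chat, per the parameter list above.
PROVIDERS = {"openai", "anthropic", "gemini"}


def build_chat_payload(
    messages: list[dict[str, str]],
    provider: str = "openai",
    model: Optional[str] = None,
    temperature: float = 0.7,
    max_tokens: int = 1000,
    stream: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: validate against the documented limits, return a JSON-ready body."""
    for i, msg in enumerate(messages):
        # Every message needs a role and content.
        if "role" not in msg or "content" not in msg:
            raise ValueError(f"messages[{i}] needs both 'role' and 'content'")
    if provider not in PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(PROVIDERS)}")
    # Assumption: both bounds are inclusive.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not 1 <= max_tokens <= 16384:
        raise ValueError("max_tokens must be between 1 and 16384")
    payload: dict[str, Any] = {
        "messages": messages,
        "provider": provider,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    # Leave "model" out entirely so the provider's default model is used.
    if model is not None:
        payload["model"] = model
    return payload
```

Sending the defaults explicitly is harmless here, since they match the documented server-side defaults. Omitting `model` is different: leaving the key out is what tells the backend to fall back to the provider's default.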

Non-streaming example

curl --request POST \
  --url http://localhost:8000/api/ai/chat \
  --header "Authorization: Bearer <access_token>" \
  --header "Content-Type: application/json" \
  --data '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain the difference between async and sync Python in one sentence."}
    ],
    "provider": "openai",
    "model": "gpt-4o",
    "temperature": 0.5,
    "max_tokens": 200
  }'
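The same request can be sent from Python using only the standard library. This is a minimal sketch. `make_request` and `post_json` are hypothetical helper names, `BASE_URL` assumes the local development server used in the curl examples, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request
from typing import Any

# Assumption: same local server as the curl examples.
BASE_URL = "http://localhost:8000"


def make_request(path: str, access_token: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token and a JSON body."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def post_json(path: str, access_token: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Send a non-streaming request and decode the JSON response body."""
    req = make_request(path, access_token, payload)
    # urlopen raises urllib.error.HTTPError for non-2xx statuses, such as the
    # 403 returned to free-tier users.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

For example, `post_json("/api/ai/chat", "<access_token>", {"messages": [{"role": "user", "content": "Hello"}]})` returns the decoded response object described below.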
Response (ChatResponse):
content
string
required
The full generated text response from the model.
model
string
required
The model identifier that was used to generate the response.
usage
object
required
Token consumption breakdown for the request.
{
  "content": "Synchronous Python executes code line by line and blocks until each operation completes, while asynchronous Python uses `async`/`await` to pause and resume coroutines, allowing other tasks to run during waiting periods.",
  "model": "gpt-4o-2024-08-06",
  "usage": {
    "prompt_tokens": 38,
    "completion_tokens": 42,
    "total_tokens": 80
  }
}
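For typed access to the response, a small parser is enough. This sketch takes the `usage` field names from the example above rather than from a formal schema. Note also that `model` can differ from the alias you requested: the example asked for `gpt-4o` and received `gpt-4o-2024-08-06`.

```python
import json
from dataclasses import dataclass


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@dataclass
class ChatResponse:
    content: str
    model: str  # resolved model identifier, which may differ from the requested alias
    usage: Usage


def parse_chat_response(body: str) -> ChatResponse:
    """Decode a non-streaming /api/ai/chat response body into dataclasses."""
    data = json.loads(body)
    u = data["usage"]
    # Field names taken from the example response.
    usage = Usage(
        prompt_tokens=u["prompt_tokens"],
        completion_tokens=u["completion_tokens"],
        total_tokens=u["total_tokens"],
    )
    return ChatResponse(content=data["content"], model=data["model"], usage=usage)
```

In the example, `total_tokens` is the sum of the other two fields (38 + 42 = 80).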

Streaming example

Set "stream": true to receive the response token by token as server-sent events. The data of each event is a JSON object with a token field, except for the final event, whose data is the literal string [DONE] rather than JSON.
curl --request POST \
  --url http://localhost:8000/api/ai/chat \
  --header "Authorization: Bearer <access_token>" \
  --header "Content-Type: application/json" \
  --no-buffer \
  --data '{
    "messages": [{"role": "user", "content": "Count to five."}],
    "stream": true
  }'
SSE stream format:
data: {"token": "One"}

data: {"token": ","}

data: {"token": " two"}

data: {"token": ", three, four, five."}

data: [DONE]
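A client reassembles the reply by concatenating the token values in order and stopping at [DONE]. The sketch below does this for any iterable of text lines. `iter_sse_tokens` is a hypothetical name, and the sketch handles only the `data:` lines shown above.

```python
import json
from typing import Iterable, Iterator


def iter_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield the token from each `data:` event, stopping at `data: [DONE]`."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            # Blank separator lines (and any other SSE fields) carry no token.
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream; ignore anything after it
        yield json.loads(data)["token"]
```

Joining the tokens from the stream above yields `One, two, three, four, five.` Responses from `urllib.request.urlopen` iterate as byte lines, so decode each line first, for example with `(b.decode("utf-8") for b in resp)`.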

POST /api/ai/completions

Generate a single completion from a plain-text prompt, without a conversation history. Useful for summarization, classification, code generation, and other single-turn tasks.
Headers:
Authorization
string
required
Bearer <access_token>
Request body:
prompt
string
required
The user’s input prompt.
system_prompt
string
An optional system message that sets the model’s behavior for this request (e.g., "You are a JSON formatter.").
provider
string
default:"openai"
The LLM provider to use. One of "openai", "anthropic", or "gemini".
model
string
The specific model to use. If omitted, the provider’s default model is used.
temperature
number
default:"0.7"
Sampling temperature between 0.0 and 2.0.
max_tokens
number
default:"1000"
Maximum number of tokens to generate. Must be between 1 and 16384.
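If you already have client code for /api/ai/chat, a completions request can be thought of as a one- or two-message conversation. The sketch below expresses that equivalence on the client side. `completion_as_chat_messages` is a hypothetical helper, and mapping `system_prompt` to a leading "system" message is an assumption modeled on the chat example; how the backend actually handles `system_prompt` is not documented here.

```python
from typing import Optional


def completion_as_chat_messages(prompt: str, system_prompt: Optional[str] = None) -> list[dict[str, str]]:
    """Hypothetical helper: express a completions request as a chat `messages` list."""
    messages: list[dict[str, str]] = []
    # Assumption: system_prompt behaves like a leading "system" message.
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return messages
```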
Example

curl --request POST \
  --url http://localhost:8000/api/ai/completions \
  --header "Authorization: Bearer <access_token>" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "Summarize the following in one sentence: FastAPI is a modern, fast web framework for building APIs with Python 3.7+ based on standard Python type hints.",
    "system_prompt": "You are a concise technical writer.",
    "provider": "openai",
    "max_tokens": 100
  }'
Response:
completion
string
required
The generated text response.
model
string
required
The model identifier that produced the response.
usage
object
required
Token consumption breakdown (same shape as ChatResponse.usage).
{
  "completion": "FastAPI is a high-performance Python web framework for building APIs using standard type hints.",
  "model": "gpt-4o-2024-08-06",
  "usage": {
    "prompt_tokens": 52,
    "completion_tokens": 17,
    "total_tokens": 69
  }
}
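The two endpoints return the generated text under different keys (`content` for chat, `completion` for completions) but share the same `usage` shape. Client code that calls both can normalize them with a small accessor. `response_text` and `total_tokens` are hypothetical helper names.

```python
from typing import Any


def response_text(body: dict[str, Any]) -> str:
    """Return the generated text from either endpoint's decoded JSON response."""
    if "content" in body:  # POST /api/ai/chat
        return body["content"]
    if "completion" in body:  # POST /api/ai/completions
        return body["completion"]
    raise KeyError("response has neither 'content' nor 'completion'")


def total_tokens(body: dict[str, Any]) -> int:
    """Both endpoints report token usage in the same shape."""
    return body["usage"]["total_tokens"]
```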