The AI Chat API provides two endpoints for interacting with large language models: a multi-turn chat endpoint that supports streaming, and a single-prompt completions endpoint. Both are available on the Pro and Enterprise tiers and are subject to rate limiting. All endpoints are mounted under /api/ai/.
These endpoints are available on Pro and Enterprise plans only. Requests from free-tier users will be rejected with a 403 response.
POST /api/ai/chat
Send a conversation to the configured LLM provider and receive a response. You can choose the provider (openai, anthropic, or gemini) and optionally stream the response as server-sent events.
Headers:
Authorization: Bearer <access_token>
Request body:
An ordered list of messages representing the conversation history. Each message must have a role and content.
The LLM provider to use. One of "openai", "anthropic", or "gemini". The provider must be configured with a valid API key in your backend environment.
The specific model to use (e.g., "gpt-4o", "claude-3-5-sonnet-20241022", "gemini-1.5-pro"). If omitted, the provider’s default model is used.
Sampling temperature between 0.0 and 2.0. Lower values produce more deterministic output; higher values increase creativity.
Maximum number of tokens to generate. Must be between 1 and 16384.
Streaming flag ("stream"). When true, the response is streamed as server-sent events (SSE). Each event contains a token field with the next piece of text. The stream ends with data: [DONE].
Non-streaming example
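A minimal sketch of a non-streaming request in Python, using only the standard library. The JSON field names (`messages`, `provider`, `model`, `temperature`, `max_tokens`), the helper name `build_chat_request`, and the base URL are assumptions for illustration, since this page names only `stream` explicitly. The helper checks the documented ranges on the client side and builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://example.com"  # hypothetical host; replace with your deployment

PROVIDERS = ("openai", "anthropic", "gemini")


def build_chat_request(access_token, messages, provider, model=None,
                       temperature=None, max_tokens=None, stream=False):
    """Build (but do not send) a POST /api/ai/chat request."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs a role and content")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if max_tokens is not None and not 1 <= max_tokens <= 16384:
        raise ValueError("max_tokens must be between 1 and 16384")

    # Field names are assumed for this sketch, not confirmed by the reference.
    payload = {"messages": messages, "provider": provider, "stream": stream}
    if model is not None:
        payload["model"] = model
    if temperature is not None:
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens

    return urllib.request.Request(
        BASE_URL + "/api/ai/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; with that function, a 403 for a free-tier user surfaces as `urllib.error.HTTPError`.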
ChatResponse:
The full generated text response from the model.
The model identifier that was used to generate the response.
Token consumption breakdown for the request.
Streaming example
Set "stream": true to receive the response token by token as server-sent events. Each event is a JSON object with a token field. The final event is the literal string [DONE].
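One way to consume the stream on the client is sketched below in Python. It assumes each event arrives as a standard SSE `data:` line whose payload is either a JSON object with a `token` field or the literal `[DONE]`. The function name `iter_tokens` is hypothetical, and the input can be any iterable of decoded text lines, such as lines read from an HTTP response body.

```python
import json


def iter_tokens(lines):
    """Yield the text of each token event until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, comments, and non-data SSE fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)["token"]


sample = [
    'data: {"token": "Hel"}',
    "",
    'data: {"token": "lo"}',
    "",
    "data: [DONE]",
]
print("".join(iter_tokens(sample)))  # prints "Hello"
```

Writing the parser as a generator lets the caller display each token as it arrives instead of waiting for the full response.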
POST /api/ai/completions
Generate a single completion from a plain text prompt, without a conversation history. Useful for summarization, classification, code generation, and other single-turn tasks.
Headers:
Authorization: Bearer <access_token>
Request body:
The user’s input prompt.
An optional system message that sets the model’s behavior for this request (e.g., "You are a JSON formatter.").
The LLM provider to use. One of "openai", "anthropic", or "gemini".
The specific model to use. If omitted, the provider’s default model is used.
Sampling temperature between 0.0 and 2.0.
Maximum number of tokens to generate. Must be between 1 and 16384.
Response:
The generated text response.
The model identifier that produced the response.
Token consumption breakdown (same shape as ChatResponse.usage).
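A single-turn request can be sketched the same way in Python. The field names `prompt` and `system`, the helper name `build_completion_request`, and the base URL are assumptions for illustration and are not confirmed by this reference; any further options (provider, model, temperature, token limit) are passed through under whatever names your deployment expects.

```python
import json
import urllib.request

BASE_URL = "https://example.com"  # hypothetical host; replace with your deployment


def build_completion_request(access_token, prompt, system=None, **options):
    """Build (but do not send) a POST /api/ai/completions request."""
    # "prompt" and "system" are assumed field names for this sketch.
    payload = {"prompt": prompt, **options}
    if system is not None:
        payload["system"] = system
    return urllib.request.Request(
        BASE_URL + "/api/ai/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_completion_request(
    "my-token",
    "Summarize: the quick brown fox jumps over the lazy dog.",
    system="You are a JSON formatter.",
)
```

Optional fields that are left out are simply omitted from the JSON body, so the provider’s default model applies as described above.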