The RAG (Retrieval-Augmented Generation) API lets you build knowledge-grounded AI features. You ingest text or files into a per-user vector store, then search or query that store to get LLM answers backed by your own documents. All endpoints are available on the Pro and Enterprise tiers and are mounted under /api/rag/.
RAG endpoints are available on Pro and Enterprise plans only. The vector store is scoped per user — each user can only search and retrieve their own ingested documents.
POST /api/rag/ingest/text
Ingest raw text content into the vector store. The text is automatically split into overlapping chunks, embedded, and stored. Returns the generated document IDs and the number of chunks created.
Headers: Bearer <access_token>
The raw text to ingest. There is no enforced length limit, but very large documents will produce many chunks.
Arbitrary key-value pairs attached to every chunk from this document. Useful for filtering later (e.g., {"source": "faq", "topic": "billing"}).
Target character length of each chunk. Must be between 100 and 10000. The chunker attempts to break at sentence boundaries near this length.
Number of characters of overlap between adjacent chunks. Must be between 0 and 2000. Overlap improves recall by ensuring context is not lost at chunk boundaries.
IngestResponse:
List of IDs assigned to each stored chunk. Each ID is a 12-character MD5 hash prefix plus the chunk index (e.g., "a1b2c3d4e5f6_0").
Total number of chunks the text was split into and stored.
POST /api/rag/ingest/file
Ingest a file directly into the vector store. The file is parsed to plain text, then processed identically to /api/rag/ingest/text. Supported formats: .txt, .pdf, .docx.
Headers: Bearer <access_token>
The file to upload. Must be a .txt, .pdf, or .docx file.
Target character length of each chunk. Must be between 100 and 10000.
Character overlap between adjacent chunks. Must be between 0 and 2000.
IngestResponse (same shape as /ingest/text):
Metadata for file ingests automatically includes {"source": "<filename>"} in addition to any user-supplied metadata.
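Both ingest endpoints run the same chunking step, so it can help to see roughly what overlapping, sentence-aware chunking looks like. The Python sketch below is an illustration only, not the service's actual chunker: the function names, the default sizes, the use of ". " as the sentence boundary, and the choice to hash the full document text for the ID prefix are all assumptions made for the example.

```python
import hashlib


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks, preferring to cut at sentence ends."""
    # Bounds taken from the parameter descriptions above.
    if not 100 <= chunk_size <= 10000:
        raise ValueError("chunk_size must be between 100 and 10000")
    if not 0 <= chunk_overlap <= 2000:
        raise ValueError("chunk_overlap must be between 0 and 2000")

    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Look for the last ". " in the second half of the window and cut there.
            cut = text.rfind(". ", start + chunk_size // 2, end)
            if cut != -1:
                end = cut + 1
        chunks.append(text[start:end].strip())
        if end == len(text):
            break
        # Step back by the overlap so adjacent chunks share context,
        # but always move forward by at least one character.
        start = max(end - chunk_overlap, start + 1)
    return chunks


def chunk_ids(text: str, count: int) -> list[str]:
    """Build IDs shaped like "a1b2c3d4e5f6_0": 12 hex characters plus the chunk index."""
    # Assumption: the prefix comes from an MD5 of the document text.
    prefix = hashlib.md5(text.encode("utf-8")).hexdigest()[:12]
    return [f"{prefix}_{index}" for index in range(count)]
```

Calling `chunk_text(document, chunk_size=500, chunk_overlap=50)` and then `chunk_ids(document, len(chunks))` mirrors the two values the IngestResponse reports: the list of chunk IDs and the chunk count.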
POST /api/rag/search
Perform a pure semantic search over the vector store without involving an LLM. Returns the most relevant chunks ranked by similarity score. Useful for building your own retrieval logic or debugging what is in the store.
Headers: Bearer <access_token>
The search query. The query is embedded and compared against stored chunk embeddings.
Maximum number of results to return. Must be between 1 and 50.
Optional metadata filter to narrow results. Key-value pairs are matched against chunk metadata (e.g., {"source": "product-docs"}). The user_id filter is applied automatically; you do not need to include it.
SearchResponse:
Ordered list of matching chunks, most similar first.
POST /api/rag/query
Ask a natural language question. The API retrieves the most relevant chunks from the vector store and passes them to the LLM as context, returning a grounded answer along with the source chunks used. Supports streaming and optional conversation history for multi-turn sessions.
Headers: Bearer <access_token>
The natural language question to answer using the ingested documents.
Number of document chunks to retrieve as context. Must be between 1 and 50.
Minimum similarity score threshold between 0.0 and 1.0. Chunks scoring below this value are excluded from the context passed to the LLM.
When true, the answer is streamed as server-sent events (SSE), using the same format as the AI Chat streaming endpoint.
Optional conversation history for multi-turn queries. Each item must have a role ("user" or "assistant") and content. Providing history allows the model to resolve follow-up questions against prior context.
Optional metadata filter applied during retrieval (e.g., {"source": "faq"}). The user_id filter is applied automatically.
RAGQueryResponse:
The LLM-generated answer, grounded in the retrieved document chunks.
The document chunks that were retrieved and used as context. Same shape as SearchResponse.results.
Token usage for the LLM call. May be null if the provider does not return usage data.
Multi-turn query example
Use chat_history to maintain context across follow-up questions.
DELETE /api/rag/documents/
Delete a specific document chunk from the vector store by its ID.
The document chunk ID to delete, as returned by the ingest endpoints.
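A deletion request could be built along the lines of the Python sketch below. It assumes the chunk ID is appended as the final path segment after /api/rag/documents/ and that the token travels in an `Authorization` header; neither detail is confirmed on this page, and the base URL is a placeholder.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; replace with your deployment


def build_delete_request(token: str, chunk_id: str) -> urllib.request.Request:
    """Build a DELETE request for one chunk ID without sending it."""
    if not chunk_id:
        raise ValueError("chunk_id must not be empty")

    # Assumption: the ID is the last path segment. Percent-encode it defensively.
    path = "/api/rag/documents/" + urllib.parse.quote(chunk_id, safe="")
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )
```

Because each ingest call returns one ID per chunk, removing a whole document means issuing one such request for every ID in that call's response.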