Retrieval-Augmented Generation (RAG) is a technique that grounds LLM responses in your own documents. Instead of relying solely on what the model learned during training, it first searches your document collection for relevant passages, then passes those passages to the LLM as context. The result is more accurate, citation-backed answers that stay within your data’s scope. Shipfastai’s RAG pipeline handles document ingestion, chunking, embedding, semantic search, and augmented generation, all behind a simple REST API. RAG endpoints live under /api/rag.
The RAG pipeline requires the Pro or Enterprise tier. Basic tier accounts cannot access these endpoints.
Ingesting documents
Before you can query your documents, you need to ingest them into the vector store. Shipfastai supports two ingestion endpoints.
Ingest plain text
Send raw text content to POST /api/rag/ingest/text. The pipeline splits the text into overlapping chunks, embeds each chunk using OpenAI embeddings, and stores the result. Every chunk is tagged with your user_id for automatic isolation.
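To make the overlapping-chunk step concrete, here is a minimal sketch of a character-based sliding window. It is an illustration only: the function name chunk_text is hypothetical, and the real pipeline may split on sentence or token boundaries instead of raw character offsets.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows that overlap by chunk_overlap.

    Hypothetical helper: the actual splitter used by the pipeline is not
    documented here, so treat this as an approximation of the idea.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.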
Request
Response — 200 OK
Upload a .txt, .pdf, or .docx file using a multipart POST /api/rag/ingest/file request. The pipeline extracts text from the file and then follows the same chunking and embedding process.
cURL example
| Field | Type | Default | Description |
|---|---|---|---|
| content | string | required | Raw text to ingest (text endpoint only). |
| metadata | object | {} | Arbitrary key-value pairs attached to every chunk. |
| chunk_size | int | 1000 | Maximum characters per chunk (100–10000). |
| chunk_overlap | int | 200 | Characters of overlap between adjacent chunks (0–2000). |
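The fields above can be assembled into a text-ingestion request as sketched below, using only the Python standard library. Several details are assumptions rather than documented behavior: BASE_URL is a placeholder, the Bearer-token Authorization header is a guess at the auth scheme, and build_ingest_text_request is a hypothetical helper name. The range checks mirror the limits in the table.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: replace with your deployment's URL


def build_ingest_text_request(content, token, metadata=None,
                              chunk_size=1000, chunk_overlap=200):
    """Build (but do not send) a POST /api/rag/ingest/text request."""
    if not content:
        raise ValueError("content is required")
    if not 100 <= chunk_size <= 10000:
        raise ValueError("chunk_size must be between 100 and 10000")
    if not 0 <= chunk_overlap <= 2000:
        raise ValueError("chunk_overlap must be between 0 and 2000")
    body = {
        "content": content,
        "metadata": metadata or {},
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
    }
    return urllib.request.Request(
        BASE_URL + "/api/rag/ingest/text",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API accepts a Bearer token; check your auth setup.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_ingest_text_request("Refunds are issued within 14 days.",
                                token="YOUR_TOKEN",
                                metadata={"source": "faq"})
# To send it: urllib.request.urlopen(req)
```

Validating the ranges client-side is optional, but it surfaces mistakes before a network round trip.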
Semantic search
Use POST /api/rag/search to find document chunks that are semantically similar to a query string, without involving the LLM. This is useful for debugging your knowledge base or building custom retrieval logic.
Request
Response — 200 OK
The filter field supports any metadata key-value pair you attached during ingestion. Results are automatically filtered to only include chunks belonging to your account.
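As a rough mental model of semantic search with a metadata filter, the sketch below ranks stored chunks by cosine similarity after discarding those whose metadata does not match. It is not the service's implementation: the toy two-dimensional vectors stand in for real embeddings, the exact-match filter semantics are an assumption, and the names search and limit are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, chunks, metadata_filter=None, limit=5):
    """Return (score, text) pairs for matching chunks, best first."""
    metadata_filter = metadata_filter or {}
    hits = []
    for chunk in chunks:
        meta = chunk["metadata"]
        # Assumption: a chunk matches only if every filter key equals its value.
        if all(meta.get(key) == value for key, value in metadata_filter.items()):
            hits.append((cosine(query_vec, chunk["vector"]), chunk["text"]))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]


chunks = [
    {"text": "Refund policy", "vector": [1.0, 0.0], "metadata": {"source": "faq"}},
    {"text": "Release notes", "vector": [0.0, 1.0], "metadata": {"source": "blog"}},
    {"text": "Billing FAQ", "vector": [1.0, 1.0], "metadata": {"source": "faq"}},
]
results = search([1.0, 0.0], chunks, metadata_filter={"source": "faq"})
```

Here the blog chunk is excluded by the filter, and the two FAQ chunks come back ordered by similarity to the query vector.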
RAG queries
Send a natural-language question to POST /api/rag/query. The pipeline embeds your question, retrieves the most relevant chunks, passes them to the LLM as context, and returns both the synthesized answer and the source documents used.
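The retrieve-then-augment step can be pictured with the sketch below: keep chunks above a score threshold, take the best few, and fold them into a prompt alongside the question. The prompt wording, the function name assemble_context, and the shape of the chunk dictionaries are all illustrative assumptions; the service's actual prompt template is not documented here.

```python
def assemble_context(question, scored_chunks, top_k=5, min_score=0.5):
    """Pick the best chunks and build an augmented prompt from them.

    Returns the prompt plus the chunks that were used, so the caller can
    report them as sources next to the generated answer.
    """
    kept = sorted(
        (c for c in scored_chunks if c["score"] >= min_score),
        key=lambda c: c["score"],
        reverse=True,
    )[:top_k]
    context = "\n\n".join(f"[{i}] {c['text']}" for i, c in enumerate(kept, start=1))
    prompt = (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return prompt, kept


prompt, sources = assemble_context(
    "How long do refunds take?",
    [
        {"text": "alpha: refunds take 14 days", "score": 0.9},
        {"text": "beta: unrelated changelog", "score": 0.4},
        {"text": "gamma: refunds go to the original card", "score": 0.7},
    ],
)
```

The low-scoring chunk never reaches the prompt, which is what keeps answers within the scope of relevant passages.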
Non-streaming query
Request
Response — 200 OK
Set stream: true to receive the answer as a Server-Sent Event stream, identical in format to the chat streaming endpoint. Each event contains a { "token": "..." } payload, and the stream ends with data: [DONE].
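A client can consume that stream by reading it line by line, as in the minimal sketch below. It assumes each event arrives as a single data: line carrying the JSON payload described above; the function name iter_tokens and the sample lines are illustrative, not captured from the API.

```python
import json


def iter_tokens(lines):
    """Yield token strings from Server-Sent Event lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(payload)["token"]


sample = [
    'data: {"token": "Hel"}',
    "",
    'data: {"token": "lo"}',
    "",
    "data: [DONE]",
]
print("".join(iter_tokens(sample)))  # Hello
```

In practice, lines would come from the HTTP response body as it is read incrementally, decoded to text before being passed in.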
The full RAGQueryRequest schema:
| Field | Type | Default | Description |
|---|---|---|---|
| question | string | required | The natural-language question to answer. |
| top_k | int | 5 | Number of document chunks to retrieve (1–50). |
| min_score | float | 0.5 | Minimum similarity score to include a chunk (0.0–1.0). |
| stream | bool | false | Stream the answer token by token. |
| chat_history | array | null | Prior conversation turns to provide context. |
| filter | object | null | Metadata filter applied during retrieval. |
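For client code, the schema above can be mirrored as a small dataclass that applies the documented defaults and ranges before serializing. This is a client-side sketch, not the server's model: the class simply reuses the schema name, and the to_json method is a hypothetical convenience.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RAGQueryRequest:
    """Client-side mirror of the documented request fields."""
    question: str
    top_k: int = 5
    min_score: float = 0.5
    stream: bool = False
    chat_history: Optional[list] = None
    filter: Optional[dict] = None

    def __post_init__(self):
        # Enforce the ranges listed in the schema table.
        if not self.question:
            raise ValueError("question is required")
        if not 1 <= self.top_k <= 50:
            raise ValueError("top_k must be between 1 and 50")
        if not 0.0 <= self.min_score <= 1.0:
            raise ValueError("min_score must be between 0.0 and 1.0")

    def to_json(self) -> str:
        """Serialize to a JSON body suitable for POST /api/rag/query."""
        return json.dumps(asdict(self))


request = RAGQueryRequest(question="How long do refunds take?", top_k=3)
body = request.to_json()
```

Unset optional fields serialize as null, matching the defaults in the table.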
Vector store options
The RAG pipeline uses a pluggable vector store backend. Configure which backend to use in your environment variables.
- FAISS (default)
- Pinecone (managed cloud)
- Chroma (self-hosted)
FAISS is the default vector store and requires no external service. It stores all vectors in memory and optionally persists them to disk. It is ideal for local development and small-to-medium datasets. No additional services are required. FAISS starts in-process alongside your FastAPI application.
Environment
Document isolation
Every document chunk is stored with a user_id metadata field automatically set to the ID of the authenticated user who ingested it. All search and query endpoints inject a user_id filter into every vector store query, so users can never retrieve each other’s documents, even if they use the same metadata keys. You do not need to add any user_id filter yourself; it is applied automatically.
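One way to picture the injected filter is the sketch below, in which the server merges the authenticated user's ID over whatever filter the client sent. The function name scoped_filter is hypothetical and the real implementation may differ; the point is that a client-supplied user_id cannot override the authenticated one.

```python
def scoped_filter(client_filter, user_id):
    """Return the client's metadata filter with user_id forced to the caller."""
    merged = dict(client_filter or {})  # copy so the caller's dict is untouched
    merged["user_id"] = user_id  # the authenticated ID always wins
    return merged


# A client trying to read another account's chunks is still scoped to itself.
effective = scoped_filter({"topic": "billing", "user_id": "someone-else"}, "user-123")
print(effective)  # {'topic': 'billing', 'user_id': 'user-123'}
```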
To delete a specific document chunk, call: