Retrieval-Augmented Generation (RAG) is a technique that grounds LLM responses in your own documents. Instead of relying solely on what the model learned during training, a RAG pipeline first searches your document collection for relevant passages, then passes those passages to the LLM as context. The result is answers that are more accurate, can cite their sources, and stay within the scope of your data. Shipfastai’s RAG pipeline handles document ingestion, chunking, embedding, semantic search, and augmented generation, all behind a REST API. RAG endpoints live under /api/rag.
The RAG pipeline requires the Pro or Enterprise tier. Basic tier accounts cannot access these endpoints.

Ingesting documents

Before you can query your documents, you need to ingest them into the vector store. Shipfastai supports two ingestion endpoints.

Ingest plain text

Send raw text content to POST /api/rag/ingest/text. The pipeline splits the text into overlapping chunks, embeds each chunk using OpenAI embeddings, and stores the result. Every chunk is tagged with your user_id for automatic isolation.
Request
POST /api/rag/ingest/text
Authorization: Bearer <access_token>
Content-Type: application/json

{
  "content": "FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.8+ based on standard Python type hints. The key features are: fast, fast to code, fewer bugs, intuitive, easy, short, robust, and standards-based.",
  "metadata": { "source": "fastapi-overview", "category": "framework-docs" },
  "chunk_size": 1000,
  "chunk_overlap": 200
}
Response — 200 OK
{
  "document_ids": ["a3f1b2c4_0", "d9e8f7g6_1"],
  "chunks_created": 2
}
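The overlapping-chunk step can be pictured with a minimal Python sketch. It assumes a plain character-window splitter; the function name split_into_chunks is hypothetical, and the actual pipeline may split on sentence or token boundaries instead, so real chunk counts can differ from what this sketch produces.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into character windows where neighbours share chunk_overlap characters.

    Illustrative only: this is an assumed strategy, not the documented one.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each window advances by 800 characters, so the last 200 characters of one chunk reappear at the start of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.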

Ingest a file

Upload a .txt, .pdf, or .docx file using a multipart POST /api/rag/ingest/file request. The pipeline extracts text from the file and then follows the same chunking and embedding process.
cURL example
curl -X POST "<base_url>/api/rag/ingest/file" \
  -H "Authorization: Bearer <access_token>" \
  -F "file=@handbook.pdf" \
  -F "chunk_size=1000" \
  -F "chunk_overlap=200"
The ingestion parameters are:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| content | string | required | Raw text to ingest (text endpoint only). |
| metadata | object | {} | Arbitrary key-value pairs attached to every chunk. |
| chunk_size | int | 1000 | Maximum characters per chunk (100–10000). |
| chunk_overlap | int | 200 | Characters of overlap between adjacent chunks (0–2000). |
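
The request shape and parameter ranges above can be wrapped in a small client-side helper. The following is a sketch using only the Python standard library. BASE_URL and the function name build_ingest_text_request are hypothetical; the range checks mirror the documented limits but the server remains the authority on validation.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical base URL: replace with the host where your deployment runs.
BASE_URL = "https://api.example.com"


def build_ingest_text_request(
    content: str,
    access_token: str,
    metadata: Optional[dict] = None,
    chunk_size: int = 1000,
    chunk_overlap: int = 200,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/rag/ingest/text request."""
    if not content:
        raise ValueError("content is required")
    if not 100 <= chunk_size <= 10000:
        raise ValueError("chunk_size must be between 100 and 10000")
    if not 0 <= chunk_overlap <= 2000:
        raise ValueError("chunk_overlap must be between 0 and 2000")

    body = {
        "content": content,
        "metadata": metadata or {},
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/rag/ingest/text",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and decode the JSON body; the response should carry document_ids and chunks_created as shown in the example response.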

Semantic search

Use POST /api/rag/search to find document chunks that are semantically similar to a query string, without involving the LLM. This is useful for debugging your knowledge base or building custom retrieval logic.
Request
POST /api/rag/search
Authorization: Bearer <access_token>
Content-Type: application/json

{
  "query": "What are the key features of FastAPI?",
  "top_k": 3,
  "filter": { "category": "framework-docs" }
}
Response — 200 OK
{
  "results": [
    {
      "id": "a3f1b2c4_0",
      "content": "FastAPI is a modern, fast (high-performance) web framework...",
      "score": 0.94,
      "metadata": {
        "source": "fastapi-overview",
        "category": "framework-docs",
        "chunk_index": 0,
        "total_chunks": 2,
        "user_id": "a1b2c3d4-0000-0000-0000-000000000001"
      }
    }
  ]
}
The filter field supports any metadata key-value pair you attached during ingestion. Results are automatically filtered to only include chunks belonging to your account.
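
Conceptually, a search like this combines a metadata filter with a similarity ranking. The sketch below illustrates the idea in plain Python. It assumes cosine similarity over in-memory vectors, which this page does not confirm as the scoring method; the function names and the "vector" key are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vector: Sequence[float],
    chunks: List[Dict],
    top_k: int = 3,
    metadata_filter: Optional[Dict] = None,
) -> List[Dict]:
    """Return the top_k chunks that match every filter key, best score first."""
    wanted = metadata_filter or {}
    candidates = [
        c for c in chunks
        if all(c["metadata"].get(key) == value for key, value in wanted.items())
    ]
    scored = [
        {"id": c["id"], "score": cosine_similarity(query_vector, c["vector"]), "metadata": c["metadata"]}
        for c in candidates
    ]
    scored.sort(key=lambda result: result["score"], reverse=True)
    return scored[:top_k]
```

Filtering before ranking means top_k counts only chunks that satisfy the filter, so a narrow filter can return fewer than top_k results.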

RAG queries

Send a natural-language question to POST /api/rag/query. The pipeline embeds your question, retrieves the most relevant chunks, passes them to the LLM as context, and returns both the synthesized answer and the source documents used.

Non-streaming query
Request
POST /api/rag/query
Authorization: Bearer <access_token>
Content-Type: application/json

{
  "question": "What makes FastAPI fast?",
  "top_k": 5,
  "min_score": 0.5,
  "stream": false,
  "chat_history": [
    { "role": "user", "content": "Tell me about Python web frameworks." },
    { "role": "assistant", "content": "There are many Python web frameworks..." }
  ],
  "filter": { "category": "framework-docs" }
}
Response — 200 OK
{
  "answer": "FastAPI achieves high performance through its use of Starlette for the web parts and Pydantic for the data parts. It is one of the fastest Python frameworks available, on par with NodeJS and Go.",
  "sources": [
    {
      "id": "a3f1b2c4_0",
      "content": "FastAPI is a modern, fast (high-performance) web framework...",
      "score": 0.94,
      "metadata": { "source": "fastapi-overview" }
    }
  ],
  "usage": {
    "prompt_tokens": 312,
    "completion_tokens": 45,
    "total_tokens": 357
  }
}
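
The augmentation step, where retrieved chunks become LLM context, can be sketched as follows. This is an illustration under assumptions: the real prompt template is not documented here, and build_messages, the system instruction wording, and the bracketed numbering are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def build_messages(
    question: str,
    retrieved: List[Dict],
    min_score: float = 0.5,
    chat_history: Optional[List[Dict]] = None,
) -> Tuple[List[Dict], List[Dict]]:
    """Assemble chat messages from retrieved chunks and return them with the chunks used."""
    # Drop chunks whose similarity score falls below the threshold.
    kept = [chunk for chunk in retrieved if chunk["score"] >= min_score]

    # Number each passage so an answer could refer back to its source.
    context = "\n\n".join(
        f"[{index}] {chunk['content']}" for index, chunk in enumerate(kept, start=1)
    )
    system_prompt = (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\nContext:\n" + context
    )

    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(chat_history or [])  # prior turns go before the new question
    messages.append({"role": "user", "content": question})
    return messages, kept
```

The second return value corresponds to the sources array in the response: only chunks that passed min_score are both shown to the model and reported back.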

Streaming query

Set stream: true to receive the answer as a Server-Sent Event stream, identical in format to the chat streaming endpoint. Each event contains a { "token": "..." } payload, and the stream ends with data: [DONE].

The full RAGQueryRequest schema:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| question | string | required | The natural-language question to answer. |
| top_k | int | 5 | Number of document chunks to retrieve (1–50). |
| min_score | float | 0.5 | Minimum similarity score to include a chunk (0.0–1.0). |
| stream | bool | false | Stream the answer token by token. |
| chat_history | array | null | Prior conversation turns to provide context. |
| filter | object | null | Metadata filter applied during retrieval. |
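
When stream is true, a client needs to turn the event stream back into text. A minimal parsing sketch follows, assuming each event arrives as a single data: line carrying the { "token": "..." } payload described above. The function name iter_tokens is hypothetical, and the sketch ignores other Server-Sent Event fields such as event: or id:.

```python
import json
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield answer tokens from SSE lines until the [DONE] sentinel appears."""
    for raw_line in lines:
        line = raw_line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        yield json.loads(payload)["token"]
```

Because the function accepts any iterable of strings, it can be fed decoded lines from an HTTP response body or a recorded list of lines while testing.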

Vector store options

The RAG pipeline uses a pluggable vector store backend. Configure which backend to use in your environment variables.
FAISS is the default vector store and requires no external service. It stores all vectors in memory and optionally persists them to disk. It is ideal for local development and small-to-medium datasets.
Environment
VECTOR_STORE_PROVIDER=faiss
FAISS_INDEX_PATH=./data/faiss.index  # optional persistence
No additional services are required. FAISS starts in-process alongside your FastAPI application.
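
For illustration, an application might read these two variables as shown below. The variable names come from the configuration above; the VectorStoreConfig class and load_vector_store_config function are hypothetical and not part of Shipfastai's documented code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VectorStoreConfig:
    provider: str
    faiss_index_path: Optional[str]  # None means in-memory only, no persistence


def load_vector_store_config(env: Mapping[str, str] = os.environ) -> VectorStoreConfig:
    """Read vector store settings, falling back to FAISS when no provider is set."""
    provider = env.get("VECTOR_STORE_PROVIDER", "faiss").strip().lower()
    index_path = env.get("FAISS_INDEX_PATH") or None  # treat empty string as unset
    return VectorStoreConfig(provider=provider, faiss_index_path=index_path)
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dictionary instead of real environment variables.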

Document isolation

Every document chunk is stored with a user_id metadata field that is automatically set to the ID of the authenticated user who ingested it. All search and query endpoints inject a user_id filter into every vector store query, so users can never retrieve each other’s documents, even if they use the same metadata keys. You do not need to add a user_id filter yourself.

To delete a specific document chunk, call:
DELETE /api/rag/documents/{document_id}
Authorization: Bearer <access_token>
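
The automatic user_id filter described in this section can be pictured as a merge in which the server-side value always wins. The sketch below is an assumption about how such injection could work, not the actual implementation; scoped_filter is a hypothetical name.

```python
from typing import Dict, Optional


def scoped_filter(client_filter: Optional[Dict], authenticated_user_id: str) -> Dict:
    """Merge a client-supplied metadata filter with a mandatory user_id constraint."""
    merged = dict(client_filter or {})  # copy so the caller's dict is not mutated
    # Assign last so a user_id sent by the client can never override the real one.
    merged["user_id"] = authenticated_user_id
    return merged
```

Setting user_id after copying the client filter is the key design choice: even a request that tries to supply another account's user_id ends up scoped to the authenticated caller.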