The RAG (Retrieval-Augmented Generation) API lets you build knowledge-grounded AI features. You ingest text or files into a per-user vector store, then either search that store directly or query it to get LLM answers grounded in your own documents. All endpoints are mounted under /api/rag/.
RAG endpoints are available on Pro and Enterprise plans only. The vector store is scoped per user: each user can search and retrieve only their own ingested documents.

POST /api/rag/ingest/text

Ingest raw text content into the vector store. The text is automatically split into overlapping chunks, embedded, and stored. Returns the generated document IDs and the number of chunks created.

Headers:
Authorization (string, required): Bearer <access_token>

Request body:
content (string, required): The raw text to ingest. There is no enforced length limit, but very large documents will produce many chunks.
metadata (object, default: {}): Arbitrary key-value pairs attached to every chunk from this document. Useful for filtering later (e.g., {"source": "faq", "topic": "billing"}).
chunk_size (number, default: 1000): Target character length of each chunk. Must be between 100 and 10000. The chunker attempts to break at sentence boundaries near this length.
chunk_overlap (number, default: 200): Number of characters of overlap between adjacent chunks. Must be between 0 and 2000. Overlap improves recall by ensuring context is not lost at chunk boundaries.

import requests

response = requests.post(
    "http://localhost:8000/api/rag/ingest/text",
    headers={"Authorization": f"Bearer {access_token}"},
    json={
        "content": "Shipfastai is an AI-ready SaaS boilerplate for Python developers. It includes authentication, billing, and built-in RAG support out of the box.",
        "metadata": {"source": "product-docs", "topic": "overview"},
        "chunk_size": 500,
        "chunk_overlap": 100,
    },
)
print(response.json())

Response (IngestResponse):
document_ids (string[], required): List of IDs assigned to each stored chunk. Each ID is a 12-character MD5 hash prefix plus the chunk index (e.g., "a1b2c3d4e5f6_0").
chunks_created (number, required): Total number of chunks the text was split into and stored.

{
  "document_ids": [
    "a1b2c3d4e5f6_0",
    "b2c3d4e5f6a1_1"
  ],
  "chunks_created": 2
}
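
To make the effect of chunk_size and chunk_overlap concrete, here is a minimal character-based chunker. It is a sketch under assumptions, not the service's actual implementation: the function name split_text is hypothetical, the sentence-boundary heuristic (cut at the last ". " inside the window) is one simple choice, and the requirement that the overlap be smaller than the chunk size is an assumption that keeps the loop advancing.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters."""
    if not 100 <= chunk_size <= 10000:
        raise ValueError("chunk_size must be between 100 and 10000")
    if not 0 <= chunk_overlap <= 2000:
        raise ValueError("chunk_overlap must be between 0 and 2000")
    if chunk_overlap >= chunk_size:
        # Assumption: the overlap must be smaller than the chunk,
        # otherwise the window would never move forward.
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer to cut at the last sentence boundary in the window,
            # but only if the chunk stays longer than the overlap.
            boundary = text.rfind(". ", start, end)
            if boundary > start + chunk_overlap:
                end = boundary + 1  # keep the period in this chunk
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by the overlap so adjacent chunks share context.
        start = end - chunk_overlap
    return chunks
```

With this sketch, the last chunk_overlap characters of one chunk are repeated at the start of the next, which is the property that keeps a sentence straddling a boundary retrievable from either side.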

POST /api/rag/ingest/file

Ingest a file directly into the vector store. The file is parsed to plain text, then processed identically to /api/rag/ingest/text. Supported formats: .txt, .pdf, .docx.

Headers:
Authorization (string, required): Bearer <access_token>

Form data (multipart/form-data):
file (file, required): The file to upload. Must be a .txt, .pdf, or .docx file.
chunk_size (number, default: 1000): Target character length of each chunk. Must be between 100 and 10000.
chunk_overlap (number, default: 200): Character overlap between adjacent chunks. Must be between 0 and 2000.

import requests

with open("documentation.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/rag/ingest/file",
        headers={"Authorization": f"Bearer {access_token}"},
        files={"file": ("documentation.pdf", f, "application/pdf")},
        data={"chunk_size": 800, "chunk_overlap": 150},
    )
print(response.json())

Response (IngestResponse, same shape as /ingest/text):
{
  "document_ids": [
    "c3d4e5f6a1b2_0",
    "d4e5f6a1b2c3_1",
    "e5f6a1b2c3d4_2"
  ],
  "chunks_created": 3
}
The metadata for file ingests automatically includes {"source": "<filename>"} in addition to any user-supplied metadata.
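
Because only .txt, .pdf, and .docx files are accepted, a client can check the extension before uploading. The helper below is a hypothetical sketch (upload_content_type is not part of the API). Matching the extension case-insensitively is an assumption; how the server treats upper-case extensions is not stated here.

```python
from pathlib import Path

# Standard MIME types for the three formats documented as supported.
SUPPORTED_TYPES = {
    ".txt": "text/plain",
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def upload_content_type(path: str) -> tuple[str, str]:
    """Return (filename, content_type), or raise ValueError for an unsupported file."""
    p = Path(path)
    suffix = p.suffix.lower()  # assumption: extension check is case-insensitive
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(
            f"Unsupported file type {suffix!r}; expected one of {sorted(SUPPORTED_TYPES)}"
        )
    return p.name, SUPPORTED_TYPES[suffix]
```

The returned pair can fill the first and third positions of the files tuple in the upload request, e.g. files={"file": (filename, f, content_type)}.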

POST /api/rag/search

Perform a pure semantic search over the vector store without involving an LLM. Returns the most relevant chunks ranked by similarity score. Useful for building your own retrieval logic or debugging what is in the store.

Headers:
Authorization (string, required): Bearer <access_token>

Request body:
query (string, required): The search query. The query is embedded and compared against stored chunk embeddings.
top_k (number, default: 5): Maximum number of results to return. Must be between 1 and 50.
filter (object): Optional metadata filter to narrow results. Key-value pairs are matched against chunk metadata (e.g., {"source": "product-docs"}). The user_id filter is applied automatically; you do not need to include it.

import requests

response = requests.post(
    "http://localhost:8000/api/rag/search",
    headers={"Authorization": f"Bearer {access_token}"},
    json={
        "query": "How does billing work?",
        "top_k": 3,
        "filter": {"topic": "billing"},
    },
)
for result in response.json()["results"]:
    print(f"[{result['score']:.2f}] {result['content'][:120]}")

Response (SearchResponse):
results (object[], required): Ordered list of matching chunks, most similar first. Each item carries id, content, score, and metadata, as in the example below.

{
  "results": [
    {
      "id": "a1b2c3d4e5f6_0",
      "content": "Shipfastai integrates with Stripe for subscription billing...",
      "score": 0.91,
      "metadata": {
        "source": "product-docs",
        "topic": "billing",
        "chunk_index": 0,
        "total_chunks": 2,
        "user_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
      }
    },
    {
      "id": "b2c3d4e5f6a1_1",
      "content": "You can manage your subscription through the Stripe customer portal...",
      "score": 0.84,
      "metadata": {
        "source": "product-docs",
        "topic": "billing",
        "chunk_index": 1,
        "total_chunks": 2,
        "user_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
      }
    }
  ]
}
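
The ranking behaviour described for this endpoint can be approximated locally. The sketch below assumes cosine similarity as the score (the exact metric is not stated here) and treats the filter as exact equality on every key. All function names and the in-memory chunk layout, including the "embedding" field, are hypothetical.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches_filter(metadata: dict, flt: Optional[dict]) -> bool:
    """True when every filter key is present in metadata with an equal value."""
    return all(k in metadata and metadata[k] == v for k, v in (flt or {}).items())


def search(query_embedding: list[float], chunks: list[dict],
           top_k: int = 5, flt: Optional[dict] = None) -> list[dict]:
    """Return up to top_k chunks that pass the filter, most similar first."""
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    results = [
        {
            "id": chunk["id"],
            "content": chunk["content"],
            "score": cosine_similarity(query_embedding, chunk["embedding"]),
            "metadata": chunk["metadata"],
        }
        for chunk in chunks
        if matches_filter(chunk["metadata"], flt)
    ]
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]
```

Filtering before scoring means top_k counts only chunks that satisfy the filter, so a narrow filter can return fewer than top_k results.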

POST /api/rag/query

Ask a natural language question. The API retrieves the most relevant chunks from the vector store and passes them to the LLM as context, returning a grounded answer along with the source chunks used. Supports streaming and optional conversation history for multi-turn sessions.

Headers:
Authorization (string, required): Bearer <access_token>

Request body:
question (string, required): The natural language question to answer using the ingested documents.
top_k (number, default: 5): Number of document chunks to retrieve as context. Must be between 1 and 50.
min_score (number, default: 0.5): Minimum similarity score threshold between 0.0 and 1.0. Chunks scoring below this value are excluded from the context passed to the LLM.
stream (boolean, default: false): When true, the answer is streamed as server-sent events (SSE), using the same format as the AI Chat streaming endpoint.
chat_history (object[]): Optional conversation history for multi-turn queries. Each item must have a role ("user" or "assistant") and content. Providing history allows the model to resolve follow-up questions against prior context.
filter (object): Optional metadata filter applied during retrieval (e.g., {"source": "faq"}). The user_id filter is applied automatically.

import requests

response = requests.post(
    "http://localhost:8000/api/rag/query",
    headers={"Authorization": f"Bearer {access_token}"},
    json={
        "question": "How do I cancel my subscription?",
        "top_k": 4,
        "min_score": 0.6,
        "filter": {"source": "product-docs"},
    },
)
data = response.json()
print(data["answer"])
print(f"\nSources used: {len(data['sources'])}")

Response (RAGQueryResponse):
answer (string, required): The LLM-generated answer, grounded in the retrieved document chunks.
sources (object[], required): The document chunks that were retrieved and used as context. Same shape as SearchResponse.results.
usage (object): Token usage for the LLM call. May be null if the provider does not return usage data.

{
  "answer": "You can cancel your subscription at any time through the Stripe customer portal. Navigate to your account settings and click 'Manage Subscription' to open the portal, where you can cancel, downgrade, or update your payment method.",
  "sources": [
    {
      "id": "b2c3d4e5f6a1_1",
      "content": "You can manage your subscription through the Stripe customer portal...",
      "score": 0.88,
      "metadata": {
        "source": "product-docs",
        "chunk_index": 1,
        "total_chunks": 2,
        "user_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 312,
    "completion_tokens": 58,
    "total_tokens": 370
  }
}
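
The interaction between top_k and min_score can be pictured as a two-step selection followed by prompt assembly. This is a sketch under assumptions: select_sources and build_prompt are hypothetical names, and the prompt template is illustrative only, since the prompt the service actually sends to the LLM is not documented here.

```python
def select_sources(results: list[dict], top_k: int = 5, min_score: float = 0.5) -> list[dict]:
    """Keep at most top_k results, dropping any that score below min_score."""
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)[:top_k]
    return [r for r in ranked if r["score"] >= min_score]


def build_prompt(question: str, sources: list[dict]) -> str:
    """Assemble a grounded prompt (illustrative template, not the service's)."""
    context = "\n\n".join(f"[{i + 1}] {s['content']}" for i, s in enumerate(sources))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

A high min_score can leave fewer than top_k sources, or none at all, so it is worth checking len(data["sources"]) before trusting that an answer is grounded.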

Multi-turn query example

Use chat_history to maintain context across follow-up questions:
import requests

history = []

def ask(question: str) -> str:
    response = requests.post(
        "http://localhost:8000/api/rag/query",
        headers={"Authorization": f"Bearer {access_token}"},
        json={"question": question, "chat_history": history},
    )
    response.raise_for_status()  # fail early rather than hit a KeyError on "answer"
    data = response.json()
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": data["answer"]})
    return data["answer"]

print(ask("What is included in the Pro plan?"))
print(ask("And how much does it cost?"))  # resolved against prior context
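
Each chat_history item must have a role of "user" or "assistant" and a content field, so a small check before sending can catch malformed history early. The helper below is a hypothetical client-side sketch. Requiring content to be a string is an assumption, since only the presence of content is stated here.

```python
ALLOWED_ROLES = {"user", "assistant"}


def validate_chat_history(history: list[dict]) -> None:
    """Raise ValueError if any history item has a bad role or missing content."""
    for index, item in enumerate(history):
        role = item.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"item {index}: role must be 'user' or 'assistant', got {role!r}")
        # Assumption: content is expected to be a string.
        if not isinstance(item.get("content"), str):
            raise ValueError(f"item {index}: content must be a string")
```

Calling validate_chat_history(history) at the top of ask() would surface a bad entry locally instead of as an API error.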

DELETE /api/rag/documents/{document_id}

Delete a specific document chunk from the vector store by its ID.

Path parameters:
document_id (string, required): The document chunk ID to delete, as returned by the ingest endpoints.

curl --request DELETE \
  --url http://localhost:8000/api/rag/documents/a1b2c3d4e5f6_0 \
  --header "Authorization: Bearer <access_token>"
Response:
{
  "status": "deleted",
  "document_id": "a1b2c3d4e5f6_0"
}
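
Ingest returns one ID per chunk and this endpoint removes a single chunk, so removing everything from one ingest call means one DELETE per ID. The sketch below assumes no bulk-delete endpoint exists (none is documented here). The function name is hypothetical, and session is assumed to be a requests.Session or any object with a compatible delete method.

```python
from urllib.parse import quote


def delete_document_chunks(session, base_url: str, document_ids: list[str],
                           access_token: str) -> list[str]:
    """Delete each chunk ID in turn and return the IDs the API confirmed."""
    headers = {"Authorization": f"Bearer {access_token}"}
    deleted = []
    for doc_id in document_ids:
        # Percent-encode the ID so it is always a single safe path segment.
        url = f"{base_url}/api/rag/documents/{quote(doc_id, safe='')}"
        response = session.delete(url, headers=headers)
        response.raise_for_status()  # stop at the first failed deletion
        deleted.append(response.json()["document_id"])
    return deleted
```

A typical call passes the document_ids list from an ingest response, for example delete_document_chunks(requests.Session(), "http://localhost:8000", ids, access_token).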