Transitioning to enterprise software. Services live now. First product, RAG Studio, ships Q4 2026.

Your documents. Searchable. Cited. POPIA-safe.

A focused engagement that turns your scattered knowledge — Confluence, SharePoint, PDFs, contracts, runbooks, policy decks — into a private retrieval-augmented generation system that answers questions in plain language with verifiable citations back to the source. Four to six weeks. Built on the same RAG architecture that powers sonofgraig's enterprise platform.

Source-cited answers. Every response links back to the exact document and page it drew from.
PII scrubbed before embedding. SA personal-information patterns redacted at ingestion — tested in CI.
Hybrid retrieval. Dense semantic search plus keyword search for higher accuracy than either alone.
Hosted in AWS af-south-1. Vector store, document storage, and embeddings — all in Cape Town.
POPIA-native
Fixed scope, fixed price
30-day post-delivery support
Convert to subscription on close
Starting at
R55,000
Delivery window
4–6 weeks
Documents included
Up to 5,000
Default region
AWS af-south-1
01 · Outcomes

What ships at the end of week six.

A focused RAG engagement only succeeds if the retrieval is good. Most failed enterprise RAG projects skip evaluation, ingest the wrong documents, or never run the PII scrubber. We tackle all three by making them deliverables — not afterthoughts.

Outcome 01
A searchable, citation-backed knowledge base
Your documents indexed in a per-tenant Qdrant collection inside af-south-1. Hybrid retrieval combines semantic and keyword search. Every answer cites the exact source document and page it drew from — no hallucinated references.
~3 sec typical end-to-end query latency
Outcome 02
A POPIA-compliant ingestion pipeline
PII detection and redaction run synchronously before any document chunk is embedded. The vector store and document storage sit in af-south-1, as does the embedding model when the in-region option is chosen. Every query is logged immutably with hashes, so the audit log is a compliance record, not a copy of your data.
0 raw PII transmitted to embedding APIs
Outcome 03
Measured retrieval quality, not assumed
Ragas runs over a curated test corpus before go-live and on every change to the index. You receive numbers for faithfulness, answer relevancy, context precision, and context recall — with thresholds agreed during scoping. No black box.
4 Ragas metrics tracked over time
02 · The pipeline

Six stages from raw doc to cited answer.

The diagram below traces a single document from your source system to a query response. Each stage adds either value (chunking, embedding, retrieval) or a guarantee (PII scrubbing, audit logging). Nothing in the pipeline is opaque — every stage logs metrics that show up in your eval dashboard.

Stage 01
Source connect
Airbyte syncs from Drive, Confluence, SharePoint, S3, Notion, and more on schedule.
Airbyte
Stage 02
Parse & classify
Unstructured.io extracts text, tables, and structure from PDF, Word, PowerPoint, scans.
Unstructured.io
Stage 03
PII scrub
SA personal-info patterns (ID, mobile, email, passport) detected and redacted synchronously.
sonofgraig
Stage 04
Chunk & embed
Recursive, sentence, or semantic chunking. Embedding via OpenAI or local model in-region.
LlamaIndex
Stage 05
Index
Per-tenant Qdrant collection: {org_id}_{kb_id}. AES-256 at rest.
Qdrant
Stage 06
Retrieve & cite
Hybrid search returns ranked chunks. The LLM answers with paragraph-level citations.
FastAPI
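
Stage 03 above (PII scrub) can be pictured with a minimal sketch. The regular expressions, the placeholder tokens, and the `scrub` function name are illustrative assumptions, not the production scrubber; in particular the passport pattern (one letter followed by eight digits) is an assumed format. The SA ID check relies on the Luhn check digit that 13-digit South African ID numbers carry.

```python
import re

# Assumed patterns for illustration; a production scrubber needs a wider, tested set.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
MOBILE = re.compile(r"(?<!\d)(?:\+27|0)[6-8]\d{8}(?!\d)")
SA_ID = re.compile(r"(?<!\d)\d{13}(?!\d)")
PASSPORT = re.compile(r"\b[A-Z]\d{8}\b")  # assumed passport format


def luhn_ok(digits: str) -> bool:
    """Return True when the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scrub(text: str) -> str:
    """Redact e-mail, SA ID, mobile and passport patterns before embedding."""
    text = EMAIL.sub("[EMAIL]", text)
    # Only redact 13-digit runs that pass the checksum, to limit false positives.
    text = SA_ID.sub(lambda m: "[SA_ID]" if luhn_ok(m.group()) else m.group(), text)
    text = MOBILE.sub("[MOBILE]", text)
    text = PASSPORT.sub("[PASSPORT]", text)
    return text
```

Running the scrubber synchronously, before the chunk-and-embed stage, is what keeps raw identifiers out of any embedding request.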
03 · Source connectors

Connect once. Sync forever.

Source ingestion runs on Airbyte — the same open-source data integration platform our enterprise platform uses. Set up the connection once, choose a sync schedule, and the index stays fresh. The standard engagement covers up to four source connections; additional sources are quoted at R8K each.

Google Drive
Docs · Sheets · files
Confluence
Spaces · pages
SharePoint
M365 libraries
OneDrive
M365 personal
Notion
DBs · pages
GitHub
README · wiki · md
Amazon S3
Buckets · MinIO
Dropbox Business
Folders · files
Slack
Public channels
PDF / Word
Direct upload
REST API
Custom connector
Postgres / MySQL
Tables · views

Standard engagement: up to 4 source connections. Additional sources +R8,000 each. Custom REST connectors with non-trivial pagination or auth flows are scoped per case.

04 · How retrieval works

Source-cited answers, not hallucinated narratives.

A working RAG system is judged by its citations as much as its answers. The mock query below shows what a typical response looks like in your knowledge base: an answer in plain language, the three cited chunks (out of the top five retrieved) ranked by similarity, and links back to the source documents. The same shape ships in the embeddable chat widget.

Query · sample
What is our refund policy for enterprise customers who cancel during the first 30 days?
Retrieval mode: Hybrid (semantic + BM25)
top_k: 5 chunks
Embedding model: text-embedding-3-small
LLM: Claude Sonnet
Round-trip latency: 2.8 s
Faithfulness score: 0.94
Generated answer · cited
Enterprise customers who cancel within the first 30 calendar days of contract execution receive a full refund of the implementation fee, less any infrastructure pass-through costs already incurred[1]. Approved refunds are settled by EFT within 14 business days of the cancellation acknowledgement letter being sent[2]. Cancellation after day 30 follows the standard subscription terms; charges already invoiced for the remaining contract term are recoverable pro-rata only at sonofgraig's discretion[3].
Retrieved chunks · ranked by similarity
1
Enterprise_Master_Agreement_v3.pdf · §8.2 · p.14
"Within the first thirty (30) calendar days of contract execution, the Customer may terminate this Agreement and receive a full refund of the implementation fee paid, less infrastructure pass-through costs already incurred by sonofgraig."
0.91
2
Refund_Operations_Runbook.md · § Processing timeline
"All approved refund requests are settled by EFT within 14 business days of the cancellation acknowledgement letter being sent to the Customer's registered representative."
0.86
3
Subscription_Terms.docx · §4 · Cancellation after the trial window
"After day 30, cancellation follows the standard subscription terms. Charges already invoiced for the remaining contract term are recoverable on a pro-rata basis only at sonofgraig's discretion."
0.78
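
The hybrid mode in the sample above merges a semantic ranking with a BM25 ranking. The page does not say how the two lists are combined, so the sketch below assumes reciprocal rank fusion (RRF), one common choice; the function name, the constant `k=60`, and the chunk identifiers are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=5):
    """Fuse several ranked lists of chunk IDs into one list.

    Each chunk scores 1 / (k + rank) per list it appears in, so chunks
    that rank well in both the dense and the keyword list rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_k]


# Hypothetical chunk IDs: one ranking from dense search, one from BM25.
dense = ["agreement_p14", "runbook_timeline", "terms_s4"]
bm25 = ["runbook_timeline", "faq_refunds", "agreement_p14"]
fused = reciprocal_rank_fusion([dense, bm25])
```

RRF works on ranks rather than raw scores, which avoids having to normalise cosine similarity against BM25 scores before combining them.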
05 · Delivery cadence

Four phases. Four to six weeks.

Every phase ships a tangible deliverable. Nothing is left for "later". The cadence below is the standard plan; if your document set is unusually large or your source systems require new connectors, the build phase can extend by up to a week.

Discovery, classification & data audit
Document inventory · Sensitivity classification · POPIA risk
Week 1
A workshop with your operating team and your Information Officer to map the documents in scope, classify them by sensitivity, identify the POPIA Section 11 lawful basis for each source, and confirm who is allowed access to which sub-collection. The scoping document is signed before any code is written.
Deliverables
Document inventory with classification and processing-purpose statement per source
Access control matrix — user groups mapped to document groups
Test query set — 30–50 representative questions with gold answers
Ingestion, PII scrubbing & chunking
Airbyte connectors · Unstructured · LlamaIndex chunking
Weeks 2–3
Source connectors are configured and tested. Documents flow through Unstructured.io for parsing, then through the SA-specific PII scrubber, then into the chunking strategy chosen for your content type (recursive for long-form, sentence for legal text, semantic for marketing collateral). Embedding is performed with the model best suited to your accuracy & cost trade-off.
Deliverables
Configured Airbyte sources with sync schedules and health monitoring
PII scrubbing test report — ID nos, mobile, email, passport patterns
Chunked, embedded, and indexed knowledge base in af-south-1 Qdrant
Retrieval, evaluation & tuning
Hybrid retrieval · Ragas · access control
Weeks 3–5
Hybrid retrieval (dense + BM25) is configured. The query engine is wired to your chosen LLM (Claude by default; Gemini available on conversion). Document-level access control is wired into the retrieval layer so users see only chunks they are authorised to see. Ragas runs against the test query set; we tune chunk size, overlap, embedding model, and retrieval mode to hit the agreed thresholds.
Deliverables
Configured query engine with document-level access control
Ragas evaluation report — faithfulness, answer relevancy, context precision, recall
Tuning log — what we tried, what worked, why we stopped where we did
Deploy, integrate & handover
Chat widget · API · runbooks · 30-day support
Weeks 5–6
The query API is deployed behind your authentication. The embeddable chat widget is styled and wired into the surface you choose — intranet, support portal, internal tool. Audit logging is verified, observability is wired in (Sentry, Langfuse), runbooks are written, and two knowledge-transfer sessions are run with your team. The 30-day post-delivery support window starts on go-live.
Deliverables
Production query API with rate limiting and access enforcement
Embeddable chat widget — styled, system prompt configured, embed code provided
POPIA Section 19 evidence pack — ingestion log, scrubber report, audit schema
30 days of priority support — retrieval tuning, prompt iteration, source updates
06 · Scope

Exactly what's in. Exactly what's not.

Fixed-scope means we have to be explicit about boundaries. The lists below are the standard inclusions and exclusions for the R55,000 starting price. Anything in the out-of-scope list can be quoted as a separate engagement, or rolled into a sonofgraig platform subscription on conversion.

Included
In the fixed-scope engagement
  • Discovery workshop, document inventory, classification, POPIA risk register
  • Up to 4 source connections via Airbyte (Drive, Confluence, SharePoint, Notion, etc.)
  • Up to 5,000 documents ingested and indexed
  • PII scrubber configured and tested for SA personal-information patterns
  • Chunking strategy selection & embedding-model selection
  • Per-tenant Qdrant collection deployed in af-south-1
  • Hybrid retrieval (semantic + BM25) with source-cited answers
  • Document-level access control mapped to your user groups
  • Ragas evaluation report — 4 metrics, agreed thresholds
  • Production query API with rate limiting
  • Embeddable chat widget — styled, system prompt configured
  • Immutable query-level audit log with PostgreSQL deletion rules
  • POPIA Section 19 evidence pack and runbooks
  • Two knowledge-transfer sessions with your team
  • 30 days of priority support from go-live
Out of scope
Quoted separately
  • Custom AI agent built on top of the knowledge base — AI Agent Implementation
  • Multiple knowledge bases — +R20K each
  • Source systems beyond the standard four — +R8K each
  • Document volumes above 5,000 — quoted at scoping based on average size
  • Custom UI surface beyond the embeddable chat widget
  • Domain-specific embedding-model fine-tuning — Fine-Tuning Ops product
  • Source-system schema changes or upstream data engineering work
  • Long-running operational support beyond the 30-day window
  • LLM token costs — metered to your provider account
  • Information Officer outsourcing — you retain that role
07 · Retrieval quality

Numbers, not adjectives. Ragas, on a curve.

Retrieval quality is the deliverable. We agree thresholds during scoping, run the test query set after every change to the index, and ship the eval report on go-live. The four metrics below are the ones we always track. The bar fills shown are typical — your real numbers depend on your corpus and will be reported with confidence intervals.

Metric A · Faithfulness
Does the answer match the retrieved context?
A score of 1.0 means every claim in the answer is grounded in a retrieved chunk. The system is configured to refuse rather than fabricate.
Target ≥ 0.85 · Typical ~0.91
Metric B · Answer relevancy
Does the answer address the question?
Catches answers that are technically grounded but miss the user's actual intent — a common failure when retrieval surfaces nearby but tangential chunks.
Target ≥ 0.80 · Typical ~0.88
Metric C · Context precision
Are the retrieved chunks the right ones?
Penalises noise in the retrieved set. A high score means the LLM is reasoning over signal, not over a haystack of partially-relevant chunks.
Target ≥ 0.75 · Typical ~0.84
Metric D · Context recall
Did we retrieve everything we needed?
Catches the silent failure mode where the right chunk exists in the index but does not surface in the top-k. Often surfaces chunking-strategy problems.
Target ≥ 0.75 · Typical ~0.82

Bar values shown are illustrative typical results. Acceptance thresholds are agreed in writing during scoping — not after delivery.
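
As a rough illustration of how agreed thresholds could act as a go-live gate, the sketch below compares a dictionary of metric scores with the four targets listed above. It assumes the scores have already been produced by a Ragas run; no Ragas API is called here, and the names `TARGETS` and `failing_metrics` are hypothetical.

```python
# Targets taken from the four metric cards above; real thresholds are agreed at scoping.
TARGETS = {
    "faithfulness": 0.85,
    "answer_relevancy": 0.80,
    "context_precision": 0.75,
    "context_recall": 0.75,
}


def failing_metrics(scores, targets=TARGETS):
    """Return {metric: (score, target)} for every metric that is missing or below target."""
    failures = {}
    for name, target in targets.items():
        score = scores.get(name)
        if score is None or score < target:
            failures[name] = (score, target)
    return failures


# Illustrative scores matching the typical values shown above.
typical = {
    "faithfulness": 0.91,
    "answer_relevancy": 0.88,
    "context_precision": 0.84,
    "context_recall": 0.82,
}
passed = not failing_metrics(typical)
```

Treating a missing metric as a failure means an incomplete evaluation run cannot pass the gate by accident.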

08 · POPIA & data residency

Compliance designed in — not bolted on later.

POPIA-compliant RAG is not significantly more complex to build than a non-compliant version, provided compliance is treated as a design constraint from week one. The four cards below describe the controls every engagement ships with, which also form the basis of the evidence pack you receive at handover.

POPIA s.19 · s.26
PII scrubbing before embedding
Synchronous middleware. SA ID numbers, mobile, email, and passport patterns detected and redacted before any chunk reaches the embedding model. Tested in CI with synthetic PII.
POPIA s.72
af-south-1 data residency
Vector store, document storage, and ingestion all run in AWS Cape Town. Egress allow-listed for approved endpoints; VPC flow logs confirm no data leaves the region during ingestion.
POPIA s.19(1)
Immutable audit log
Every query logged synchronously with SHA-256 hashes of the user ID and query text. PostgreSQL deletion rules prevent log row deletion. Stores document IDs, similarity scores, and processing basis — not raw content.
POPIA s.11 · s.13
Document-level access control
User groups mapped to document groups during scoping. Retrieval enforces access at the chunk level — users see only chunks from documents they are authorised to see. Source assertions linked to lawful basis.
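
To make the audit-log card above concrete, here is a minimal sketch of a log record that stores SHA-256 hashes of the user ID and query text alongside document IDs, similarity scores, and the processing basis, and no raw content. The `AuditRecord` fields and the `build_audit_record` helper are assumptions for illustration, not the shipped schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def sha256_hex(value: str) -> str:
    """Hex-encoded SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


@dataclass(frozen=True)  # frozen: a record cannot be mutated after creation
class AuditRecord:
    logged_at: str
    user_id_hash: str
    query_hash: str
    document_ids: tuple[str, ...]
    similarity_scores: tuple[float, ...]
    processing_basis: str


def build_audit_record(user_id, query, hits, processing_basis):
    """Build a record from (document_id, score) hits without keeping raw text."""
    return AuditRecord(
        logged_at=datetime.now(timezone.utc).isoformat(),
        user_id_hash=sha256_hex(user_id),
        query_hash=sha256_hex(query),
        document_ids=tuple(doc_id for doc_id, _ in hits),
        similarity_scores=tuple(score for _, score in hits),
        processing_basis=processing_basis,
    )
```

One design point worth weighing: an unkeyed hash of a short, guessable user ID can be reversed by enumeration, so a keyed hash such as HMAC may be preferable where user IDs are low-entropy.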

Independent reference: sonofgraig publishes a complete POPIA compliance statement and a long-form engineering essay on POPIA-compliant RAG. The architecture in this engagement matches both documents — not a simplified version of either.

09 · Technology stack

An opinionated stack. Open source where it matters.

We do not invent the engine; we invest where the value is. Every component below is production-grade open source or a SaaS service we consciously chose not to rebuild. You inherit the same engineering decisions our enterprise platform was built on — and you keep the source code.

Component
Category
Role in your knowledge base
LlamaIndex
RAG framework
Document ingestion, chunking, embedding, indexing, and retrieval over your private knowledge base. MIT licensed.
Qdrant
Vector database
Self-hosted on Kubernetes in af-south-1. One isolated, encrypted collection per knowledge base for POPIA data residency. Apache 2.0.
Unstructured.io
Document parsing
Reliably extracts text, tables, and structured content from PDF, Word, PowerPoint, scanned documents, and HTML before LlamaIndex sees them.
Airbyte
Source connectors
350+ source connectors. You connect Drive, Confluence, SharePoint, Notion, S3, GitHub, and so on once; Airbyte syncs them on schedule.
Ragas
RAG evaluation
Faithfulness, answer relevancy, context precision, context recall — run on every change to the knowledge base index.
LiteLLM
AI gateway
Single interface to LLM providers. Handles fallbacks and token-usage tracking. Lets you swap models without code changes.
Anthropic Claude
LLM
Default reasoning model for answer generation. Receives only PII-scrubbed payloads. Gemini available as an alternative on Growth tier and above.
PostgreSQL (Supabase)
Audit log
Stores the immutable query-level audit log with PostgreSQL rules preventing deletion. Row-level security per organisation.
FastAPI
Query API
High-performance Python query endpoint. Hybrid retrieval, source citation, and access control are all enforced at the API layer.
AWS EKS · Terraform
Infrastructure
All workloads containerised on EKS in af-south-1. Every infrastructure change reviewable as Terraform code in your repository.
Cloudflare
Edge security
WAF, DDoS mitigation, rate limiting on your query API. Inherited from the sonofgraig platform security posture.
Sentry · Langfuse
Observability
Sentry for error capture and performance. Langfuse for prompt traces, token costs per query, and conversation logs. Both self-hostable.
10 · Pricing

One number. No hourly surprises.

sonofgraig service projects are deliberately simple to procure. The price is the price. Scope is fixed before contracting. Variations are quoted in writing and signed before any additional work is performed.

RAG Knowledge Base Setup
Fixed-scope engagement
R55,000 ZAR
Starting price. Final figure depends on the source-system count, document volume, and parsing complexity surfaced during scoping.
Two instalments. 50% on contract signature, 50% on go-live.
4–6 weeks. Standard delivery window. Unusually large corpora may add up to a week.
30 days of post-delivery support. Retrieval tuning, prompt iteration, source updates.
POPIA documentation included. No separate compliance bill at the end.
Book a scoping call
What sits outside the engagement price
LLM token consumption: Your provider bill
Cloud infrastructure costs (compute, storage): Pass-through
Embedding-model token cost (if external): Pass-through
Additional source connectors above 4: +R8K each
Additional knowledge bases: +R20K each
Document volume above 5,000 docs: Quoted at scoping
Continued support after 30-day window: From R8K/mo
Custom REST connector with non-trivial auth: Per case
Convert to platform on close. The knowledge base moves directly onto a sonofgraig platform subscription with no rebuild — this engagement is built on the same RAG Studio architecture as the product. Your first three months on Starter are credited against the implementation fee.
12 · Frequently asked

Questions procurement, legal and engineering ask.

If your team is preparing for a vendor review or a board sign-off, the answers below cover most of what gets raised. Anything else, your account team can route to engineering directly.

What is the difference between this and AI Agent Implementation?
RAG Knowledge Base Setup ships the knowledge layer — ingestion, embedding, retrieval, an embeddable chat widget, and the API your applications can call. AI Agent Implementation includes everything in this engagement and a custom agent built on top with tool integrations (Slack, email, CRM, etc.), human-in-the-loop approval gates, and full execution-trace logging. Pick this engagement if your first use case is "chat over my documents". Pick AI Agent Implementation if you need the system to take autonomous actions in other tools.
Is the R55,000 fixed, or just a starting figure?
It is the starting price for the standard scope — up to 4 source connections, up to 5,000 documents, hybrid retrieval, evaluation, embeddable chat widget, 30 days of support. Your final fixed price is confirmed at the end of the scoping phase, before any contract is signed. Once signed, the price does not move unless you formally request additional scope, which is quoted in writing and re-signed before work continues.
What happens to LLM and embedding token costs?
Token costs sit on your provider account, not ours, so you have full visibility. Embedding cost is one-off at ingestion plus marginal cost for new documents during sync. Query-time cost depends on volume. As an order-of-magnitude reference: ingestion of a 5,000-document corpus is typically R500–R1,500 in embedding cost; a knowledge base handling a few thousand queries a month against Anthropic Claude typically lands between R1,500 and R5,000 in monthly LLM cost. Both can be capped via budget alerts at scoping.
Where does our data physically live during the engagement?
By default, all production data sits in AWS af-south-1 (Cape Town, South Africa). The Qdrant vector store is self-hosted in af-south-1, document storage is in S3 in af-south-1, and ingestion runs entirely in-region. PII-scrubbed payloads are the only data ever sent to an external LLM provider, and that boundary is enforced by middleware. We can also deliver into Azure South Africa North or GCP Johannesburg if your group security policy dictates a specific provider.
Can users only see documents they are authorised to see?
Yes. During scoping we map your user groups to document groups. The retrieval layer enforces this at the chunk level — the LLM never receives a chunk the user is not authorised to see. This is particularly important for legal, HR, and finance use cases where document-level access matters. Access is also part of the audit log: every query records which documents were considered for that specific user.
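
A minimal sketch of the chunk-level check described in this answer, assuming each chunk carries the document group it belongs to. The `Chunk` type, the `ACCESS_MATRIX` contents, and the `authorised_chunks` function are hypothetical; a production system would more likely push the same condition into the vector-store query as a metadata filter than filter results afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    document_group: str
    text: str


# Hypothetical access matrix from scoping: user group -> permitted document groups.
ACCESS_MATRIX = {
    "hr": {"hr_policies", "general"},
    "finance": {"finance_reports", "general"},
}


def authorised_chunks(chunks, user_groups, matrix=ACCESS_MATRIX):
    """Keep only chunks whose document group is permitted for the user's groups."""
    allowed = set()
    for group in user_groups:
        allowed |= matrix.get(group, set())  # unknown groups grant nothing
    return [c for c in chunks if c.document_group in allowed]


retrieved = [
    Chunk("c1", "hr_policies", "Leave policy text"),
    Chunk("c2", "finance_reports", "Quarterly figures"),
    Chunk("c3", "general", "Office hours"),
]
visible = authorised_chunks(retrieved, ["finance"])
```

Because the filter runs before prompt assembly, a chunk the user cannot see never reaches the LLM context.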
What chunking strategy do you use?
It depends on your corpus. Recursive character-level chunking is the default for long-form documents (reports, manuals). Sentence-level chunking is preferred for legal text where paragraph boundaries matter. Semantic chunking is used for marketing and editorial content where natural topic shifts dominate. The choice is made during the ingestion phase (weeks 2–3) and tuned against test query performance: we measure, we do not guess.
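
For readers unfamiliar with recursive chunking, the idea can be sketched in a few lines: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. This is an illustrative stand-alone function, not the LlamaIndex implementation; the separator list and default size are assumptions, and chunk overlap is left out for brevity.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters.

    Pieces are packed greedily at the current separator; any single piece
    that is still too long is split again with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = f"{current}{sep}{part}" if current else part
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]
```

Paragraph breaks are preferred over sentence breaks, and sentence breaks over word breaks, so chunks tend to end where the author paused.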
Which embedding model do you use?
Two options. text-embedding-3-small (OpenAI) for the cost-efficient default with strong general performance. A locally-hosted open embedding model (e.g. BGE) when data residency or cost demands it — this runs entirely inside af-south-1 with zero external API calls. The choice is made during scoping with the residency profile of your data in mind.
What does the 30-day post-delivery support cover?
Priority response on issues, retrieval tuning when answers drift, prompt iteration based on real-world usage, addition of new documents to the index, and Ragas re-runs on request. It does not cover net-new features, additional source systems, or operational on-call — those are quoted as continued support from R8K/month or are included on a platform subscription.
Who owns the source code at handover?
You do. The query API source, infrastructure-as-code (Terraform), CI/CD configuration, the test query set, the chunking and embedding configuration, the runbooks, and the POPIA Section 19 evidence pack are committed to your repositories during the engagement — not at the end. sonofgraig retains no proprietary lock-ins on your knowledge base. If you choose not to convert to a platform subscription, the system runs on standard open-source components your team can maintain.
Do you sign Data Processing Agreements?
Yes. sonofgraig has a pre-signed Data Processing Agreement covering processing activities, lawful basis, security controls, sub-processors, and transfer mechanisms. It is available for download from our trust centre and your legal team can mark up departures from the standard text during contracting.
Is sonofgraig B-BBEE certified and CIPC registered?
Yes — sonofgraig is B-BBEE certified and CIPC registered. B-BBEE spend certificates are issued per invoice. All commercial documentation is available to your procurement team for supplier on-boarding.
Ready to scope

Book a 30-minute scoping call.

A senior solutions engineer joins, we step through your document sources and target use case, identify whether it fits the standard scope, and confirm what your final fixed price will be. No commitment until contract signature.