01 · Outcomes

What ships at the end of week six.

A focused RAG engagement only succeeds if the retrieval is good. Most failed enterprise RAG projects skip evaluation, ingest the wrong documents, or never run the PII scrubber. We tackle all three by making them deliverables — not afterthoughts.

Outcome 01

A searchable, citation-backed knowledge base

Your documents indexed in a per-tenant Qdrant collection inside af-south-1. Hybrid retrieval combines semantic and keyword search. Every answer cites the exact source document and page it drew from — no hallucinated references.

~3 sec typical end-to-end query latency

Outcome 02

A POPIA-compliant ingestion pipeline

PII detection and redaction run synchronously before any document chunk is embedded. The vector store and document storage sit in af-south-1, and so does the embedding model when the locally hosted option is chosen; if an external embedding API is used, it receives only PII-scrubbed text. Every query is logged immutably with hashes; the audit log is a compliance record, not a copy of your data.

0 raw PII transmitted to embedding APIs

Outcome 03

Measured retrieval quality, not assumed

Ragas runs over a curated test corpus before go-live and on every change to the index. You receive numbers for faithfulness, answer relevancy, context precision, and context recall — with thresholds agreed during scoping. No black box.

4 Ragas metrics tracked over time

02 · The pipeline

Six stages from raw doc to cited answer.

The diagram below traces a single document from your source system to a query response. Each stage adds either value (chunking, embedding, retrieval) or a guarantee (PII scrubbing, audit logging). Nothing in the pipeline is opaque — every stage logs metrics that show up in your eval dashboard.

Stage 01

Source connect

Airbyte syncs from Drive, Confluence, SharePoint, S3, Notion, and more on a schedule.

Airbyte

Stage 02

Parse & classify

Unstructured.io extracts text, tables, and structure from PDF, Word, PowerPoint, scans.

Unstructured.io

Stage 03

PII scrub

SA personal-info patterns (ID, mobile, email, passport) detected and redacted synchronously.

sonofgraig

Stage 04

Chunk & embed

Recursive, sentence, or semantic chunking. Embedding via OpenAI or local model in-region.

LlamaIndex

Stage 05

Index

Per-tenant Qdrant collection: {org_id}_{kb_id}. AES-256 at rest.

Qdrant
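
The {org_id}_{kb_id} naming shape above can be sketched as a small helper. This is an illustration only: the function name and the rule that identifiers may not contain underscores are assumptions of this sketch (the rule keeps the separator unambiguous), not documented behaviour of the engagement.

```python
import re

# Assumption: identifiers are limited to letters, digits and hyphens so the
# underscore separator can never be confused with part of an identifier.
_SAFE_ID = re.compile(r"[A-Za-z0-9-]+")


def collection_name(org_id: str, kb_id: str) -> str:
    """Build a per-tenant collection name in the {org_id}_{kb_id} shape."""
    for label, value in (("org_id", org_id), ("kb_id", kb_id)):
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"{label} must be letters, digits or hyphens: {value!r}")
    return f"{org_id}_{kb_id}"


print(collection_name("acme", "hr-policies"))
```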

Stage 06

Retrieve & cite

Hybrid search returns ranked chunks. The LLM answers with paragraph-level citations.

FastAPI
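
One common way to merge a semantic ranking with a keyword ranking is reciprocal rank fusion. The sketch below shows that technique under stated assumptions: the page does not say which fusion method the engagement uses, and the function name, the constant k = 60, and the chunk IDs are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks that
    rank well in both the semantic and the keyword list rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by chunk ID for a stable result.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


semantic = ["c1", "c2", "c3"]   # hypothetical dense-vector ranking
keyword = ["c2", "c4", "c1"]    # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

Rank-based fusion avoids having to normalise cosine similarities against BM25 scores, which live on different scales.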

03 · Source connectors

Connect once. Sync forever.

Source ingestion runs on Airbyte — the same open-source data integration platform our enterprise platform uses. Set up the connection once, choose a sync schedule, and the index stays fresh. The standard engagement covers up to four source connections; additional sources are quoted at R8K each.

Google Drive

Docs · Sheets · files

Confluence

Spaces · pages

SharePoint

M365 libraries

OneDrive

M365 personal

Notion

DBs · pages

GitHub

README · wiki · md

Amazon S3

Buckets · MinIO

Dropbox Business

Folders · files

Slack

Public channels

PDF / Word

Direct upload

REST API

Custom connector

Postgres / MySQL

Tables · views

Standard engagement: up to 4 source connections. Additional sources +R8,000 each. Custom REST connectors with non-trivial pagination or auth flows are scoped per case.

04 · How retrieval works

Source-cited answers, not hallucinated narratives.

A working RAG system is judged by its citations as much as its answers. The mock query below shows what a typical response looks like in your knowledge base — an answer in plain language, three retrieved chunks ranked by similarity, and links back to the source documents. The same shape ships in the embeddable chat widget.

Query · sample

What is our refund policy for enterprise customers who cancel during the first 30 days?

Retrieval mode: Hybrid (semantic + BM25)

top_k: 5 chunks

Embedding model: text-embedding-3-small

LLM: Claude Sonnet

Round-trip latency: 2.8 s

Faithfulness score: 0.94

Generated answer · cited

Enterprise customers who cancel within the first 30 calendar days of contract execution receive a full refund of the implementation fee, less any infrastructure pass-through costs already incurred^[1]. Approved refunds are settled by EFT within 14 business days of the cancellation acknowledgement letter being sent^[2]. Cancellation after day 30 follows the standard subscription terms; charges already invoiced for the remaining contract term are recoverable pro rata only at sonofgraig's discretion^[3].

Retrieved chunks · ranked by similarity

Chunk 1 · Enterprise_Master_Agreement_v3.pdf · §8.2 · p.14

"Within the first thirty (30) calendar days of contract execution, the Customer may terminate this Agreement and receive a full refund of the implementation fee paid, less infrastructure pass-through costs already incurred by sonofgraig."

Similarity: 0.91

Chunk 2 · Refund_Operations_Runbook.md · § Processing timeline

"All approved refund requests are settled by EFT within 14 business days of the cancellation acknowledgement letter being sent to the Customer's registered representative."

Similarity: 0.86

Chunk 3 · Subscription_Terms.docx · §4 · Cancellation after the trial window

"After day 30, cancellation follows the standard subscription terms. Charges already invoiced for the remaining contract term are recoverable on a pro-rata basis only at sonofgraig's discretion."

Similarity: 0.78
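
A cited answer is only useful if every citation marker resolves to a retrieved chunk. The sketch below checks that for the ^[n] marker style used in the sample answer. The class, field, and function names are hypothetical and are not the shipped API schema.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    source: str    # e.g. a file name
    locator: str   # e.g. a section or page reference
    score: float   # similarity score


# Matches markers such as ^[1], ^[2] in the generated answer.
CITATION = re.compile(r"\^\[(\d+)\]")


def unresolved_citations(answer: str, chunks: list[RetrievedChunk]) -> list[int]:
    """Return citation numbers that do not point at any retrieved chunk."""
    cited = {int(n) for n in CITATION.findall(answer)}
    return sorted(n for n in cited if not 1 <= n <= len(chunks))


chunks = [
    RetrievedChunk("Enterprise_Master_Agreement_v3.pdf", "§8.2 · p.14", 0.91),
    RetrievedChunk("Refund_Operations_Runbook.md", "§ Processing timeline", 0.86),
]
print(unresolved_citations("Full refund^[1]. Settled by EFT^[2]. Pro rata^[3].", chunks))
```

A non-empty result means the answer cites a source that was never retrieved, which is a reason to reject or regenerate the response.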

05 · Delivery cadence

Four phases. Four to six weeks.

Every phase ships a tangible deliverable. Nothing is left for "later". The cadence below is the standard plan; if your document set is unusually large or your source systems require new connectors, the build phase can extend by up to a week.

Discovery, classification & data audit

Document inventory · Sensitivity classification · POPIA risk

Week 1

A workshop with your operating team and your Information Officer to map the documents in scope, classify them by sensitivity, identify the POPIA Section 11 lawful basis for each source, and confirm which user groups may access which sub-collection. The scoping document is signed before any code is written.

Deliverables

Document inventory with classification and processing-purpose statement per source

Access control matrix — user groups mapped to document groups

Test query set — 30–50 representative questions with gold answers

Ingestion, PII scrubbing & chunking

Airbyte connectors · Unstructured · LlamaIndex chunking

Weeks 2–3

Source connectors are configured and tested. Documents flow through Unstructured.io for parsing, then through the SA-specific PII scrubber, then into the chunking strategy chosen for your content type (recursive for long-form, sentence for legal text, semantic for marketing collateral). Embedding is performed with the model best suited to your accuracy & cost trade-off.

Deliverables

Configured Airbyte sources with sync schedules and health monitoring

PII scrubbing test report — ID nos, mobile, email, passport patterns

Chunked, embedded, and indexed knowledge base in af-south-1 Qdrant

Retrieval, evaluation & tuning

Hybrid retrieval · Ragas · access control

Weeks 3–5

Hybrid retrieval (dense + BM25) is configured. The query engine is wired to your chosen LLM (Claude by default; Gemini available on conversion). Document-level access control is wired into the retrieval layer so users see only chunks they are authorised to see. Ragas runs against the test query set; we tune chunk size, overlap, embedding model, and retrieval mode to hit the agreed thresholds.

Deliverables

Configured query engine with source-level access control

Ragas evaluation report — faithfulness, answer relevancy, context precision, recall

Tuning log — what we tried, what worked, why we stopped where we did

Deploy, integrate & handover

Chat widget · API · runbooks · 30-day support

Weeks 5–6

The query API is deployed behind your authentication. The embeddable chat widget is styled and wired into the surface you choose — intranet, support portal, internal tool. Audit logging is verified, observability is wired in (Sentry, Langfuse), runbooks are written, and two knowledge-transfer sessions are run with your team. The 30-day post-delivery support window starts on go-live.

Deliverables

Production query API with rate limiting and access enforcement

Embeddable chat widget — styled, system prompt configured, embed code provided

POPIA Section 19 evidence pack — ingestion log, scrubber report, audit schema

30 days of priority support — retrieval tuning, prompt iteration, source updates

06 · Scope

Exactly what's in. Exactly what's not.

Fixed-scope means we have to be explicit about boundaries. The lists below are the standard inclusions and exclusions for the R55,000 starting price. Anything in the out-of-scope list can be quoted as a separate engagement, or rolled into a sonofgraig platform subscription on conversion.

Included

In the fixed-scope engagement

Discovery workshop, document inventory, classification, POPIA risk register
Up to 4 source connections via Airbyte (Drive, Confluence, SharePoint, Notion, etc.)
Up to 5,000 documents ingested and indexed
PII scrubber configured and tested for SA personal-information patterns
Chunking strategy selection & embedding-model selection
Per-tenant Qdrant collection deployed in af-south-1
Hybrid retrieval (semantic + BM25) with source-cited answers
Document-level access control mapped to your user groups
Ragas evaluation report — 4 metrics, agreed thresholds
Production query API with rate limiting
Embeddable chat widget — styled, system prompt configured
Immutable query-level audit log with PostgreSQL deletion rules
POPIA Section 19 evidence pack and runbooks
Two knowledge-transfer sessions with your team
30 days of priority support from go-live

Out of scope

Quoted separately

Custom AI agent built on top of the knowledge base — AI Agent Implementation
Multiple knowledge bases — +R20K each
Source systems beyond the standard four — +R8K each
Document volumes above 5,000 — quoted at scoping based on average size
Custom UI surface beyond the embeddable chat widget
Domain-specific embedding-model fine-tuning — Fine-Tuning Ops product
Source-system schema changes or upstream data engineering work
Long-running operational support beyond the 30-day window
LLM token costs — metered to your provider account
Information Officer outsourcing — you retain that role

07 · Retrieval quality

Numbers, not adjectives. Ragas, on a curve.

Retrieval quality is the deliverable. We agree thresholds during scoping, run the test query set after every change to the index, and ship the eval report on go-live. The four metrics below are the ones we always track. The bar fills shown are typical — your real numbers depend on your corpus and will be reported with confidence intervals.

Metric A · Faithfulness

Does the answer match the retrieved context?

A score of 1.0 means every claim in the answer is grounded in a retrieved chunk. The system is configured to refuse rather than fabricate.

Target ≥ 0.85 · Typical ~0.91

Metric B · Answer relevancy

Does the answer address the question?

Catches answers that are technically grounded but miss the user's actual intent — a common failure when retrieval surfaces nearby but tangential chunks.

Target ≥ 0.80 · Typical ~0.88

Metric C · Context precision

Are the retrieved chunks the right ones?

Penalises noise in the retrieved set. A high score means the LLM is reasoning over signal, not over a haystack of partially-relevant chunks.

Target ≥ 0.75 · Typical ~0.84

Metric D · Context recall

Did we retrieve everything we needed?

Catches the silent failure mode where the right chunk exists in the index but does not surface in the top-k. Often surfaces chunking-strategy problems.

Target ≥ 0.75 · Typical ~0.82

Bar values shown are illustrative typical results. Acceptance thresholds are agreed in writing during scoping — not after delivery.
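
Agreed thresholds are easiest to enforce when they are checked mechanically after each evaluation run. The sketch below compares a set of scores against the four targets listed above. It does not call the Ragas library; the metric keys and function name are assumptions, and the scores are the illustrative values from this page rather than real results.

```python
# Targets taken from the four metric cards above.
TARGETS = {
    "faithfulness": 0.85,
    "answer_relevancy": 0.80,
    "context_precision": 0.75,
    "context_recall": 0.75,
}


def failing_metrics(scores: dict[str, float],
                    targets: dict[str, float] = TARGETS) -> dict[str, float]:
    """Return {metric: shortfall} for every metric below its target.

    A metric missing from `scores` is treated as 0.0, so it always fails.
    """
    return {
        metric: round(target - scores.get(metric, 0.0), 4)
        for metric, target in targets.items()
        if scores.get(metric, 0.0) < target
    }


illustrative = {
    "faithfulness": 0.91,
    "answer_relevancy": 0.88,
    "context_precision": 0.84,
    "context_recall": 0.82,
}
print(failing_metrics(illustrative))  # an empty dict means every target is met
```

A check like this can gate a deployment: an index change that produces a non-empty result does not ship until the shortfall is explained or fixed.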

08 · POPIA & data residency

Compliance designed in — not bolted on later.

POPIA-compliant RAG is not significantly more complex to build than a non-compliant version — provided compliance is treated as a design constraint from week one. The four cards below are the controls every engagement ships with, and the evidence pack you receive at handover.

POPIA s.19 · s.26

PII scrubbing before embedding

Synchronous middleware. SA ID numbers, mobile, email, and passport patterns detected and redacted before any chunk reaches the embedding model. Tested in CI with synthetic PII.
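
As a rough illustration of pattern-based redaction, the sketch below replaces matches with typed placeholders before text would be handed to an embedding step. The regular expressions are simplified assumptions (a 13-digit run for an SA ID number, a +27 or 0 prefix for mobile numbers, one letter plus eight digits for a passport number) and are not the production scrubber's rules.

```python
import re

# Order matters: the 13-digit ID pattern runs before the shorter mobile pattern.
PATTERNS = [
    ("SA_ID", re.compile(r"(?<!\d)\d{13}(?!\d)")),
    ("MOBILE", re.compile(r"(?<!\d)(?:\+27|0)[6-8]\d{8}(?!\d)")),
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("PASSPORT", re.compile(r"\b[A-Z]\d{8}\b")),  # hypothetical format
]


def scrub(text: str) -> tuple[str, dict[str, int]]:
    """Redact known PII patterns and report how many of each were found."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS:
        text, found = pattern.subn(f"[{label}]", text)
        if found:
            counts[label] = found
    return text, counts


clean, report = scrub("ID 8001015009087, cell 0821234567, mail thandi@example.co.za")
print(clean)
print(report)
```

The counts returned alongside the cleaned text are the kind of figure a scrubbing test report can be built from, and synthetic inputs like the one above are what a CI test would feed in.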

POPIA s.72

af-south-1 data residency

Vector store, document storage, and ingestion all run in AWS Cape Town. Egress allow-listed for approved endpoints; VPC flow logs confirm no data leaves the region during ingestion.

POPIA s.19(1)

Immutable audit log

Every query logged synchronously with SHA-256 hashes of the user ID and query text. PostgreSQL deletion rules prevent log row deletion. Stores document IDs, similarity scores, and processing basis — not raw content.
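
The shape of such a log entry can be sketched with the standard library: the user ID and query text are stored only as SHA-256 hashes, next to document IDs, similarity scores, and the processing basis. Field names and the function names are assumptions of this sketch, not the shipped audit schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(value: str) -> str:
    """Hex-encoded SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def audit_line(user_id: str, query: str,
               hits: list[tuple[str, float]], processing_basis: str) -> str:
    """Serialise one audit entry as JSON without any raw user or query text."""
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "user_id_sha256": sha256_hex(user_id),
        "query_sha256": sha256_hex(query),
        "documents": [{"document_id": d, "similarity": s} for d, s in hits],
        "processing_basis": processing_basis,
    }
    return json.dumps(record, sort_keys=True)


line = audit_line("user-42", "What is our refund policy?", [("doc-1", 0.91)], "contract")
print(line)
```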

POPIA s.11 · s.13

Document-level access control

User groups mapped to document groups during scoping. Retrieval enforces access at the chunk level — users see only chunks from documents they are authorised to see. Source assertions linked to lawful basis.
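
Chunk-level enforcement can be pictured as a filter applied to retrieved chunks before anything reaches the LLM. The sketch below assumes a simple access matrix that maps user groups to document groups; the class, field, and group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    document_group: str
    text: str


def authorised_chunks(chunks: list[Chunk], user_groups: set[str],
                      access_matrix: dict[str, set[str]]) -> list[Chunk]:
    """Keep only chunks whose document group the user's groups may read."""
    allowed: set[str] = set()
    for group in user_groups:
        # Unknown user groups grant nothing (deny by default).
        allowed |= access_matrix.get(group, set())
    return [c for c in chunks if c.document_group in allowed]


matrix = {"hr-team": {"hr", "general"}, "all-staff": {"general"}}
retrieved = [
    Chunk("c1", "hr", "Disciplinary procedure ..."),
    Chunk("c2", "general", "Leave policy ..."),
    Chunk("c3", "finance", "Budget forecast ..."),
]
visible = authorised_chunks(retrieved, {"all-staff"}, matrix)
print([c.chunk_id for c in visible])
```

In a real deployment the same restriction is usually pushed down into the vector search as a filter as well, so unauthorised chunks are never retrieved in the first place.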

Independent reference: sonofgraig publishes a complete POPIA compliance statement and a long-form engineering essay on POPIA-compliant RAG. The architecture in this engagement matches both documents — not a simplified version of either.

09 · Technology stack

An opinionated stack. Open source where it matters.

We do not invent the engine; we invest where the value is. Every component below is production-grade open source or a SaaS service we consciously chose not to rebuild. You inherit the same engineering decisions our enterprise platform was built on — and you keep the source code.

10 · Pricing

One number. No hourly surprises.

sonofgraig service projects are deliberately simple to procure. The price is the price. Scope is fixed before contracting. Variations are quoted in writing and signed before any additional work is performed.

RAG Knowledge Base Setup

Fixed-scope engagement

R55,000 ZAR

Starting price. Final figure depends on the source-system count, document volume, and parsing complexity surfaced during scoping.

Single fixed fee. Paid 50% on contract signature and 50% on go-live.

4–6 weeks. Standard delivery window. Unusually large corpora may add up to a week.

30 days of post-delivery support. Retrieval tuning, prompt iteration, source updates.

POPIA documentation included. No separate compliance bill at the end.

Book a scoping call

What sits outside the engagement price

LLM token consumption: Your provider bill

Cloud infrastructure costs (compute, storage): Pass-through

Embedding-model token cost (if external): Pass-through

Additional source connectors above 4: +R8K each

Additional knowledge bases: +R20K each

Document volume above 5,000 docs: Quoted at scoping

Continued support after 30-day window: From R8K/mo

Custom REST connector with non-trivial auth: Per case
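
The fixed add-ons above reduce to simple arithmetic. The sketch below gives an indicative floor only, assuming one knowledge base is included in the R55,000 starting price; items quoted at scoping or per case (document volume, custom connectors, pass-through costs) are deliberately left out. The function and constant names are illustrative.

```python
BASE_PRICE_ZAR = 55_000          # starting price for the standard scope
INCLUDED_SOURCES = 4             # source connections covered by the base price
EXTRA_SOURCE_ZAR = 8_000         # per additional source connection
EXTRA_KNOWLEDGE_BASE_ZAR = 20_000  # per additional knowledge base


def indicative_price(sources: int, knowledge_bases: int = 1) -> int:
    """Starting price plus the fixed add-ons; not a final quote."""
    if sources < 1 or knowledge_bases < 1:
        raise ValueError("at least one source and one knowledge base are required")
    extra_sources = max(0, sources - INCLUDED_SOURCES)
    extra_kbs = knowledge_bases - 1  # assumption: the first one is included
    return (BASE_PRICE_ZAR
            + extra_sources * EXTRA_SOURCE_ZAR
            + extra_kbs * EXTRA_KNOWLEDGE_BASE_ZAR)


# Six sources and two knowledge bases: 55,000 + 2 x 8,000 + 1 x 20,000
print(indicative_price(sources=6, knowledge_bases=2))
```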

Convert to platform on close. The knowledge base moves directly onto a sonofgraig platform subscription with no rebuild — this engagement is built on the same RAG Studio architecture as the product. Your first three months on Starter are credited against the implementation fee.

11 · Related services

If you need more than retrieval.

RAG Knowledge Base Setup is the right starting point if your first AI use case is a chat over documents. If you need an autonomous agent, a compliance audit, or a different cluster, one of the engagements below is probably a better fit. Scope can also be combined — ask during scoping.

Service · AI Dev Platform

AI Agent Implementation

From R95K · 6–10 weeks

RAG Knowledge Base Setup plus a custom AI agent built on the canvas, integrated with up to six tools (Slack, email, CRM, ticketing, databases). Includes the full agent runtime, human-in-the-loop gates, and execution-trace logging.

Read the brief

Service · Governance

POPIA AI Governance Audit

From R45K · 2–3 weeks

A focused gap analysis of any AI system your organisation is already running — ChatGPT for the team, Copilot in the suite, a third-party RAG. Audit against POPIA ss.11, 19, and 72; data flow mapping; prioritised remediation roadmap.

Read the brief

Service · Cloud & DevOps

Cloud Architecture Setup

From R75K · 4–6 weeks

Zero-trust cloud foundation in AWS af-south-1 — or Azure South Africa North or GCP Johannesburg — with Terraform IaC, GitHub Actions CI/CD, observability stack, and POPIA Section 72 compliance documentation. The right precursor if you need infrastructure first.

Read the brief

Platform · Subscription

sonofgraig RAG Studio

From R4,999/mo · Starter tier

If you would prefer a self-serve platform over an engagement, RAG Studio gives your team the same architecture as a managed product — knowledge base list, ingestion pipeline, chunk & embed config, query tester, eval dashboard, embeddable chat. ZAR-priced, POPIA-native, AWS af-south-1.

See RAG Studio

12 · Frequently asked

Questions procurement, legal and engineering ask.

If your team is preparing for a vendor review or a board sign-off, the answers below cover most of what gets raised. Anything else, your account team can route to engineering directly.

What is the difference between this and AI Agent Implementation?

RAG Knowledge Base Setup ships the knowledge layer — ingestion, embedding, retrieval, an embeddable chat widget, and the API your applications can call. AI Agent Implementation includes everything in this engagement and a custom agent built on top with tool integrations (Slack, email, CRM, etc.), human-in-the-loop approval gates, and full execution-trace logging. Pick this engagement if your first use case is "chat over my documents". Pick AI Agent Implementation if you need the system to take autonomous actions in other tools.

Is the R55,000 fixed, or just a starting figure?

It is the starting price for the standard scope — up to 4 source connections, up to 5,000 documents, hybrid retrieval, evaluation, embeddable chat widget, 30 days of support. Your final fixed price is confirmed at the end of the scoping phase, before any contract is signed. Once signed, the price does not move unless you formally request additional scope, which is quoted in writing and re-signed before work continues.

What happens to LLM and embedding token costs?

Token costs sit on your provider account, not ours, so you have full visibility. Embedding cost is one-off at ingestion plus marginal cost for new documents during sync. Query-time cost depends on volume. As an order-of-magnitude reference: ingestion of a 5,000-document corpus is typically R500–R1,500 in embedding cost; a knowledge base handling a few thousand queries a month against Anthropic Claude typically lands between R1,500 and R5,000 in monthly LLM cost. Both can be capped via budget alerts at scoping.

Where does our data physically live during the engagement?

By default, all production data sits in AWS af-south-1 (Cape Town, South Africa). The Qdrant vector store is self-hosted in af-south-1, document storage is in S3 in af-south-1, and ingestion runs entirely in-region. PII-scrubbed payloads are the only data ever sent to an external LLM provider, and that boundary is enforced by middleware. We can also deliver into Azure South Africa North or GCP Johannesburg if your group security policy dictates a specific provider.

Can users only see documents they are authorised to see?

Yes. During scoping we map your user groups to document groups. The retrieval layer enforces this at the chunk level — the LLM never receives a chunk the user is not authorised to see. This is particularly important for legal, HR, and finance use cases where document-level access matters. Access is also part of the audit log: every query records which documents were considered for that specific user.

What chunking strategy do you use?

It depends on your corpus. Recursive character-level chunking is the default for long-form documents (reports, manuals). Sentence-level chunking is preferred for legal text, where sentence and clause boundaries matter. Semantic chunking is used for marketing and editorial content where natural topic shifts dominate. The choice is made during the ingestion phase and revisited during tuning, based on test query performance: we measure, we do not guess.

Which embedding model do you use?

Two options. text-embedding-3-small (OpenAI) for the cost-efficient default with strong general performance. A locally-hosted open embedding model (e.g. BGE) when data residency or cost demands it — this runs entirely inside af-south-1 with zero external API calls. The choice is made during scoping with the residency profile of your data in mind.

What does the 30-day post-delivery support cover?

Priority response on issues, retrieval tuning when answers drift, prompt iteration based on real-world usage, addition of new documents to the index, and Ragas re-runs on request. It does not cover net-new features, additional source systems, or operational on-call — those are quoted as continued support from R8K/month or are included on a platform subscription.

Who owns the source code at handover?

You do. The query API source, infrastructure-as-code (Terraform), CI/CD configuration, the test query set, the chunking and embedding configuration, the runbooks, and the POPIA Section 19 evidence pack are committed to your repositories during the engagement — not at the end. sonofgraig retains no proprietary lock-ins on your knowledge base. If you choose not to convert to a platform subscription, the system runs on standard open-source components your team can maintain.

Do you sign Data Processing Agreements?

Yes. sonofgraig has a pre-signed Data Processing Agreement covering processing activities, lawful basis, security controls, sub-processors, and transfer mechanisms. It is available for download from our trust centre and your legal team can mark up departures from the standard text during contracting.

Is sonofgraig B-BBEE certified and CIPC registered?

Yes — sonofgraig is B-BBEE certified and CIPC registered. B-BBEE spend certificates are issued per invoice. All commercial documentation is available to your procurement team for supplier on-boarding.

Ready to scope

Book a 30-minute scoping call.

A senior solutions engineer joins, we step through your document sources and target use case, identify whether it fits the standard scope, and confirm what your final fixed price will be. No commitment until contract signature.

Book a scoping call · All service projects

Your documents. Searchable. Cited. POPIA-safe.
