Build a Compliant Enterprise AI Assistant on Colab with RAG and Policy Guardrails
Setup and models
This tutorial shows how to assemble a compact, production-minded Enterprise AI assistant that you can run in Colab. The architecture relies on retrieval-augmented generation (RAG) using FAISS for fast document retrieval, Sentence Transformers for embeddings, and FLAN-T5 for generation — all open-source and free to use.
Start by installing dependencies and loading the models. The example ensures models use GPU when available for faster embedding and generation.
!pip -q install faiss-cpu transformers==4.44.2 accelerate sentence-transformers==3.0.1
from typing import List, Dict, Tuple
import re
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
import faiss
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

GEN_MODEL = "google/flan-t5-base"
EMB_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# Generator: FLAN-T5 behind a text2text pipeline; device_map="auto" places it on GPU when present.
gen_tok = AutoTokenizer.from_pretrained(GEN_MODEL)
gen_model = AutoModelForSeq2SeqLM.from_pretrained(GEN_MODEL, device_map="auto")
generate = pipeline("text2text-generation", model=gen_model, tokenizer=gen_tok)

# Embedder: MiniLM sentence encoder, pinned to GPU if one is available.
emb_device = "cuda" if torch.cuda.is_available() else "cpu"
emb_model = SentenceTransformer(EMB_MODEL, device=emb_device)
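Before wiring up retrieval, a one-line smoke test (the prompt below is purely illustrative) confirms the generator is loaded and responding:

# Illustrative sanity check that the generation pipeline is live.
print(generate("Answer briefly: what does RBAC stand for?", max_new_tokens=16)[0]["generated_text"])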
Preparing enterprise documents and chunking
To simulate an internal knowledge base, create a small set of enterprise-style documents containing policies, runbooks, and SOPs. Long documents are split into overlapping chunks so embeddings capture local context while retrieval remains precise.
DOCS = [
    {"id": "policy_sec_001", "title": "Data Security Policy",
     "text": "All customer data must be encrypted at rest (AES-256) and in transit (TLS 1.2+). Access is role-based (RBAC). Secrets are stored in a managed vault. Backups run nightly with 35-day retention. PII includes name, email, phone, address, PAN/Aadhaar."},
    {"id": "policy_ai_002", "title": "Responsible AI Guidelines",
     "text": "Use internal models for confidential data. Retrieval sources must be logged. No customer decisioning without human-in-the-loop. Redact PII in prompts and outputs. All model prompts and outputs are stored for audit for 180 days."},
    {"id": "runbook_inc_003", "title": "Incident Response Runbook",
     "text": "If a suspected breach occurs, page on-call SecOps. Rotate keys, isolate affected services, perform forensic capture, notify DPO within regulatory SLA. Communicate via the incident room only."},
    {"id": "sop_sales_004", "title": "Sales SOP - Enterprise Deals",
     "text": "For RFPs, use the approved security questionnaire responses. Claims must match policy_sec_001. Custom clauses need Legal sign-off. Keep records in CRM with deal room links."},
]
def chunk(text: str, chunk_size=600, overlap=80):
    words = text.split()
    if len(words) <= chunk_size:
        return [text]
    out, i = [], 0
    while i < len(words):
        j = min(i + chunk_size, len(words))
        out.append(" ".join(words[i:j]))
        if j == len(words):
            break
        i = j - overlap  # step back so adjacent chunks share `overlap` words of context
    return out
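To sanity-check the overlap logic before indexing, you can feed the function a synthetic long document (the 1,000-word string below is purely illustrative):

demo = " ".join(f"w{i}" for i in range(1000))  # synthetic 1,000-word "document"
parts = chunk(demo)
print(len(parts), [len(p.split()) for p in parts])  # -> 2 [600, 480]; the second chunk restarts 80 words back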
CORPUS = []
for d in DOCS:
    for i, c in enumerate(chunk(d["text"])):
        CORPUS.append({"doc_id": d["id"], "title": d["title"], "chunk_id": i, "text": c})
Indexing and embeddings
Encode the chunks with the Sentence Transformers model and build a FAISS index for fast similarity search. Because the embeddings are L2-normalized at encode time, inner-product search is exactly equivalent to cosine similarity, a standard setup for retrieval tasks; the sanity check after the code confirms this.
def build_index(chunks: List[Dict]) -> Tuple[faiss.IndexFlatIP, np.ndarray]:
    vecs = emb_model.encode([c["text"] for c in chunks], normalize_embeddings=True, convert_to_numpy=True)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    return index, vecs

INDEX, VECS = build_index(CORPUS)
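Since encode was called with normalize_embeddings=True, the stored vectors are unit-norm, so FAISS's inner-product scores coincide with cosine similarity:

a, b = VECS[0], VECS[1]
ip = float(np.dot(a, b))                            # what IndexFlatIP computes
cos = ip / (np.linalg.norm(a) * np.linalg.norm(b))  # explicit cosine similarity
assert abs(ip - cos) < 1e-5                         # unit-norm vectors: identical up to float error
print(f"inner product={ip:.4f}  cosine={cos:.4f}")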
PII redaction and policy checks
To enforce enterprise safety, implement lightweight PII redaction rules and a simple policy-check layer that rejects queries which request disallowed operations such as large-scale data exfiltration or tampering with encryption.
PII_PATTERNS = [
    (re.compile(r"\b\d{10}\b"), "<REDACTED_PHONE>"),
    (re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "<REDACTED_EMAIL>"),
    (re.compile(r"\b\d{12}\b"), "<REDACTED_ID12>"),
    (re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"), "<REDACTED_PAN>"),
]

def redact(t: str) -> str:
    for pattern, replacement in PII_PATTERNS:
        t = pattern.sub(replacement, t)
    return t

POLICY_DISALLOWED = [
    re.compile(r"\b(share|exfiltrate)\b.*\b(raw|all)\b.*\bdata\b", re.I),
    re.compile(r"\bdisable\b.*\bencryption\b", re.I),
]

def policy_check(q: str) -> Tuple[bool, str]:
    for rule in POLICY_DISALLOWED:
        if rule.search(q):
            return False, "Request violates security policy (data exfiltration/encryption tampering)."
    return True, ""
These checks are basic but demonstrate how guardrails can be embedded directly into the query handling pipeline to block risky requests and sanitize sensitive inputs.
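A quick exercise of both layers shows the expected behavior (the sample strings are illustrative only):

print(redact("Reach me at jane.doe@example.com or 9876543210."))
# -> Reach me at <REDACTED_EMAIL> or <REDACTED_PHONE>.
print(policy_check("Please disable encryption on the backup bucket."))
# -> (False, 'Request violates security policy (data exfiltration/encryption tampering).')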
Retrieval, prompting and generation
Define a retrieve function to fetch top-k relevant chunks from the FAISS index. Compose a structured prompt that instructs FLAN-T5 to answer strictly from the provided context, cite sources inline, and preserve redactions. The prompt and system instructions are shaped to keep answers concise and auditable.
def retrieve(query: str, k=4) -> List[Dict]:
    qv = emb_model.encode([query], normalize_embeddings=True, convert_to_numpy=True)
    scores, idxs = INDEX.search(qv, k)
    return [{**CORPUS[i], "score": float(s)} for s, i in zip(scores[0], idxs[0])]
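Before composing prompts, it helps to eyeball what the retriever returns for a policy question (the query string is illustrative):

for hit in retrieve("How is customer data encrypted at rest?", k=2):
    print(f"{hit['score']:.3f}  {hit['title']} [{hit['doc_id']}:{hit['chunk_id']}]")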
SYSTEM = ("You are an enterprise AI assistant.\n"
"- Answer strictly from the provided CONTEXT.\n"
"- If missing info, say what is unknown and suggest the correct policy/runbook.\n"
"- Keep it concise and cite titles + doc_ids inline like [Title (doc_id:chunk)].")
def build_prompt(user_q: str, ctx_blocks: List[Dict]) -> str:
    ctx = "\n\n".join(
        f"[{i+1}] {b['title']} (doc:{b['doc_id']}:{b['chunk_id']})\n{b['text']}"
        for i, b in enumerate(ctx_blocks)
    )
    uq = redact(user_q)  # sanitize the question before it reaches the model
    return (f"SYSTEM:\n{SYSTEM}\n\nCONTEXT:\n{ctx}\n\nUSER QUESTION:\n{uq}\n\n"
            f"INSTRUCTIONS:\n- Cite sources inline.\n- Keep to 5-8 sentences.\n- Preserve redactions.")

def answer(user_q: str, k=4, max_new_tokens=220) -> Dict:
    ok, msg = policy_check(user_q)
    if not ok:
        return {"answer": f"Blocked: {msg}", "ctx": []}
    ctx = retrieve(user_q, k=k)
    prompt = build_prompt(user_q, ctx)
    out = generate(prompt, max_new_tokens=max_new_tokens, do_sample=False)[0]["generated_text"].strip()
    return {"answer": out, "ctx": ctx}
This composition ensures generated responses are traceable to source documents and that sensitive details remain protected via redaction and policy checks.
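The Responsible AI guidelines above also require that prompts, outputs, and retrieval sources be stored for audit. Below is a minimal sketch of that logging step, assuming a local JSON-lines file is acceptable; the helper name log_interaction and the audit.log path are hypothetical, and the 180-day retention from policy_ai_002 would be enforced by a separate cleanup job:

import json, time

def log_interaction(user_q: str, result: Dict, path: str = "audit.log"):
    # Hypothetical helper: append one redacted, source-attributed record per query.
    record = {
        "ts": time.time(),
        "question": redact(user_q),
        "answer": redact(result["answer"]),
        "sources": [f"{c['doc_id']}:{c['chunk_id']}" for c in result["ctx"]],
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")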
Evaluation and examples
Add a simple evaluation routine that checks how many query terms appear in the retrieved context as a rough relevance metric. Run a few enterprise-oriented queries to validate retrieval, citation, and policy enforcement.
def eval_query(user_q: str, ctx: List[Dict]) -> Dict:
    terms = [w.lower() for w in re.findall(r"[a-zA-Z]{4,}", user_q)]
    ctx_text = " ".join(c["text"].lower() for c in ctx)
    hits = sum(t in ctx_text for t in terms)
    return {"terms": len(terms), "hits": hits, "hit_rate": round(hits / max(1, len(terms)), 2)}
QUERIES = [
    "What encryption and backup rules do we follow for customer data?",
    "Can we auto-answer RFP security questionnaires? What should we cite?",
    "If there is a suspected breach, what are the first three steps?",
    "Is it allowed to share all raw customer data externally for testing?",
]
for q in QUERIES:
    res = answer(q, k=3)
    print("\n" + "=" * 100)
    print("Q:", q)
    print("\nA:", res["answer"])
    if res["ctx"]:
        ev = eval_query(q, res["ctx"])
        print("\nRetrieved Context (top 3):")
        for r in res["ctx"]:
            print(f"- {r['title']} [{r['doc_id']}:{r['chunk_id']}] score={r['score']:.3f}")
        print("Eval:", ev)
Running these examples demonstrates how retrieval-augmented responses are grounded in specific policy documents and how the pipeline enforces simple but effective guardrails.
This Colab-based blueprint illustrates how to combine FAISS, Sentence Transformers, and FLAN-T5 to create an auditable, policy-aware enterprise assistant that can be extended and hardened for production use.