DPDP-Compliant Document Extraction: Why Your AI Agent Needs India-First Data Processing
How to build AI agents that extract financial documents while staying compliant with India's Digital Personal Data Protection Act. Zero-persistence architecture, India data residency, and on-prem deployment options.
What Is DPDP and Why Should AI Agent Developers Care?
India's Digital Personal Data Protection Act (DPDP), 2023 changes how every AI application handles Indian users' financial data. If your agent reads bank statements, salary slips, or tax documents — you're processing personal data under DPDP.
The penalties are steep: up to Rs 250 crore for non-compliance. But the bigger risk for developers is losing enterprise customers who won't integrate an API that stores their users' documents on foreign servers.
The question every fintech CTO asks before integrating a document extraction API:

> "Where does the document go, how long is it stored, and can we audit the data flow?"
If your answer involves "uploaded to a US server" or "stored temporarily for processing," you've lost the deal.
The Zero-Persistence Architecture
Most document extraction APIs follow this flow:
Upload → Store in S3 → Process → Return result → Delete later (maybe)
The problem: "delete later" is not DPDP-compliant. The moment you persist a document — even temporarily — you become a data processor with obligations around consent, purpose limitation, and data principal rights.
Lekha takes a different approach:
Upload → Process in memory → Return JSON → Document gone (never stored)
No S3 bucket. No temporary files. No database blob. The document exists only in RAM during the extraction call and is garbage collected immediately after.
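To make the contrast concrete, here is a minimal sketch of what a zero-persistence handler looks like. The handler shape and `runExtraction` are hypothetical stand-ins for illustration, not Lekha's actual server code:

```typescript
// Hypothetical zero-persistence handler: the document exists only in this scope.
type ExtractionResult = { type: string; fields: Record<string, string> };

// Stand-in for the real extraction model; here it just reports the byte count.
function runExtraction(doc: Buffer, type: string): ExtractionResult {
  return { type, fields: { bytes_processed: String(doc.length) } };
}

function handleExtract(base64Document: string, type: string): ExtractionResult {
  // Decode straight into an in-memory buffer: no temp file, no object storage.
  const doc = Buffer.from(base64Document, "base64");
  const result = runExtraction(doc, type);
  // `doc` goes out of scope here and is garbage collected; only the
  // structured result survives the call.
  return result;
}

const out = handleExtract(Buffer.from("fake pdf bytes").toString("base64"), "bank_statement");
console.log(out.fields.bytes_processed); // "14"
```

The key property is structural: because the buffer never escapes the function, there is no deletion step to get wrong.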
What This Means in Code
When you call the Lekha API:
```typescript
const result = await lekha.extract({
  document: base64EncodedPdf,
  type: "bank_statement",
});
```
Here's what happens on the server:

1. The base64 payload is decoded into an in-memory buffer.
2. Extraction runs against that buffer, with no write to disk, object storage, or a database.
3. The structured JSON is returned and the buffer is garbage collected.

The response contains only structured data (account numbers, transactions, balances) — never the original document or images of it.
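For reference, a response under this model might look like the following. The `data` and `metadata` field names come from the examples in this post; the specific transaction fields are illustrative assumptions, not Lekha's documented schema:

```typescript
// Response shape inferred from the examples in this post; transaction
// fields are illustrative assumptions rather than a documented schema.
interface ExtractResponse {
  data: {
    account_number?: string;
    transactions?: Array<{ date: string; amount: number; description: string }>;
    balance?: number;
  };
  metadata: {
    data_residency: string; // e.g. "india"
    data_routing: string;   // e.g. "self-hosted"
    model: string;
  };
}

const sample: ExtractResponse = {
  data: { account_number: "XXXX1234", balance: 52000 },
  metadata: {
    data_residency: "india",
    data_routing: "self-hosted",
    model: "qwen-2.5-vl-72b",
  },
};

console.log(sample.metadata.data_residency); // "india"
```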
India Data Residency: Three Deployment Options
The DPDP Act itself permits cross-border transfers except to countries the central government restricts, but sectoral rules (notably the RBI's localisation mandate for payments data) and enterprise procurement policies often require financial documents to be processed within India. Lekha supports three deployment models:
1. Cloud API (Default)
```bash
curl -X POST https://lekhadev.com/api/v1/extract \
  -H "x-api-key: lk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"document": "...", "type": "auto"}'
```
2. India Cloud (Self-Hosted)
```yaml
# docker-compose.india.yml
services:
  lekha:
    image: lekha:latest
    environment:
      - INDIA_OLLAMA_BASE_URL=http://ollama:11434
  ollama:
    image: ollama/ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
```
3. On-Premise
```bash
# Your own hardware, your own network
docker compose -f docker-compose.yml up -d
```
How the Provider Routing Works
Lekha automatically routes based on your plan and region:
- **Free plan:** India Ollama → Local Ollama → OpenRouter → Claude (fallback)
- **Paid plan:** Claude Sonnet 4 (default)
- **On-prem:** your Ollama instance (always)
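The fallback chain above amounts to a try-in-order loop. The provider names mirror the list, but the function itself is an illustrative sketch, not Lekha's routing code:

```typescript
type Provider = { name: string; extract: (doc: string) => Promise<string> };

// Try each provider in order; fall through to the next on failure.
async function routeExtraction(
  providers: Provider[],
  doc: string
): Promise<{ provider: string; result: string }> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      return { provider: p.name, result: await p.extract(doc) };
    } catch (err) {
      lastError = err; // provider down or over quota: try the next one
    }
  }
  throw new Error(`all providers failed: ${String(lastError)}`);
}

// Free-plan chain from the post, with stub providers for illustration.
const freePlanChain: Provider[] = [
  { name: "india-ollama", extract: async () => { throw new Error("unreachable"); } },
  { name: "local-ollama", extract: async (d) => `extracted:${d}` },
  { name: "openrouter", extract: async (d) => `extracted:${d}` },
];

routeExtraction(freePlanChain, "doc1").then((r) => console.log(r.provider)); // "local-ollama"
```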
Every API response includes metadata telling you exactly where your data was processed:
```json
{
  "metadata": {
    "data_residency": "india",
    "data_routing": "self-hosted",
    "model": "qwen-2.5-vl-72b"
  }
}
```
Your compliance team can audit every extraction and confirm no data left the country.
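A compliance gate can be as simple as rejecting any response whose metadata shows the wrong region before anything downstream touches the data. `requireIndiaResidency` is a hypothetical helper, not part of the SDK:

```typescript
interface ExtractionMetadata {
  data_residency: string;
  data_routing: string;
  model: string;
}

// Hypothetical guard: throw before any downstream processing
// if the extraction was not performed in India.
function requireIndiaResidency(metadata: ExtractionMetadata): void {
  if (metadata.data_residency !== "india") {
    throw new Error(
      `DPDP audit failure: expected data_residency "india", ` +
      `got "${metadata.data_residency}" via ${metadata.data_routing}`
    );
  }
}

// Passes silently for an India-processed response.
requireIndiaResidency({
  data_residency: "india",
  data_routing: "self-hosted",
  model: "qwen-2.5-vl-72b",
});
```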
Building a DPDP-Compliant Lending Agent
Here's a practical example — a lending agent that reads bank statements and salary slips while staying fully compliant:
```typescript
import { Lekha } from "@lekha-dev/sdk";
import Anthropic from "@anthropic-ai/sdk";

const lekha = new Lekha({
  apiKey: process.env.LEKHA_API_KEY,
  baseUrl: "https://your-india-instance.com", // India-hosted
});

async function assessLoan(bankStatementPdf: Buffer, salarySlipPdf: Buffer) {
  // Extract documents — processed in India, never stored
  const [bankData, salaryData] = await Promise.all([
    lekha.extract({ document: bankStatementPdf, type: "bank_statement" }),
    lekha.extract({ document: salarySlipPdf, type: "salary_slip" }),
  ]);

  // Only structured JSON reaches the LLM — not the original documents
  const anthropic = new Anthropic();
  const analysis = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2000,
    messages: [
      {
        role: "user",
        content: `Assess loan eligibility based on:
Bank Statement: ${JSON.stringify(bankData.data)}
Salary Slip: ${JSON.stringify(salaryData.data)}
Calculate DTI ratio, savings rate, and recommend max EMI.`,
      },
    ],
  });

  return analysis;
}
```
Key compliance points:

- Extraction runs on the India-hosted instance (`baseUrl`), so the documents are processed in-country.
- The documents are never stored; only the extraction call touches them.
- Only structured JSON reaches the LLM, never the original documents.
DPDP Compliance Checklist for Document Extraction
| Requirement | How Lekha Handles It |
| ------------------- | -------------------------------------------------------------- |
| Purpose limitation | API processes only for extraction, no secondary use |
| Data minimization | Returns only structured fields, not raw document |
| Storage limitation | Zero persistence — in-memory processing only |
| Data residency | India deployment option (Docker + Ollama) |
| Right to erasure | Nothing to erase — document never stored |
| Consent management | Your app handles consent; Lekha is a processor |
| Audit trail | Every response includes data_residency + data_routing metadata |
| Breach notification | No document data to breach — only structured output |
What About the Extracted Data?
The JSON output (account numbers, transaction details, balances) is still personal data under DPDP. Lekha's job ends at extraction — how you store and process the structured output is your responsibility.
Best practices:

- Store only the fields your workflow actually needs, for the purpose the user consented to.
- Encrypt the structured output at rest and limit retention to what your lending decision requires.
- Honour data principal rights (access, correction, erasure) for any extracted data you persist.
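As one concrete example of minimization before storage, you might mask the account number and drop fields the lending decision does not need. This helper is a sketch for illustration, not an official utility:

```typescript
interface ExtractedBankData {
  account_number: string;
  balance: number;
  transactions: Array<{ date: string; amount: number; description: string }>;
}

// Keep only what the lending decision needs, and mask the identifier.
function minimizeForStorage(
  data: ExtractedBankData
): { account_masked: string; balance: number } {
  const last4 = data.account_number.slice(-4);
  return {
    account_masked: `XXXX${last4}`, // never persist the full account number
    balance: data.balance,
    // transactions deliberately dropped: data minimization
  };
}

console.log(
  minimizeForStorage({
    account_number: "123456789012",
    balance: 52000,
    transactions: [],
  }).account_masked
); // "XXXX9012"
```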
Getting Started
```bash
npm install @lekha-dev/sdk
```

```typescript
import { Lekha } from "@lekha-dev/sdk";

const lekha = new Lekha({ apiKey: "lk_live_your_key" });

const result = await lekha.extract({
  document: pdfBuffer,
  type: "auto",
});

console.log(result.metadata.data_residency); // "india"
```
Sign up for a free API key at lekhadev.com — 100 extractions/month, all 10 document types, all 28 bank formats. For India-hosted deployment, check our deployment guide.