DPDP-Compliant Document Extraction: Why Your AI Agent Needs India-First Data Processing
How to build AI agents that extract financial documents while staying compliant with India's Digital Personal Data Protection Act. Zero-persistence architecture, India data residency, and on-prem deployment options.
What Is DPDP and Why Should AI Agent Developers Care?
India's Digital Personal Data Protection Act (DPDP), 2023 changes how every AI application handles Indian users' financial data. If your agent reads bank statements, salary slips, or tax documents — you're processing personal data under DPDP.
The penalties are steep: up to Rs 250 crore for non-compliance. But the bigger risk for developers is losing enterprise customers who won't integrate an API that stores their users' documents on foreign servers.
The question every fintech CTO asks before integrating a document extraction API:

> "Where does the document go, how long is it stored, and can we audit the data flow?"
If your answer involves "uploaded to a US server" or "stored temporarily for processing," you've lost the deal.
The Zero-Persistence Architecture
Most document extraction APIs follow this flow:
Upload → Store in S3 → Process → Return result → Delete later (maybe)
The problem: "delete later" is not DPDP-compliant. The moment you persist a document — even temporarily — you become a data processor with obligations around consent, purpose limitation, and data principal rights.
Lekha takes a different approach:
Upload → Process in memory → Return JSON → Document gone (never stored)
No S3 bucket. No temporary files. No database blob. The document exists only in RAM during the extraction call and is garbage collected immediately after.
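To make the contrast concrete, here is a minimal sketch of what a zero-persistence handler looks like. The handler shape and `runExtraction` are hypothetical stand-ins for illustration, not Lekha's actual server code:

```typescript
// Hypothetical zero-persistence handler: the document exists only in this scope.
type ExtractionResult = { type: string; fields: Record<string, string> };

// Stand-in for the real extraction model; here it just reports the byte count.
function runExtraction(doc: Buffer, type: string): ExtractionResult {
  return { type, fields: { bytes_processed: String(doc.length) } };
}

function handleExtract(base64Document: string, type: string): ExtractionResult {
  // Decode straight into an in-memory buffer: no temp file, no object storage.
  const doc = Buffer.from(base64Document, "base64");
  const result = runExtraction(doc, type);
  // `doc` goes out of scope here and is garbage collected; only the
  // structured result survives the call.
  return result;
}

const out = handleExtract(Buffer.from("fake pdf bytes").toString("base64"), "bank_statement");
console.log(out.fields.bytes_processed); // "14"
```

The key property is structural: because the buffer never escapes the function, there is no deletion step to get wrong.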
What This Means in Code
When you call the Lekha API:
```typescript
const result = await lekha.extract({
  document: base64EncodedPdf,
  type: "bank_statement",
});
```
Here's what happens on the server:

1. The base64 payload is decoded into an in-memory buffer.
2. Extraction runs against that buffer, with no write to disk, object storage, or a database.
3. The structured JSON is returned and the buffer is garbage collected.

The response contains only structured data (account numbers, transactions, balances) — never the original document or images of it.
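For reference, a response under this model might look like the following. The `data` and `metadata` field names come from the examples in this post; the specific transaction fields are illustrative assumptions, not Lekha's documented schema:

```typescript
// Response shape inferred from the examples in this post; transaction
// fields are illustrative assumptions rather than a documented schema.
interface ExtractResponse {
  data: {
    account_number?: string;
    transactions?: Array<{ date: string; amount: number; description: string }>;
    balance?: number;
  };
  metadata: {
    data_residency: string; // e.g. "india"
    data_routing: string;   // e.g. "self-hosted"
    model: string;
  };
}

const sample: ExtractResponse = {
  data: { account_number: "XXXX1234", balance: 52000 },
  metadata: {
    data_residency: "india",
    data_routing: "self-hosted",
    model: "qwen-2.5-vl-72b",
  },
};

console.log(sample.metadata.data_residency); // "india"
```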
India Data Residency: Three Deployment Options
The DPDP Act itself permits cross-border transfers except to countries the central government restricts, but sectoral rules (notably the RBI's localisation mandate for payments data) and enterprise procurement policies often require financial documents to be processed within India. Lekha supports three deployment models:
1. Cloud API (Default)
```bash
curl -X POST https://lekhadev.com/api/v1/extract \
  -H "x-api-key: lk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"document": "...", "type": "auto"}'
```
2. India Cloud (Self-Hosted)
```yaml
# docker-compose.india.yml
services:
  lekha:
    image: lekha:latest
    environment:
      - INDIA_OLLAMA_BASE_URL=http://ollama:11434
  ollama:
    image: ollama/ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
```
3. On-Premise
```bash
# Your own hardware, your own network
docker compose -f docker-compose.yml up -d
```
How the Provider Routing Works
Lekha automatically routes based on your plan and region:
- **Free plan:** India Ollama → Local Ollama → OpenRouter → Claude (fallback)
- **Paid plan:** Claude Sonnet 4 (default)
- **On-prem:** your Ollama instance (always)
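The fallback chain above amounts to a try-in-order loop. The provider names mirror the list, but the function itself is an illustrative sketch, not Lekha's routing code:

```typescript
type Provider = { name: string; extract: (doc: string) => Promise<string> };

// Try each provider in order; fall through to the next on failure.
async function routeExtraction(
  providers: Provider[],
  doc: string
): Promise<{ provider: string; result: string }> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      return { provider: p.name, result: await p.extract(doc) };
    } catch (err) {
      lastError = err; // provider down or over quota: try the next one
    }
  }
  throw new Error(`all providers failed: ${String(lastError)}`);
}

// Free-plan chain from the post, with stub providers for illustration.
const freePlanChain: Provider[] = [
  { name: "india-ollama", extract: async () => { throw new Error("unreachable"); } },
  { name: "local-ollama", extract: async (d) => `extracted:${d}` },
  { name: "openrouter", extract: async (d) => `extracted:${d}` },
];

routeExtraction(freePlanChain, "doc1").then((r) => console.log(r.provider)); // "local-ollama"
```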
Every API response includes metadata telling you exactly where your data was processed:
```json
{
  "metadata": {
    "data_residency": "india",
    "data_routing": "self-hosted",
    "model": "qwen-2.5-vl-72b"
  }
}
```
Your compliance team can audit every extraction and confirm no data left the country.
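A compliance gate can be as simple as rejecting any response whose metadata shows the wrong region before anything downstream touches the data. `requireIndiaResidency` is a hypothetical helper, not part of the SDK:

```typescript
interface ExtractionMetadata {
  data_residency: string;
  data_routing: string;
  model: string;
}

// Hypothetical guard: throw before any downstream processing
// if the extraction was not performed in India.
function requireIndiaResidency(metadata: ExtractionMetadata): void {
  if (metadata.data_residency !== "india") {
    throw new Error(
      `DPDP audit failure: expected data_residency "india", ` +
      `got "${metadata.data_residency}" via ${metadata.data_routing}`
    );
  }
}

// Passes silently for an India-processed response.
requireIndiaResidency({
  data_residency: "india",
  data_routing: "self-hosted",
  model: "qwen-2.5-vl-72b",
});
```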
Building a DPDP-Compliant Lending Agent
Here's a practical example — a lending agent that reads bank statements and salary slips while staying fully compliant:
```typescript
import { Lekha } from "@lekha-dev/sdk";
import Anthropic from "@anthropic-ai/sdk";

const lekha = new Lekha({
  apiKey: process.env.LEKHA_API_KEY,
  baseUrl: "https://your-india-instance.com", // India-hosted
});

async function assessLoan(bankStatementPdf: Buffer, salarySlipPdf: Buffer) {
  // Extract documents — processed in India, never stored
  const [bankData, salaryData] = await Promise.all([
    lekha.extract({ document: bankStatementPdf, type: "bank_statement" }),
    lekha.extract({ document: salarySlipPdf, type: "salary_slip" }),
  ]);

  // Only structured JSON reaches the LLM — not the original documents
  const anthropic = new Anthropic();
  const analysis = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2000,
    messages: [
      {
        role: "user",
        content: `Assess loan eligibility based on:
Bank Statement: ${JSON.stringify(bankData.data)}
Salary Slip: ${JSON.stringify(salaryData.data)}
Calculate DTI ratio, savings rate, and recommend max EMI.`,
      },
    ],
  });

  return analysis;
}
```
Key compliance points:

- Extraction runs on the India-hosted instance (`baseUrl`), so the documents are processed in-country.
- The documents are never stored; only the extraction call touches them.
- Only structured JSON reaches the LLM, never the original documents.
DPDP Compliance Checklist for Document Extraction
| Requirement | How Lekha Handles It |
| ------------------- | -------------------------------------------------------------- |
| Purpose limitation | API processes only for extraction, no secondary use |
| Data minimization | Returns only structured fields, not raw document |
| Storage limitation | Zero persistence — in-memory processing only |
| Data residency | India deployment option (Docker + Ollama) |
| Right to erasure | Nothing to erase — document never stored |
| Consent management | Your app handles consent; Lekha is a processor |
| Audit trail | Every response includes data_residency + data_routing metadata |
| Breach notification | No document data to breach — only structured output |
What About the Extracted Data?
The JSON output (account numbers, transaction details, balances) is still personal data under DPDP. Lekha's job ends at extraction — how you store and process the structured output is your responsibility.
Best practices:

- Store only the fields your workflow actually needs, for the purpose the user consented to.
- Encrypt the structured output at rest and limit retention to what your lending decision requires.
- Honour data principal rights (access, correction, erasure) for any extracted data you persist.
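As one concrete example of minimization before storage, you might mask the account number and drop fields the lending decision does not need. This helper is a sketch for illustration, not an official utility:

```typescript
interface ExtractedBankData {
  account_number: string;
  balance: number;
  transactions: Array<{ date: string; amount: number; description: string }>;
}

// Keep only what the lending decision needs, and mask the identifier.
function minimizeForStorage(
  data: ExtractedBankData
): { account_masked: string; balance: number } {
  const last4 = data.account_number.slice(-4);
  return {
    account_masked: `XXXX${last4}`, // never persist the full account number
    balance: data.balance,
    // transactions deliberately dropped: data minimization
  };
}

console.log(
  minimizeForStorage({
    account_number: "123456789012",
    balance: 52000,
    transactions: [],
  }).account_masked
); // "XXXX9012"
```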
Getting Started
```bash
npm install @lekha-dev/sdk
```

```typescript
import { Lekha } from "@lekha-dev/sdk";

const lekha = new Lekha({ apiKey: "lk_live_your_key" });

const result = await lekha.extract({
  document: pdfBuffer,
  type: "auto",
});

console.log(result.metadata.data_residency); // "india"
```
Sign up for a free API key at lekhadev.com — 100 extractions/month, all 10 document types, all 28 bank formats. For India-hosted deployment, check our deployment guide.