Parsing HDFC Bank Statements with AI: Developer Guide
Extract structured JSON from any HDFC bank statement — savings, current, or credit card. Handles passwords, multi-page PDFs, and format changes. Full TypeScript code.
HDFC Bank serves 90+ million customers, which means nearly every Indian fintech workflow touches an HDFC statement at some point. Loan underwriting, KYC verification, expense categorisation, tax filing: they all start with someone uploading an HDFC PDF. Parsing that PDF with regex or legacy OCR breaks whenever HDFC changes its statement layout. This guide shows you a better way.
HDFC Statement Formats You'll Encounter
HDFC issues statements in several layouts, each with different extraction challenges:
| Format | Source | Main challenge |
| --------------------- | ------------------- | ------------------------------------------ |
| Savings account PDF | NetBanking download | 5–50 pages, always password-protected |
| Current account PDF | Branch / NetBanking | Multiple signatories, complex tables |
| Credit card statement | NetBanking / email | Credit/debit convention reversed vs bank |
| Mini statement | ATM print | Plain text, fixed-width, no balance column |
The NetBanking PDF is what your users will upload 90% of the time. HDFC's proprietary layout changes roughly every 12–18 months, which is why regex-based parsers accumulate technical debt so fast.
What a Parsed HDFC Statement Should Return
A complete extraction should give you structured data across four areas: account details, the statement period, summary totals, and the individual transactions.
Here's the TypeScript interface for the target shape:
interface HDFCStatement {
account: {
number: string; // "XXXX1234"
holder_name: string; // "Rahul Sharma"
ifsc: string; // "HDFC0001234"
branch: string; // "Koramangala, Bengaluru"
account_type: string; // "Savings" | "Current" | "CreditCard"
};
period: {
from: string; // "2025-01-01"
to: string; // "2025-03-31"
};
summary: {
opening_balance: number;
closing_balance: number;
total_credits: number;
total_debits: number;
transaction_count: number;
};
transactions: Array<{
date: string; // "2025-01-15"
description: string;
debit: number | null;
credit: number | null;
balance: number;
reference?: string; // UPI ref, cheque number, etc.
}>;
}
Amounts are always numbers — never strings like "₹1,45,000". Dates are always ISO 8601 — never "15/01/2025".
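If you ever have to normalise raw statement strings yourself (for example, from a legacy parser), the two rules above can be sketched as follows. The helper names parseIndianAmount and toIsoDate are hypothetical and not part of any SDK, and the sketch assumes dates arrive as DD/MM/YYYY.

```typescript
/** Convert an Indian-formatted amount such as "₹1,45,000.00" to a number. */
function parseIndianAmount(raw: string): number {
  // Drop the rupee sign, "Rs"/"INR" prefixes, grouping commas, and whitespace.
  const cleaned = raw.replace(/₹|Rs\.?|INR|,|\s/gi, "");
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) {
    throw new Error(`Not a recognisable amount: "${raw}"`);
  }
  return Number(cleaned);
}

/** Convert "DD/MM/YYYY" to ISO 8601 "YYYY-MM-DD". */
function toIsoDate(raw: string): string {
  const m = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(raw.trim());
  if (!m) {
    throw new Error(`Expected DD/MM/YYYY, got "${raw}"`);
  }
  const day = Number(m[1]);
  const month = Number(m[2]);
  // Coarse range check only; it does not validate days per month.
  if (month < 1 || month > 12 || day < 1 || day > 31) {
    throw new Error(`Out-of-range date: "${raw}"`);
  }
  return `${m[3]}-${m[2]}-${m[1]}`;
}
```

Both helpers throw on input they do not recognise rather than returning NaN or a guessed date, so bad rows surface early instead of reaching your database.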
Parsing an HDFC Statement with Lekha
Lekha handles all HDFC format variants out of the box. Pass the PDF buffer and get structured JSON back in one call:
import { Lekha } from "lekha-sdk";
import { readFileSync } from "fs";
const lekha = new Lekha({ apiKey: process.env.LEKHA_API_KEY });
async function parseHDFCStatement(
pdfPath: string,
password?: string,
): Promise<HDFCStatement> {
const file = readFileSync(pdfPath);
const result = await lekha.extract({
document: file,
type: "bank_statement",
hint: { bank: "hdfc" }, // Optional: skips auto-classification
password,
});
if (!result.success) {
throw new Error(`Extraction failed: ${result.error.message}`);
}
return result.data as HDFCStatement;
}
// Usage
const statement = await parseHDFCStatement(
"./hdfc-q1-2025.pdf",
"RAHU01011990", // First 4 chars of name + DOB (DDMMYYYY)
);
console.log(`${statement.transactions.length} transactions extracted`);
console.log(
`Closing balance: ₹${statement.summary.closing_balance.toLocaleString("en-IN")}`,
);
Try it now at lekhadev.com/playground — no signup required.
Handling Password-Protected HDFC PDFs
Every HDFC NetBanking PDF is password-protected. The default format is:
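| Statement type | Password format | Example |
| ----------------- | --------------------------------------------- | ------------ |
| Savings / current | First 4 letters of name (uppercase) + DDMMYYYY | RAHU01011990 |
| Credit card | DDMMYYYY | 01011990 |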
Users often don't remember their exact PDF password. Build a candidate-list fallback:
async function extractWithFallback(
pdfBuffer: Buffer,
name: string,
dob: string, // "DDMMYYYY"
): Promise<HDFCStatement> {
const namePart = name.toUpperCase().slice(0, 4);
const candidates = [
namePart + dob, // standard savings format
dob, // credit card format
namePart.toLowerCase() + dob, // lowercase variant of the savings format
];
for (const password of candidates) {
const result = await lekha.extract({
document: pdfBuffer,
type: "bank_statement",
password,
});
if (result.success) return result.data as HDFCStatement;
}
throw new Error(
"Could not unlock PDF — prompt the user to enter their statement password manually",
);
}
If all candidates fail, surface a clear error message and let the user type the password directly. Never silently swallow decryption failures.
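Building the candidate list is easy to test in isolation if you pull it into a pure function. The sketch below is one way to do that, assuming the password conventions described above; buildPasswordCandidates is a hypothetical helper name, and it de-duplicates the list so no password is tried twice.

```typescript
/**
 * Build an ordered, de-duplicated list of likely statement passwords.
 * `name` is the account holder's name; `dob` must be "DDMMYYYY".
 */
function buildPasswordCandidates(name: string, dob: string): string[] {
  if (!/^\d{8}$/.test(dob)) {
    throw new Error("dob must be 8 digits in DDMMYYYY order");
  }
  // Assumption: the prefix is simply the first four characters of the trimmed name.
  const prefix = name.trim().slice(0, 4);
  const candidates = [
    prefix.toUpperCase() + dob, // savings / current format
    dob, // credit card format
    prefix.toLowerCase() + dob, // lowercase variant
  ];
  // Keep the first occurrence of each candidate, preserving order.
  return candidates.filter((c, i) => candidates.indexOf(c) === i);
}
```

Each candidate costs one extraction attempt, so keeping the list short and free of duplicates keeps the fallback cheap.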
Validating the Extracted Data
Extraction accuracy is high, but running a quick sanity check before you write to your database costs almost nothing and catches the rare edge case:
function validateHDFCStatement(data: HDFCStatement): string[] {
const errors: string[] = [];
// 1. Balance reconciliation
const computed = data.transactions.reduce(
(bal, tx) => bal + (tx.credit ?? 0) - (tx.debit ?? 0),
data.summary.opening_balance,
);
if (Math.abs(computed - data.summary.closing_balance) > 1) {
errors.push(
`Balance mismatch: computed ₹${computed}, reported ₹${data.summary.closing_balance}`,
);
}
// 2. Chronological order
const dates = data.transactions.map((tx) => new Date(tx.date).getTime());
if (!dates.every((d, i) => i === 0 || d >= dates[i - 1])) {
errors.push("Transactions are not in chronological order");
}
// 3. IFSC sanity: "HDFC", a reserved "0", then a 6-character alphanumeric branch code
if (!/^HDFC0[A-Z0-9]{6}$/.test(data.account.ifsc)) {
errors.push(`Unexpected IFSC format: ${data.account.ifsc}`);
}
return errors;
}
const issues = validateHDFCStatement(statement);
if (issues.length > 0) {
console.warn("Statement validation warnings:", issues);
}
A balance mismatch greater than ₹1 usually means a transaction was missed — worth flagging to your support team or triggering a re-extraction.
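When the totals do not reconcile, the per-row balance column can help narrow down where a transaction went missing. Here is a minimal sketch, assuming every row carries the printed running balance as in the interface above; BalanceRow and findFirstBalanceBreak are illustrative names, not SDK exports.

```typescript
interface BalanceRow {
  date: string;
  debit: number | null;
  credit: number | null;
  balance: number; // running balance printed on the statement row
}

/**
 * Walk the rows in order and return the index of the first row whose printed
 * balance disagrees with the running total, or -1 if every row reconciles.
 */
function findFirstBalanceBreak(
  openingBalance: number,
  rows: BalanceRow[],
  tolerance = 1,
): number {
  let running = openingBalance;
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    running += (row.credit ?? 0) - (row.debit ?? 0);
    if (Math.abs(running - row.balance) > tolerance) {
      return i;
    }
    // Re-anchor on the printed balance so small rounding differences do not accumulate.
    running = row.balance;
  }
  return -1;
}
```

A returned index of, say, 42 suggests the gap lies between rows 41 and 42, which gives a re-extraction or a support agent a specific page region to inspect.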
Processing Long Multi-Page Statements
Active HDFC current accounts can generate 30–50 page statements with 500+ transactions. Lekha processes the entire document in a single call — no manual chunking. For very large statements where you want to show progress to users, use the streaming API:
const stream = await lekha.extractStream({
document: pdfBuffer,
type: "bank_statement",
});
const transactions: HDFCStatement["transactions"] = [];
for await (const chunk of stream) {
if (chunk.type === "transaction") {
transactions.push(chunk.data);
// Update a progress bar or emit a server-sent event
onProgress(transactions.length);
}
}
See the streaming API docs for the full event schema.
Common HDFC Parsing Mistakes
Treating credit card statements as bank statements. HDFC credit card PDFs reverse the credit/debit convention: a purchase is a debit and a payment you make toward the card is a credit, the opposite of how that payment appears in your bank account. Always check account.account_type and branch your logic accordingly.
Ignoring UPI reference IDs. HDFC embeds UPI references in transaction descriptions (UPI/CR/123456789/GPAY). These are essential for deduplication when you reconcile against a payments provider. Extract and store them.
Locale-aware number parsing. HDFC uses Indian number grouping: 1,45,000.00. If you post-process transaction amounts yourself, strip commas before parsing: parseFloat(str.replace(/,/g, '')).
Skipping the opening balance row. The first row in most HDFC statements is an "Opening Balance" entry with no debit or credit amount. Naive table parsers drop it, breaking your balance reconciliation.
Lekha handles all four cases correctly by default.
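If you post-process descriptions yourself, the UPI reference point can be sketched as a small extractor. This assumes descriptions follow the UPI/CR/123456789/GPAY shape shown above, with DR assumed for debits; real statements may use other variants, and extractUpiReference is a hypothetical helper name.

```typescript
interface UpiReference {
  direction: "CR" | "DR";
  reference: string;
}

/**
 * Pull the UPI direction and reference number out of a transaction description.
 * Returns null when the description does not match the assumed pattern.
 */
function extractUpiReference(description: string): UpiReference | null {
  // "UPI/", then CR or DR, then a digits-only reference ending at "/" or end of string.
  const m = /\bUPI\/(CR|DR)\/(\d+)(?:\/|$)/i.exec(description);
  if (!m) {
    return null;
  }
  return {
    direction: m[1].toUpperCase() as "CR" | "DR",
    reference: m[2],
  };
}
```

Storing the reference as a string rather than a number avoids losing leading zeros and sidesteps precision limits on long IDs.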
Putting It Together: HDFC Loan Underwriting Agent
Here's a production-ready agent that reads an HDFC statement and produces a loan eligibility assessment using Claude:
import { Lekha } from "lekha-sdk";
import Anthropic from "@anthropic-ai/sdk";

const lekha = new Lekha({ apiKey: process.env.LEKHA_API_KEY });
const anthropic = new Anthropic();

async function underwriteLoan(pdfBuffer: Buffer, requestedAmount: number) {
  // Step 1: Extract statement
  const { data: stmt } = (await lekha.extract({
    document: pdfBuffer,
    type: "bank_statement",
  })) as { data: HDFCStatement };

  // Step 2: Compute trailing-3-month metrics, measured back from the
  // statement end date so an older statement still falls inside the window
  const cutoff = new Date(stmt.period.to);
  cutoff.setMonth(cutoff.getMonth() - 3);

  const recent = stmt.transactions.filter((tx) => new Date(tx.date) >= cutoff);

  const credits = recent.reduce((s, tx) => s + (tx.credit ?? 0), 0);
  const debits = recent.reduce((s, tx) => s + (tx.debit ?? 0), 0);
  const avgMonthlyInflow = credits / 3;
  const netCashflow = (credits - debits) / 3;

  // Step 3: Ask Claude for an eligibility assessment
  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 512,
    messages: [
      {
        role: "user",
        content: `Assess loan eligibility for a request of ₹${requestedAmount.toLocaleString("en-IN")}.

Account type: ${stmt.account.account_type}
Branch: ${stmt.account.branch}
Statement period: ${stmt.period.from} → ${stmt.period.to}
Avg monthly inflow (last 3 months): ₹${avgMonthlyInflow.toLocaleString("en-IN")}
Net monthly cashflow: ₹${netCashflow.toLocaleString("en-IN")}
Closing balance: ₹${stmt.summary.closing_balance.toLocaleString("en-IN")}

Provide: eligible (yes/no), recommended max EMI, and a one-sentence rationale.`,
      },
    ],
  });

  // The first content block is expected to be text; fall back to an empty string otherwise
  const assessment =
    response.content[0].type === "text" ? response.content[0].text : "";

  return {
    account: stmt.account,
    metrics: { avgMonthlyInflow, netCashflow },
    closing_balance: stmt.summary.closing_balance,
    assessment,
  };
}
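The Step 2 arithmetic is the part most worth unit testing, so it can help to isolate it as a pure function. The sketch below is one possible shape, with hypothetical names (CashflowTx, trailingCashflow); it anchors the window on the statement's end date rather than the current date, so an older statement still yields a populated window.

```typescript
interface CashflowTx {
  date: string; // ISO 8601, e.g. "2025-01-15"
  debit: number | null;
  credit: number | null;
}

interface CashflowMetrics {
  avgMonthlyInflow: number;
  netMonthlyCashflow: number;
}

/**
 * Average inflow and net cashflow per month over the `months` preceding
 * `periodEnd` (an ISO date). Uses UTC so results do not depend on the
 * server's time zone.
 */
function trailingCashflow(
  txs: CashflowTx[],
  periodEnd: string,
  months = 3,
): CashflowMetrics {
  const cutoff = new Date(periodEnd);
  if (isNaN(cutoff.getTime())) {
    throw new Error(`Invalid periodEnd: "${periodEnd}"`);
  }
  // Note: month arithmetic can overflow at month ends (e.g. 31 May minus 3 months).
  cutoff.setUTCMonth(cutoff.getUTCMonth() - months);

  let credits = 0;
  let debits = 0;
  for (const tx of txs) {
    if (new Date(tx.date).getTime() < cutoff.getTime()) {
      continue; // outside the trailing window
    }
    credits += tx.credit ?? 0;
    debits += tx.debit ?? 0;
  }
  return {
    avgMonthlyInflow: credits / months,
    netMonthlyCashflow: (credits - debits) / months,
  };
}
```

Passing stmt.transactions and stmt.period.to would reproduce the agent's metrics, and the function can be exercised with fixture data without calling either API.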
Read more examples in the Lekha docs.
FAQ
Can Lekha parse HDFC credit card statements?
Yes. Lekha auto-detects savings, current, and credit card formats. To skip auto-detection and save ~200 ms, pass hint: { bank: "hdfc", document_type: "credit_card_statement" } in your request.
What if the statement covers multiple account holders?
HDFC joint account statements list both holders in the header. Lekha returns both names in account.holder_name as a comma-separated string: "Rahul Sharma, Priya Sharma".
How many transactions can Lekha extract?
There is no limit. We've processed HDFC current account statements with 2,000+ transactions spanning multiple years. The streaming API is recommended for anything above ~500 transactions.
Does Lekha store the PDF after extraction?
No. Documents are processed entirely in memory and discarded immediately after the API response is sent — zero persistence, full DPDP compliance. See our privacy architecture for details.
Ready to stop wrestling with HDFC PDFs? Sign up for a free Lekha API key and extract your first statement in under five minutes. Questions? Reach us at hi@lekhadev.com or open a thread in our Discord.