
Parsing HDFC Bank Statements with AI: Developer Guide

Extract structured JSON from any HDFC bank statement — savings, current, or credit card. Handles passwords, multi-page PDFs, and format changes. Full TypeScript code.


HDFC Bank serves 90+ million customers, so nearly every Indian fintech workflow touches an HDFC statement at some point. Loan underwriting, KYC verification, expense categorisation, tax filing: they all start with someone uploading an HDFC PDF. Parsing that PDF with regex or legacy OCR breaks every time HDFC changes its layout. This guide shows you a better way.

HDFC Statement Formats You'll Encounter

HDFC issues statements in several layouts, each with different extraction challenges:

| Format                | Source              | Main challenge                             |
| --------------------- | ------------------- | ------------------------------------------ |
| Savings account PDF   | NetBanking download | 5–50 pages, always password-protected      |
| Current account PDF   | Branch / NetBanking | Multiple signatories, complex tables       |
| Credit card statement | NetBanking / email  | Credit/debit convention reversed vs bank   |
| Mini statement        | ATM print           | Plain text, fixed-width, no balance column |

The NetBanking PDF is what your users will upload 90% of the time. HDFC's proprietary layout changes roughly every 12–18 months, which is why regex-based parsers accumulate technical debt so fast.

What a Parsed HDFC Statement Should Return

A complete extraction should give you structured data across four areas:

  • Account metadata — account number (masked), IFSC, branch, holder name, account type
  • Statement period — from/to dates in ISO 8601
  • Transactions — date, description, debit, credit, running balance, UPI reference
  • Summary — opening balance, closing balance, total credits, total debits
Here's the TypeScript interface for the target shape:

    interface HDFCStatement {
      account: {
        number: string; // "XXXX1234"
        holder_name: string; // "Rahul Sharma"
        ifsc: string; // "HDFC0001234"
        branch: string; // "Koramangala, Bengaluru"
        account_type: string; // "Savings" | "Current" | "CreditCard"
      };
      period: {
        from: string; // "2025-01-01"
        to: string; // "2025-03-31"
      };
      summary: {
        opening_balance: number;
        closing_balance: number;
        total_credits: number;
        total_debits: number;
        transaction_count: number;
      };
      transactions: Array<{
        date: string; // "2025-01-15"
        description: string;
        debit: number | null;
        credit: number | null;
        balance: number;
        reference?: string; // UPI ref, cheque number, etc.
      }>;
    }
    

Amounts are always numbers, never strings like "₹1,45,000". Dates are always ISO 8601, never "15/01/2025".
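You may receive JSON from another system, or write a fixture by hand. Before you trust either, you can enforce these two conventions at runtime. The following is a minimal sketch that assumes the transaction shape from the interface above. `checkTransaction`, `isISODate` and `isAmount` are hypothetical helpers and not part of Lekha's SDK:

```typescript
// Strict YYYY-MM-DD check that also rejects impossible dates such as 2025-02-30.
function isISODate(value: unknown): boolean {
  if (typeof value !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const d = new Date(`${value}T00:00:00Z`);
  // A date that rolls over (or fails to parse) will not round-trip to the same string.
  return !Number.isNaN(d.getTime()) && d.toISOString().slice(0, 10) === value;
}

// A finite number; `null` is accepted only when the field is nullable (debit/credit).
function isAmount(value: unknown, nullable: boolean): boolean {
  if (value === null) return nullable;
  return typeof value === "number" && Number.isFinite(value);
}

// Hypothetical helper: lists every convention a raw transaction object violates.
function checkTransaction(tx: Record<string, unknown>): string[] {
  const problems: string[] = [];
  if (!isISODate(tx.date)) problems.push(`date is not ISO 8601: ${String(tx.date)}`);
  if (!isAmount(tx.debit, true)) problems.push(`debit is not a number: ${String(tx.debit)}`);
  if (!isAmount(tx.credit, true)) problems.push(`credit is not a number: ${String(tx.credit)}`);
  if (!isAmount(tx.balance, false)) problems.push(`balance is not a number: ${String(tx.balance)}`);
  return problems;
}
```

Run it over each extracted transaction and log any non-empty result before you persist anything.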

Parsing an HDFC Statement with Lekha

Lekha handles all HDFC format variants out of the box. Pass the PDF buffer and get structured JSON back in one call:

    import { Lekha } from "lekha-sdk";
    import { readFileSync } from "fs";
    

    const lekha = new Lekha({ apiKey: process.env.LEKHA_API_KEY });

    async function parseHDFCStatement(
      pdfPath: string,
      password?: string,
    ): Promise<HDFCStatement> {
      const file = readFileSync(pdfPath);

      const result = await lekha.extract({
        document: file,
        type: "bank_statement",
        hint: { bank: "hdfc" }, // Optional: skips auto-classification
        password,
      });

      if (!result.success) {
        throw new Error(`Extraction failed: ${result.error.message}`);
      }

      return result.data as HDFCStatement;
    }

    // Usage (top-level await requires an ES module)
    const statement = await parseHDFCStatement(
      "./hdfc-q1-2025.pdf",
      "RAHU01011990", // First 4 letters of name (uppercase) + DOB as DDMMYYYY
    );

    console.log(`${statement.transactions.length} transactions extracted`);
    console.log(
      `Closing balance: ₹${statement.summary.closing_balance.toLocaleString("en-IN")}`,
    );

Try it now at lekhadev.com/playground (no signup required).

Handling Password-Protected HDFC PDFs

Every HDFC NetBanking PDF is password-protected. The default format is:

  • Savings / Current: first 4 letters of name (uppercase) + date of birth as DDMMYYYY, e.g. RAHU01011990
  • Credit card: date of birth as DDMMYYYY, e.g. 01011990
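Most apps store date of birth in ISO 8601 format, but the password needs DDMMYYYY. Below is a minimal sketch of that conversion, plus a hypothetical `savingsPassword` helper for the savings/current rule above. It assumes that spaces in the name are skipped when taking the first four letters. Confirm that assumption against real statements:

```typescript
// Convert an ISO 8601 date of birth ("1990-01-01") into DDMMYYYY ("01011990").
function isoToDDMMYYYY(isoDob: string): string {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDob);
  if (!match) throw new Error(`Expected YYYY-MM-DD, got "${isoDob}"`);
  const [, yyyy, mm, dd] = match;
  return `${dd}${mm}${yyyy}`;
}

// Hypothetical helper for the savings/current default: first 4 letters of the name in
// uppercase + DOB as DDMMYYYY. Assumption: whitespace in the name is skipped.
function savingsPassword(fullName: string, isoDob: string): string {
  const namePart = fullName.replace(/\s+/g, "").slice(0, 4).toUpperCase();
  return namePart + isoToDDMMYYYY(isoDob);
}
```

You can pass the DDMMYYYY string directly as the `dob` argument of the candidate-list fallback.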

Users often don't remember their exact PDF password. Build a candidate-list fallback:

    async function extractWithFallback(
      pdfBuffer: Buffer,
      name: string,
      dob: string, // "DDMMYYYY"
    ): Promise<HDFCStatement> {
      const namePart = name.toUpperCase().slice(0, 4);
      const candidates = [
        namePart + dob, // standard savings/current format
        dob, // credit card format
        namePart.toLowerCase() + dob, // lowercase variant
        name.slice(0, 4) + dob, // name exactly as the user typed it
      ];

      // new Set() drops duplicates, e.g. when the typed name is already uppercase
      for (const password of new Set(candidates)) {
        const result = await lekha.extract({
          document: pdfBuffer,
          type: "bank_statement",
          password,
        });
        if (result.success) return result.data as HDFCStatement;
      }

      throw new Error(
        "Could not unlock PDF. Prompt the user to enter their statement password manually.",
      );
    }

If all candidates fail, surface a clear error message and let the user type the password directly. Never silently swallow decryption failures.

Validating the Extracted Data

Extraction accuracy is high, but running a quick sanity check before you write to your database costs almost nothing and catches the rare edge case:

    function validateHDFCStatement(data: HDFCStatement): string[] {
      const errors: string[] = [];

      // 1. Balance reconciliation: opening + credits - debits should equal closing
      const computed = data.transactions.reduce(
        (bal, tx) => bal + (tx.credit ?? 0) - (tx.debit ?? 0),
        data.summary.opening_balance,
      );
      if (Math.abs(computed - data.summary.closing_balance) > 1) {
        errors.push(
          `Balance mismatch: computed ₹${computed.toFixed(2)}, reported ₹${data.summary.closing_balance}`,
        );
      }

      // 2. Chronological order (ISO 8601 dates parse reliably with Date)
      const dates = data.transactions.map((tx) => new Date(tx.date).getTime());
      if (!dates.every((d, i) => i === 0 || d >= dates[i - 1])) {
        errors.push("Transactions are not in chronological order");
      }

      // 3. IFSC sanity: "HDFC" + "0" + six alphanumeric characters
      if (!/^HDFC0[A-Z0-9]{6}$/.test(data.account.ifsc)) {
        errors.push(`Unexpected IFSC format: ${data.account.ifsc}`);
      }

      return errors;
    }

    const issues = validateHDFCStatement(statement);
    if (issues.length > 0) {
      console.warn("Statement validation warnings:", issues);
    }

A balance mismatch greater than ₹1 usually means a transaction was missed. Flag it to your support team or trigger a re-extraction.
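The closing-balance check tells you that something is off, but not where. Every row carries a running balance, so you can walk the rows and flag any row whose printed balance does not follow from the previous one. Below is a minimal sketch for savings and current accounts. It assumes the bank-account convention, in which credits raise the balance. Amounts are converted to integer paise so that floating-point drift cannot trigger false alarms. `findBalanceBreaks` is a hypothetical helper:

```typescript
// Only the fields this check needs (a subset of HDFCStatement["transactions"][number]).
interface BalanceRow {
  debit: number | null;
  credit: number | null;
  balance: number;
}

// Rupees -> integer paise, so sums are exact instead of accumulating float error.
const toPaise = (rupees: number): number => Math.round(rupees * 100);

// Hypothetical helper: indices of rows whose printed running balance does not equal
// the previous balance + credit - debit (bank-account convention only).
function findBalanceBreaks(openingBalance: number, rows: BalanceRow[]): number[] {
  const breaks: number[] = [];
  let previous = toPaise(openingBalance);
  rows.forEach((row, i) => {
    const expected = previous + toPaise(row.credit ?? 0) - toPaise(row.debit ?? 0);
    if (expected !== toPaise(row.balance)) breaks.push(i);
    // Resync to the printed balance so a single gap is reported once, not on every later row.
    previous = toPaise(row.balance);
  });
  return breaks;
}
```

A returned index points at the row just after the gap. That is a much narrower place to start looking than the whole statement.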

Processing Long Multi-Page Statements

Active HDFC current accounts can generate 30–50 page statements with 500+ transactions. Lekha processes the entire document in a single call, with no manual chunking. For very large statements where you want to show progress to users, use the streaming API:

    // Replace with your own UI update or server-sent event
    const onProgress = (count: number) => console.log(`${count} transactions so far`);

    const stream = await lekha.extractStream({
      document: pdfBuffer,
      type: "bank_statement",
    });

    const transactions: HDFCStatement["transactions"] = [];

    for await (const chunk of stream) {
      if (chunk.type === "transaction") {
        transactions.push(chunk.data);
        onProgress(transactions.length); // update a progress bar or emit a server-sent event
      }
    }

See the streaming API docs for the full event schema.
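On a 2,000-transaction statement, calling onProgress for every row can flood a server-sent-event channel. Below is a minimal sketch of a throttled reporter, assuming one update every N transactions is enough. `throttleProgress` is a hypothetical helper and not part of the Lekha SDK. Call `update` inside the loop and `finish` after it:

```typescript
// Wraps a progress callback so it fires at most once per `every` transactions.
function throttleProgress(every: number, report: (count: number) => void) {
  let lastReported = 0;
  return {
    // Call with the running transaction count after each chunk.
    update(count: number): void {
      if (count - lastReported >= every) {
        lastReported = count;
        report(count);
      }
    },
    // Call once the stream ends so the final count is always reported.
    finish(count: number): void {
      if (count !== lastReported) {
        lastReported = count;
        report(count);
      }
    },
  };
}
```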

Common HDFC Parsing Mistakes

  • Treating credit card statements as bank statements. On a card statement the balance is the amount you owe, so debits (purchases) increase it and credits (payments, refunds) reduce it: the opposite of a savings or current account. Always check account.account_type and branch your logic accordingly.
  • Ignoring UPI reference IDs. HDFC embeds UPI references in transaction descriptions (UPI/CR/123456789/GPAY). These are essential for deduplication when you reconcile against a payments provider. Extract and store them.
  • Locale-aware number parsing. HDFC uses Indian number grouping: 1,45,000.00. If you post-process transaction amounts yourself, strip commas before parsing: parseFloat(str.replace(/,/g, '')).
  • Skipping the opening balance row. The first row in most HDFC statements is an "Opening Balance" entry with no debit or credit amount. Naive table parsers drop it, breaking your balance reconciliation.
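If you post-process descriptions or amounts yourself, a few small helpers cover the UPI and number-grouping cases. Below is a minimal sketch of three hypothetical helpers:

- `extractUpiRef` assumes the `UPI/CR/<digits>/<app>` layout from the example above. Real descriptions vary, so test it on your own data.
- `parseINR` uses `Number` with a strict pattern. After stripping commas, `parseFloat` would give the same result, but on a string that still contains them, `parseFloat("1,45,000")` silently returns 1 instead of failing.
- `dedupeByReference` shows the deduplication step.

```typescript
// Pull the numeric reference out of a description such as "UPI/CR/123456789/GPAY".
// Assumption: the reference is the digit run right after "UPI/CR/" or "UPI/DR/".
function extractUpiRef(description: string): string | undefined {
  const match = /UPI\/(?:CR|DR)\/(\d+)/i.exec(description);
  return match ? match[1] : undefined;
}

// Parse an Indian-grouped amount like "₹1,45,000.00" into 145000.
// Rejects anything that is not a plain number once the ₹ sign, commas and spaces are removed.
function parseINR(raw: string): number {
  const cleaned = raw.replace(/[₹,\s]/g, "");
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) throw new Error(`Not an amount: "${raw}"`);
  return Number(cleaned);
}

// Keep the first row seen for each reference; rows without a reference are always kept.
function dedupeByReference<T extends { reference?: string }>(rows: T[]): T[] {
  const seen = new Set<string>();
  return rows.filter((row) => {
    if (!row.reference) return true;
    if (seen.has(row.reference)) return false;
    seen.add(row.reference);
    return true;
  });
}
```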

Lekha handles all four cases correctly by default.

Putting It Together: HDFC Loan Underwriting Agent

Here's a compact agent that reads an HDFC statement and asks Claude for a loan eligibility assessment:

    import { Lekha } from "lekha-sdk";
    import Anthropic from "@anthropic-ai/sdk";
    

    const lekha = new Lekha({ apiKey: process.env.LEKHA_API_KEY });
    const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

    async function underwriteLoan(pdfBuffer: Buffer, requestedAmount: number) {
      // Step 1: Extract the statement; never underwrite on a failed extraction
      const result = await lekha.extract({
        document: pdfBuffer,
        type: "bank_statement",
      });
      if (!result.success) {
        throw new Error(`Extraction failed: ${result.error.message}`);
      }
      const stmt = result.data as HDFCStatement;

      // Step 2: Compute metrics for the 3 months ending on the statement's last day
      // (anchoring on today's date would return nothing for an older statement)
      const cutoff = new Date(stmt.period.to);
      cutoff.setMonth(cutoff.getMonth() - 3);

      const recent = stmt.transactions.filter((tx) => new Date(tx.date) >= cutoff);

      const credits = recent.reduce((s, tx) => s + (tx.credit ?? 0), 0);
      const debits = recent.reduce((s, tx) => s + (tx.debit ?? 0), 0);
      const avgMonthlyInflow = credits / 3;
      const netCashflow = (credits - debits) / 3;

      // Step 3: Ask Claude for an eligibility assessment
      const response = await anthropic.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 512,
        messages: [
          {
            role: "user",
            content: `Assess loan eligibility for a request of ₹${requestedAmount.toLocaleString("en-IN")}.

    Account type: ${stmt.account.account_type}
    Branch: ${stmt.account.branch}
    Statement period: ${stmt.period.from} → ${stmt.period.to}
    Avg monthly inflow (last 3 months): ₹${avgMonthlyInflow.toLocaleString("en-IN")}
    Net monthly cashflow: ₹${netCashflow.toLocaleString("en-IN")}
    Closing balance: ₹${stmt.summary.closing_balance.toLocaleString("en-IN")}

    Provide: eligible (yes/no), recommended max EMI, and a one-sentence rationale.`,
          },
        ],
      });

      const block = response.content[0];
      const assessment = block?.type === "text" ? block.text : "";

      return {
        account: stmt.account,
        metrics: { avgMonthlyInflow, netCashflow },
        closing_balance: stmt.summary.closing_balance,
        assessment,
      };
    }
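Claude's "recommended max EMI" is easier to trust if you can check it deterministically. Below is a minimal sketch using the standard reducing-balance EMI formula, EMI = P·r·(1 + r)^n / ((1 + r)^n − 1). Here r is the monthly interest rate and n is the tenure in months. You supply the interest rate and tenure yourself, because neither comes from the statement. `fitsWithinEmi` is a hypothetical helper:

```typescript
// Standard reducing-balance EMI: P * r * (1 + r)^n / ((1 + r)^n - 1), r = monthly rate.
function emi(principal: number, annualRatePct: number, months: number): number {
  const r = annualRatePct / 12 / 100;
  if (r === 0) return principal / months; // interest-free loan: equal instalments
  const growth = Math.pow(1 + r, months);
  return (principal * r * growth) / (growth - 1);
}

// Hypothetical check: does the requested loan's EMI fit within the recommended maximum?
function fitsWithinEmi(
  requestedAmount: number,
  annualRatePct: number,
  months: number,
  maxEmi: number,
): boolean {
  return emi(requestedAmount, annualRatePct, months) <= maxEmi;
}
```

For example, ₹1,00,000 at 12% a year over 12 months works out to an EMI of roughly ₹8,885.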

Read more examples in the Lekha docs.

FAQ

Can Lekha parse HDFC credit card statements?
Yes. Lekha auto-detects savings, current, and credit card formats. To skip auto-detection and save ~200 ms, pass hint: { bank: "hdfc", document_type: "credit_card_statement" } in your request.

What if the statement covers multiple account holders?
HDFC joint account statements list both holders in the header. Lekha returns both names in account.holder_name as a comma-separated string: "Rahul Sharma, Priya Sharma".

How many transactions can Lekha extract?
There is no limit. We've processed HDFC current account statements with 2,000+ transactions spanning multiple years. The streaming API is recommended for anything above ~500 transactions.

Does Lekha store the PDF after extraction?
No. Documents are processed entirely in memory and discarded immediately after the API response is sent: zero persistence, full DPDP compliance. See our privacy architecture for details.
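For joint accounts, you may need each holder individually, for example to match every name against KYC records. In that case, split the comma-separated holder_name field. Below is a minimal sketch. `holderNames` is a hypothetical helper, and it assumes no individual name contains a comma:

```typescript
// Split account.holder_name ("Rahul Sharma, Priya Sharma") into individual names.
function holderNames(holderName: string): string[] {
  return holderName
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}
```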

Ready to stop wrestling with HDFC PDFs? Sign up for a free Lekha API key and extract your first statement in under five minutes. Questions? Reach us at hi@lekhadev.com or open a thread in our Discord.