
SBI Bank Statement Parser: Extract Structured JSON with AI

Parse any State Bank of India statement—YONO, branch PDF, or credit card—into structured JSON with AI. Handles Hindi text, NEFT codes, and password-protected files.

Tags: sbi bank statement, bank statement parser, document extraction, ai agent, indian fintech, structured data, typescript

State Bank of India is the largest bank in India — over 500 million account holders, 22,000+ branches, and a statement format that has frustrated developers for years. SBI PDFs come in at least four distinct layouts depending on whether they were generated from YONO, internet banking, a branch printer, or the SBI Card portal. Each layout has its own quirks: bilingual headers, cryptic NEFT reference codes, and 17-digit account numbers that break most parsers.

This guide shows how to reliably extract structured JSON from any SBI statement using Lekha's API, with TypeScript examples you can drop into your project today.

Why SBI Statements Are Hard to Parse with Traditional Tools

Traditional OCR (Tesseract, AWS Textract) achieves around 76% field-level accuracy on Indian bank statements. On SBI specifically, accuracy drops further because:

  • Bilingual content: SBI branch-generated statements often include Hindi column headers alongside English ones. OCR tools trained on Latin scripts misread Devanagari as garbage characters.
  • Inconsistent column widths: The YONO app generates narrow-column PDFs; branch-printed statements use wide columns. A rule-based extractor tuned for one breaks on the other.
  • Cryptic transaction descriptions: NEFT/UTIBR52026041612345/HDFC0001234 is a valid SBI transaction description. Extracting the beneficiary name, IFSC, and transfer date requires semantic understanding, not just regex.
  • 17-digit account numbers: SBI account numbers can be 11–17 digits. Many parsers truncate or misalign them.
  • Password-protected exports: YONO-generated statements are often password-protected with the customer's date of birth — a pattern that differs from HDFC and ICICI.

Vision AI models read the PDF as an image, understand context across columns, and handle all of these cases without format-specific rules.
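To make the NEFT point concrete: a regex can split the fixed parts of a description like `NEFT/UTIBR52026041612345/HDFC0001234`, but it cannot produce the beneficiary name, because the name is not in the string at all. The sketch below is a minimal illustration. It assumes the slash-separated `CHANNEL/REFERENCE/IFSC` shape from that example, and it assumes the reference embeds a `YYYYMMDD` date after a six-character prefix. That holds for the sample reference but is not guaranteed for every bank's UTR format. `parseTransferDescription` is a hypothetical helper, not part of any SDK.

```typescript
// Hypothetical helper: splits a slash-separated transfer description.
// Assumes the CHANNEL/REFERENCE/IFSC shape seen in the sample description.
interface ParsedTransfer {
  channel: string;             // "NEFT", "IMPS" or "RTGS"
  reference: string;           // UTR-style reference as printed
  ifsc: string | null;         // 11-character IFSC, if present and well-formed
  embeddedDate: string | null; // ISO date recovered from the reference, if any
}

// IFSC format: 4 letters, a literal "0", then 6 alphanumerics.
const IFSC_PATTERN = /^[A-Z]{4}0[A-Z0-9]{6}$/;

function parseTransferDescription(description: string): ParsedTransfer | null {
  const parts = description.trim().split("/");
  if (parts.length < 2) return null;
  const [channel, reference, maybeIfsc] = parts;
  if (!/^(NEFT|IMPS|RTGS)$/.test(channel)) return null;

  const ifsc = maybeIfsc && IFSC_PATTERN.test(maybeIfsc) ? maybeIfsc : null;

  // Assumption: characters 7-14 of the reference hold YYYYMMDD, as in the sample.
  let embeddedDate: string | null = null;
  const m = /^[A-Z0-9]{6}(\d{4})(\d{2})(\d{2})/.exec(reference);
  if (m) {
    const [, y, mo, d] = m;
    const month = Number(mo);
    const day = Number(d);
    if (month >= 1 && month <= 12 && day >= 1 && day <= 31) {
      embeddedDate = `${y}-${mo}-${d}`;
    }
  }
  return { channel, reference, ifsc, embeddedDate };
}
```

On the sample description this yields channel `NEFT`, IFSC `HDFC0001234` and date `2026-04-16`. Nothing in the string names the beneficiary, and that gap is why a model that reads the surrounding statement context is needed.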

    Quick Start: Parse an SBI Statement in 3 Lines

    import { LekhaClient } from "@lekha/sdk";
    import * as fs from "fs";

    const lekha = new LekhaClient({ apiKey: process.env.LEKHA_API_KEY });

    // Top-level await requires an ES module context.
    const result = await lekha.extract({
      document: fs.readFileSync("sbi-statement.pdf"),
      type: "bank_statement",
    });

    console.log(result.data);

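    That's it. Lekha auto-detects the bank, the statement format, and the period covered. For bank account statements, the response always follows the same structured schema, whether you sent a YONO PDF, an internet banking export, or a branch-printed statement. SBI Card exports are extracted with type "credit_card_statement" and return the separate credit card schema described below.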

    What You Get Back

    {
      "bank": "State Bank of India",
      "account": {
        "number": "XXXX XXXX 1234",
        "type": "savings",
        "holder": "RAVI KUMAR SHARMA",
        "branch": "New Delhi Main Branch",
        "ifsc": "SBIN0000691",
        "currency": "INR"
      },
      "period": {
        "from": "2026-01-01",
        "to": "2026-03-31"
      },
      "summary": {
        "opening_balance": 42500.0,
        "closing_balance": 67830.5,
        "total_credits": 185000.0,
        "total_debits": 159669.5,
        "transaction_count": 48
      },
      "transactions": [
        {
          "date": "2026-01-03",
          "description": "NEFT from HDFC Bank - Salary Jan",
          "reference": "UTIBR52026010312345",
          "type": "credit",
          "amount": 75000.0,
          "balance": 117500.0,
          "category": "salary"
        },
        {
          "date": "2026-01-05",
          "description": "UPI/AMAZON PAY/9876543210",
          "reference": "334521009876",
          "type": "debit",
          "amount": 3499.0,
          "balance": 114001.0,
          "category": "shopping"
        }
      ]
    }
    

    All amounts are numbers, never strings. All dates are ISO 8601. All descriptions are human-readable — Lekha resolves NEFT reference codes to readable beneficiary names where the data is present in the PDF.
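    Because amounts arrive as numbers, you can sanity-check an extraction before trusting it. Start from the opening balance, add each credit and subtract each debit. Every printed running balance should be reproduced, and the final figure should equal the closing balance. The sketch below assumes the field names from the sample response above. The interfaces are hand-written for illustration and are not types exported by `@lekha/sdk`. It works in integer paise to avoid floating-point drift on values like 159669.5.

```typescript
// Hand-written shapes mirroring the sample response; not types exported by @lekha/sdk.
interface StatementTransaction {
  date: string;
  description: string;
  reference: string;
  type: "credit" | "debit";
  amount: number;
  balance: number;
  category?: string;
}

interface StatementSummary {
  opening_balance: number;
  closing_balance: number;
}

interface BalanceMismatch {
  index: number;    // row index; transactions.length means the closing balance
  expected: number; // balance implied by the previous balance and this row (rupees)
  printed: number;  // balance as extracted from the PDF (rupees)
}

// Work in integer paise so that sums of values like 159669.5 stay exact.
const toPaise = (rupees: number): number => Math.round(rupees * 100);

// Replays each row from the opening balance and reports rows whose printed
// running balance disagrees. Expects the full transaction list in statement order.
function findBalanceMismatches(
  summary: StatementSummary,
  transactions: StatementTransaction[],
): BalanceMismatch[] {
  const mismatches: BalanceMismatch[] = [];
  let running = toPaise(summary.opening_balance);

  transactions.forEach((tx, index) => {
    running += tx.type === "credit" ? toPaise(tx.amount) : -toPaise(tx.amount);
    const printed = toPaise(tx.balance);
    if (running !== printed) {
      mismatches.push({ index, expected: running / 100, printed: tx.balance });
      running = printed; // resync so one misread row is reported once, not cascaded
    }
  });

  if (running !== toPaise(summary.closing_balance)) {
    mismatches.push({
      index: transactions.length,
      expected: running / 100,
      printed: summary.closing_balance,
    });
  }
  return mismatches;
}
```

    On the two sample rows above, 42,500 plus 75,000 gives 117,500, and 117,500 minus 3,499 gives 114,001, so both printed balances check out. The sample shows only two of the 48 transactions, so the closing-balance comparison is only meaningful on a complete list.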

    Handling YONO Password-Protected PDFs

    SBI's YONO app locks exported statements with a password derived from the account holder's date of birth (DDMMYYYY format). Lekha handles decryption when you pass the password:

    const result = await lekha.extract({
      document: fs.readFileSync("sbi-yono-statement.pdf"),
      type: "bank_statement",
      password: "01011990", // DDMMYYYY — account holder's DOB
    });
    

    If you're building a user-facing product, prompt the user for their date of birth specifically for this step. Do not store the password — pass it directly to the API call and discard it. Lekha processes in memory and never persists documents (DPDP-compliant architecture).
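    If your form collects the date of birth as an ISO date (an HTML `<input type="date">` yields `YYYY-MM-DD`, for example), a small helper can produce the DDMMYYYY string described above. `toYonoPassword` is a hypothetical name, and the sketch assumes the DDMMYYYY convention stated in this section. It validates the date so that a typo fails immediately instead of surfacing later as a wrong-password error from the API.

```typescript
// Hypothetical helper: converts an ISO date (YYYY-MM-DD) into the DDMMYYYY
// password format this guide describes for YONO exports.
function toYonoPassword(isoDob: string): string {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDob.trim());
  if (!match) {
    throw new Error(`Expected YYYY-MM-DD, got "${isoDob}"`);
  }
  const [, yyyy, mm, dd] = match;

  // Reject impossible dates such as 1990-02-30 by round-tripping through UTC.
  const parsed = new Date(Date.UTC(Number(yyyy), Number(mm) - 1, Number(dd)));
  if (
    parsed.getUTCFullYear() !== Number(yyyy) ||
    parsed.getUTCMonth() !== Number(mm) - 1 ||
    parsed.getUTCDate() !== Number(dd)
  ) {
    throw new Error(`Not a real calendar date: "${isoDob}"`);
  }
  return `${dd}${mm}${yyyy}`;
}
```

    `toYonoPassword("1990-01-01")` returns `"01011990"`, matching the example above. Call it right before `lekha.extract` and let the value go out of scope afterwards, in line with the advice not to store it.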

    Parsing SBI Credit Card Statements

    SBI credit card statements come from a different entity — SBI Card — and have an entirely different PDF layout with reward points, minimum due, and statement date distinct from the payment due date.

    const result = await lekha.extract({
      document: fs.readFileSync("sbi-card-statement.pdf"),
      type: "credit_card_statement",
    });
    

    const { data } = result;

    console.log(data.summary.total_amount_due);   // e.g. 23450
    console.log(data.summary.minimum_amount_due); // e.g. 1173
    console.log(data.summary.payment_due_date);   // e.g. "2026-02-18"
    console.log(data.summary.credit_limit);       // e.g. 150000
    console.log(data.summary.reward_points);      // e.g. 4820

    The credit card schema is separate from the bank account schema and includes fields specific to revolving credit: interest charges, cash advance fees, overlimit fees, and the full transaction history with merchant category codes.

    Building a Multi-Statement Aggregator

    A common use case is pulling three to six months of SBI statements to compute income, expenses, and recurring obligations — the core of any bank statement analysis for lending or budgeting apps.

    import { LekhaClient } from "@lekha/sdk";
    import * as fs from "fs";
    import * as path from "path";

    const lekha = new LekhaClient({ apiKey: process.env.LEKHA_API_KEY });

    // Local shape of one transaction, matching the JSON response shown earlier.
    interface Transaction {
      date: string; // ISO 8601 "YYYY-MM-DD"
      description: string;
      reference: string;
      type: "credit" | "debit";
      amount: number;
      balance: number;
      category?: string;
    }

    interface StatementAggregate {
      period: { from: string; to: string };
      average_monthly_income: number;
      average_monthly_expenses: number;
      recurring_credits: Transaction[];
      transaction_count: number;
    }

    async function aggregateSbiStatements(
      statementDir: string,
      dob: string, // DDMMYYYY password for YONO exports
    ): Promise<StatementAggregate> {
      const files = fs
        .readdirSync(statementDir)
        .filter((f) => f.toLowerCase().endsWith(".pdf"))
        .map((f) => path.join(statementDir, f));

      // Extract all statements in parallel.
      const extractions = await Promise.all(
        files.map((file) =>
          lekha.extract({ document: fs.readFileSync(file), type: "bank_statement", password: dob }),
        ),
      );

      // Deduplicate transactions across overlapping periods.
      const seen = new Set<string>();
      const allTransactions = extractions
        .flatMap((e) => e.data.transactions as Transaction[])
        .filter((tx) => {
          const key = `${tx.date}:${tx.amount}:${tx.reference}`;
          if (seen.has(key)) return false;
          seen.add(key);
          return true;
        })
        .sort((a, b) => a.date.localeCompare(b.date));

      if (allTransactions.length === 0) {
        throw new Error(`No transactions found in ${statementDir}`);
      }

      const credits = allTransactions.filter((tx) => tx.type === "credit");
      const debits = allTransactions.filter((tx) => tx.type === "debit");

      const monthlyIncome = groupByMonth(credits);
      const monthlyExpenses = groupByMonth(debits);

      return {
        period: {
          from: allTransactions[0].date,
          to: allTransactions[allTransactions.length - 1].date,
        },
        average_monthly_income: average(Object.values(monthlyIncome)),
        average_monthly_expenses: average(Object.values(monthlyExpenses)),
        recurring_credits: findRecurring(credits),
        transaction_count: allTransactions.length,
      };
    }

    // Sum transaction amounts per calendar month ("YYYY-MM").
    function groupByMonth(transactions: Transaction[]): Record<string, number> {
      return transactions.reduce<Record<string, number>>((acc, tx) => {
        const month = tx.date.slice(0, 7);
        acc[month] = (acc[month] ?? 0) + tx.amount;
        return acc;
      }, {});
    }

    // Arithmetic mean; 0 when there are no months to average.
    function average(values: number[]): number {
      return values.length === 0 ? 0 : values.reduce((a, b) => a + b, 0) / values.length;
    }

    // Credits whose description appears in 3+ different months.
    function findRecurring(credits: Transaction[]): Transaction[] {
      const byDesc = new Map<string, Transaction[]>();
      for (const tx of credits) {
        const key = tx.description.toLowerCase().trim();
        if (!byDesc.has(key)) byDesc.set(key, []);
        byDesc.get(key)!.push(tx);
      }
      return Array.from(byDesc.values())
        .filter((group) => new Set(group.map((tx) => tx.date.slice(0, 7))).size >= 3)
        .flatMap((group) => group);
    }

    This pattern — parallel extraction, deduplication by reference, grouping by month — works for income verification in lending workflows. See the full lending agent walkthrough in Building a Lending Agent with India's Account Aggregator Network.
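    The extraction step above uses `Promise.all`, which starts every request at once. For a folder with many statements you may prefer to cap how many run concurrently, for example to stay within your plan's rate limits. This guide does not state Lekha's limits, so treat the number 3 below as a placeholder. What follows is a minimal, dependency-free sketch. `mapWithConcurrency` is a hypothetical helper, not part of `@lekha/sdk`.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in input order (like Promise.all).
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  // Claiming is safe because JavaScript runs this code on a single thread.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

    In `aggregateSbiStatements`, the `Promise.all(files.map(...))` call could become `mapWithConcurrency(files, 3, (file) => lekha.extract({ document: fs.readFileSync(file), type: "bank_statement", password: dob }))`. As with `Promise.all`, the first rejection rejects the whole call. The other runners, however, keep processing the remaining files in the background.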

    SBI Statement Variants Lekha Handles

    | Format                   | Source            | Special handling               |
    | ------------------------ | ----------------- | ------------------------------ |
    | YONO savings/current     | SBI mobile app    | DOB password, narrow columns   |
    | Internet banking PDF     | onlinesbi.sbi     | Wide columns, English-only     |
    | Branch-printed statement | Passbook counter  | Bilingual headers, scanned     |
    | SBI Card statement       | SBI Card portal   | Separate schema, reward points |
    | Salary account statement | Corporate banking | Employer name in header        |
    | NRI account (NRE/NRO)    | SBI global        | Currency fields, FEMA notes    |

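    The five bank account variants return the same top-level JSON schema, so your downstream code doesn't need to branch on statement type. SBI Card statements are the exception: they use the separate credit card schema described above.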

    Error Handling in Production

    import { LekhaClient, LekhaError } from "@lekha/sdk";

    const lekha = new LekhaClient({ apiKey: process.env.LEKHA_API_KEY });

    async function extractSafely(buffer: Buffer, password?: string) {
      try {
        return await lekha.extract({ document: buffer, type: "bank_statement", password });
      } catch (err) {
        if (err instanceof LekhaError) {
          switch (err.code) {
            case "WRONG_PASSWORD":
              throw new Error("Incorrect DOB: ask the user to re-enter");
            case "NOT_A_BANK_STATEMENT":
              throw new Error("Document is not a bank statement");
            case "UNSUPPORTED_FORMAT":
              throw new Error("Statement format not recognised: contact support");
            case "LOW_QUALITY_SCAN":
              throw new Error("Scanned image too blurry: request a digital PDF");
            default:
              throw err;
          }
        }
        throw err;
      }
    }

    WRONG_PASSWORD and LOW_QUALITY_SCAN are the two most common errors on SBI statements specifically. Branch-printed statements that were scanned at below 150 DPI often trigger the quality check — prompt users to upload digital PDFs from YONO or internet banking where possible.
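    You can catch many low-resolution scans before spending an API call. Effective DPI is the scanned image's pixel width divided by the page's physical width in inches. PDF page sizes are given in points, at 72 points per inch. An A4 page (595.28 pt wide, about 8.27 in) scanned at 1,240 px wide therefore comes out at roughly 150 DPI. The sketch below assumes you already have the page width in points and the embedded image's pixel width, for example from a PDF library of your choice. The 150 DPI threshold mirrors the guidance above and is a heuristic, not Lekha's actual quality check.

```typescript
const POINTS_PER_INCH = 72; // PDF user-space unit

// Effective resolution of a scanned page image, in dots per inch.
function effectiveDpi(imagePixelWidth: number, pageWidthPoints: number): number {
  if (imagePixelWidth <= 0 || pageWidthPoints <= 0) {
    throw new RangeError("Widths must be positive");
  }
  return imagePixelWidth / (pageWidthPoints / POINTS_PER_INCH);
}

// Heuristic pre-upload check. Rounds first so a page scanned at (almost exactly)
// the threshold isn't flagged because of page-size rounding such as A4's 595.28 pt.
function likelyTooLowResolution(
  imagePixelWidth: number,
  pageWidthPoints: number,
  minDpi = 150,
): boolean {
  return Math.round(effectiveDpi(imagePixelWidth, pageWidthPoints)) < minDpi;
}
```

    A 1,000 px wide A4 scan works out to about 121 DPI and would be flagged. When that happens, asking for a YONO or internet banking PDF up front saves a round trip.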

    Accuracy Numbers

    Across our test corpus of 2,400 SBI statements spanning 2019–2026:

  • Field-level accuracy: 97.2% (vs 71% for Tesseract on the same corpus)
  • Transaction count match: 99.1%
  • Amount accuracy: 99.8% (amounts are the easiest field; descriptions are harder)
  • NEFT/IMPS reference parse: 94.3% (depends on whether the reference is printed in full)

    Accuracy is measured against manually verified ground truth. We re-run the benchmark monthly as SBI updates its PDF templates.
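    To check numbers like these on your own sample, one simple definition of field-level accuracy is the share of ground-truth fields whose extracted value matches exactly after normalisation. The sketch below uses that definition. It is an assumption for illustration, and Lekha's benchmark methodology may differ, for example in how it aligns transactions or tolerates formatting differences. It compares flat records keyed by field name.

```typescript
type FieldValue = string | number | null;
type FlatRecord = Record<string, FieldValue>;

// Normalise before comparing: collapse whitespace and uppercase strings,
// round numbers to two decimals (paise).
function normalise(value: FieldValue): FieldValue {
  if (typeof value === "string") return value.trim().replace(/\s+/g, " ").toUpperCase();
  if (typeof value === "number") return Math.round(value * 100) / 100;
  return value;
}

// Share of ground-truth fields that the extraction reproduces exactly.
// Records are compared by position, so both arrays must already be aligned
// (for example, one record per transaction in statement order).
function fieldLevelAccuracy(extracted: FlatRecord[], truth: FlatRecord[]): number {
  let total = 0;
  let correct = 0;
  truth.forEach((expected, i) => {
    const actual: FlatRecord = extracted[i] ?? {};
    for (const [field, expectedValue] of Object.entries(expected)) {
      total += 1;
      if (normalise(actual[field] ?? null) === normalise(expectedValue)) correct += 1;
    }
  });
  if (total === 0) throw new Error("No ground-truth fields to score");
  return correct / total;
}
```

    Missing extracted fields count as wrong unless the ground truth is also null. A transaction-count mismatch shifts the positional alignment, so in practice you would match rows on date and amount before scoring.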

    FAQ

    Does Lekha work with SBI statements older than 2019?
    Yes. SBI standardised its PDF format significantly in 2018. Statements from 2015–2018 work with slightly lower accuracy (~93%) due to an older layout that omitted closing balances per transaction. Statements before 2015 are treated as scanned images and processed through our vision pipeline.

    Can I extract only specific date ranges from a multi-month PDF?
    Not in a single API call today. Lekha returns all transactions in the document, so filter the transactions array by date on your end. Multi-month SBI PDFs (up to 12 months) are common and fully supported.

    How do I handle SBI statements where the account number is partially masked?
    Lekha returns the account number exactly as printed. Some SBI digital exports mask all but the last few digits (for example, XXXX XXXX 1234). If you need to match statements across periods, use the IFSC code and account holder name as the primary identifiers.

    What's the maximum file size Lekha accepts?
    10 MB per document. A 12-month SBI statement with 600+ transactions is typically 1–3 MB. If you have larger files, contact us: we can enable chunked uploads for your account.
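    Several of these answers leave work on your side. The sketch below covers three of those tasks under stated assumptions. First, it filters transactions to a date range, relying on the ISO 8601 dates described earlier, which sort correctly as plain strings. Second, it builds a cross-period matching key from the IFSC and holder name. Appending whatever trailing digits of the account number are visible is an extra assumption of this sketch. Third, it rejects files over the 10 MB limit before upload. All function names are hypothetical, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption, so check the API reference for the exact limit.

```typescript
interface DatedTransaction {
  date: string; // ISO 8601 "YYYY-MM-DD"
}

// Inclusive date-range filter; ISO dates compare correctly as strings.
function filterByDateRange<T extends DatedTransaction>(
  transactions: T[],
  from: string,
  to: string,
): T[] {
  return transactions.filter((tx) => tx.date >= from && tx.date <= to);
}

// Key for matching statements of one account across periods when the number is masked.
// IFSC + normalised holder name, plus the last four visible digits (assumption).
function statementIdentityKey(account: { ifsc: string; holder: string; number: string }): string {
  const holder = account.holder.trim().replace(/\s+/g, " ").toUpperCase();
  const lastFour = account.number.replace(/\D/g, "").slice(-4);
  return `${account.ifsc.trim().toUpperCase()}|${holder}|${lastFour}`;
}

// Assumption: "10 MB" is treated as 10 MiB here.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

// Pass the file size in bytes, e.g. fs.statSync(filePath).size.
function isWithinUploadLimit(sizeBytes: number): boolean {
  return sizeBytes <= MAX_UPLOAD_BYTES;
}
```

    For example, `filterByDateRange(result.data.transactions, "2026-01-01", "2026-01-31")` keeps only January from a quarterly statement. The identity key treats "XXXX XXXX 1234" and a fully printed number ending in 1234 as the same account when the IFSC and holder match.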

    Ready to parse your first SBI statement? Get an API key and run your first extraction in under two minutes at lekhadev.com/playground. The full API reference and SDK docs are at lekhadev.com/docs.

    If you're building a lending or budgeting product, sign up for a free account — the free tier includes 50 extractions per month, no credit card required.