ICICI Bank Statement Parser: Extract Structured Data with AI
Parse any ICICI Bank statement — savings, current, or credit card — into structured JSON with AI. Handles all formats, net banking PDFs, and multi-page exports.
ICICI Bank is one of India's largest private-sector banks. If you are building a fintech app for loan underwriting, personal finance, expense analytics, or KYC, you will almost certainly encounter ICICI Bank statements from your users.
The problem: ICICI Bank generates different PDF layouts depending on whether the account is savings, current, salary, or credit card, and whether the statement was exported from net banking, downloaded from the iMobile app, or sent by a relationship manager. A naive OCR or regex parser is tied to one layout and breaks whenever ICICI revises its PDF templates.
This guide shows you how to extract structured JSON from any ICICI Bank statement using Lekha — a financial document intelligence API built specifically for Indian formats.
What Makes ICICI Bank Statements Hard to Parse
Before diving into the code, it helps to understand why ICICI statements trip up generic parsers:
Multiple layout variants. Savings account statements from net banking use a two-column layout with a running balance column. Current account statements (especially for businesses) add a cheque number column and a MICR code. Credit card statements have an entirely different structure: billing cycle, minimum due, and reward points.
Password protection. ICICI net banking PDFs are often password-protected, for example with the account holder's date of birth (DDMMYYYY) or a custom password. A parser that can't unlock the file returns nothing useful.
Mixed transaction types. A single statement may contain NEFT, RTGS, UPI, ECS, IMPS, ATM, and standing instruction entries, each with a slightly different narration format. Extracting payee names from narrations like UPI/123456789/FOOD/ZOMATO@OKICICI requires semantic understanding, not just regex.
Multi-page exports. Active accounts can generate 50+ page PDFs covering a full financial year. Page headers repeat, table rows split across pages, and running totals appear only at the end.
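To make the narration problem concrete, here is the kind of naive parser that handles the UPI example above and little else. It assumes the four-part UPI/REFERENCE/REMARK/VPA shape seen in this article's examples (an assumption: real ICICI narrations vary by channel and over time), `parseUpiNarration` is a hypothetical helper, and the "payee" it extracts is just the VPA prefix, not a cleaned-up merchant name.

```typescript
// Hypothetical helper, for illustration only: parses one narration shape,
// assumed to be UPI/REFERENCE/REMARK/VPA as in this article's examples.
interface UpiNarration {
  mode: "UPI";
  referenceNumber: string;
  remark: string;
  vpa: string;
  payeeHandle: string; // the part of the VPA before "@", not a merchant name
}

function parseUpiNarration(narration: string): UpiNarration | null {
  const parts = narration.split("/");
  // Reject anything that is not exactly the assumed four-part UPI shape.
  if (parts.length !== 4 || parts[0] !== "UPI") return null;
  const [, referenceNumber, remark, vpa] = parts;
  const at = vpa.indexOf("@");
  if (at <= 0) return null; // missing or empty handle before "@"
  return {
    mode: "UPI",
    referenceNumber,
    remark,
    vpa,
    payeeHandle: vpa.slice(0, at),
  };
}

console.log(parseUpiNarration("UPI/123456789/FOOD/ZOMATO@OKICICI"));
console.log(parseUpiNarration("NEFT/AXISBANK/SALARY APR")); // null
```

Any narration that deviates from that shape, such as an extra slash inside the remark or a different field order, comes back as `null`. That is the brittleness that makes hand-written rules expensive to maintain.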
Vision AI handles all of these gracefully. Here's how to use it.
Quickstart: Parse an ICICI Statement in 30 Seconds
Install the Lekha SDK:
```bash
bun add lekha-sdk
# or: npm install lekha-sdk
```
Then parse a statement:
```typescript
import Lekha from "lekha-sdk";
import { readFileSync } from "fs";

// Read the API key from the environment rather than hard-coding it
const lekha = new Lekha({ apiKey: process.env.LEKHA_API_KEY });

const pdfBuffer = readFileSync("icici-statement.pdf");

// Top-level await requires running this file as an ES module
const result = await lekha.extract({
  document: pdfBuffer,
  mimeType: "application/pdf",
  documentType: "bank_statement",
});

console.log(result.data);
```
That's it. Lekha auto-detects the bank, handles password unlocking (if you pass the password), and returns structured JSON. No template configuration, no regex maintenance.
The Full Response Structure
Here is what the extracted JSON looks like for an ICICI savings account statement:
```json
{
  "bank": "ICICI Bank",
  "accountType": "Savings",
  "accountNumber": "XXXXXXXX4521",
  "accountHolder": "Priya Sharma",
  "ifsc": "ICIC0001234",
  "branch": "Koramangala, Bengaluru",
  "currency": "INR",
  "statementPeriod": {
    "from": "2025-04-01",
    "to": "2026-03-31"
  },
  "openingBalance": 42500.0,
  "closingBalance": 187340.5,
  "totalCredits": 2845000.0,
  "totalDebits": 2700159.5,
  "transactions": [
    {
      "date": "2025-04-03",
      "narration": "UPI/345678901/RENT/LANDLORD@OKICICI",
      "type": "debit",
      "amount": 25000.0,
      "balance": 17500.0,
      "mode": "UPI",
      "referenceNumber": "345678901",
      "category": "rent"
    },
    {
      "date": "2025-04-05",
      "narration": "NEFT/AXISBANK/SALARY APR",
      "type": "credit",
      "amount": 85000.0,
      "balance": 102500.0,
      "mode": "NEFT",
      "referenceNumber": "AXISBANK20250405",
      "category": "salary"
    }
  ]
}
```
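Because every transaction carries a running balance, one way to catch rows that were dropped or duplicated where a table splits across pages is to replay the statement from `openingBalance` and compare. The sketch below does that, plus the statement-level identity opening + credits − debits = closing. `Txn`, `StatementTotals`, and both function names are hypothetical and only mirror the sample response; they are not types exported by `lekha-sdk`.

```typescript
// Sketch only: these interfaces mirror the sample response above,
// not types exported by lekha-sdk.
interface Txn {
  date: string;
  type: "credit" | "debit";
  amount: number;
  balance: number;
}

interface StatementTotals {
  openingBalance: number;
  closingBalance: number;
  totalCredits: number;
  totalDebits: number;
}

// Compare in integer paise so floating-point error cannot cause false alarms.
const toPaise = (rupees: number): number => Math.round(rupees * 100);

// Replays transactions (in statement order) from the opening balance and
// returns the indexes of rows whose reported balance does not match.
function findBalanceMismatches(openingBalance: number, txns: Txn[]): number[] {
  const mismatches: number[] = [];
  let running = toPaise(openingBalance);
  txns.forEach((tx, i) => {
    running += tx.type === "credit" ? toPaise(tx.amount) : -toPaise(tx.amount);
    if (running !== toPaise(tx.balance)) {
      mismatches.push(i);
      running = toPaise(tx.balance); // resync so one bad row is reported once
    }
  });
  return mismatches;
}

// Statement-level identity: opening + credits - debits = closing.
function totalsReconcile(s: StatementTotals): boolean {
  return (
    toPaise(s.openingBalance) + toPaise(s.totalCredits) - toPaise(s.totalDebits) ===
    toPaise(s.closingBalance)
  );
}

// The totals from the sample response satisfy the identity.
console.log(
  totalsReconcile({
    openingBalance: 42500.0,
    closingBalance: 187340.5,
    totalCredits: 2845000.0,
    totalDebits: 2700159.5,
  }),
); // true
```

The two sample rows also replay cleanly: 42,500 − 25,000 = 17,500, then 17,500 + 85,000 = 102,500. A non-empty mismatch list is a signal to re-check that part of the document rather than proof of an extraction error.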
Key things to notice:
- Dates are ISO 8601 (`YYYY-MM-DD`), not `DD/MM/YYYY`
- Amounts are plain numbers, not formatted strings like `"₹1,02,500"`
- The `mode` field is normalized across NEFT, RTGS, UPI, IMPS, ATM, and ECS
- The `category` field is inferred from narration semantics: salary, rent, food, utilities, etc.
Handling Password-Protected PDFs
Statements downloaded from ICICI net banking are often password-protected. Pass the password in the extraction call:
```typescript
const result = await lekha.extract({
  document: pdfBuffer,
  mimeType: "application/pdf",
  documentType: "bank_statement",
  password: "01011990", // example of a date-of-birth password in DDMMYYYY form
});
```
If you don't know the password (common in B2C apps), ask the user for it in your UI, or use Lekha's password hint. A commonly used pattern is the account holder's date of birth in DDMMYYYY format, but confirm it against the files your users actually upload.
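If your app already stores the user's date of birth, a small helper can turn it into the DDMMYYYY string used in the example above. `dobToDdMmYyyy` is a hypothetical name, and whether a particular PDF actually uses a date-of-birth password is an assumption to verify per file.

```typescript
// Hypothetical helper: converts an ISO date of birth ("YYYY-MM-DD") into the
// DDMMYYYY string shown above. It works on the string directly, so the result
// never shifts with the server's timezone the way Date parsing can.
function dobToDdMmYyyy(isoDob: string): string {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDob);
  if (!match) throw new Error(`Expected YYYY-MM-DD, got "${isoDob}"`);
  const [, yyyy, mm, dd] = match;
  return `${dd}${mm}${yyyy}`;
}

console.log(dobToDdMmYyyy("1990-01-01")); // 01011990
```

You would then pass `password: dobToDdMmYyyy(user.dob)` to `lekha.extract`, where `user.dob` stands in for wherever your app keeps the date.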
Credit Card Statements
ICICI credit card statements have a different schema. Pass documentType: "credit_card_statement" or let Lekha auto-detect:
```typescript
const result = await lekha.extract({
  document: pdfBuffer,
  mimeType: "application/pdf",
  // No documentType: Lekha auto-detects credit card vs savings vs current
});

if (result.data.documentType === "credit_card_statement") {
  const { billingCycle, minimumDue, totalDue, transactions } = result.data;
  // Template literal: the string must be wrapped in backticks
  console.log(`Minimum due: ₹${minimumDue} by ${billingCycle.dueDate}`);
}
```
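The `documentType` check above narrows the response at runtime. If you model the payload yourself in TypeScript, a discriminated union keyed on `documentType` lets the compiler narrow it too. This is a sketch: the field names come from this article, but the interface names, the `"bank_statement"` discriminant on savings and current responses, and the subset of fields shown are assumptions, not the SDK's published types.

```typescript
// Sketch of the two response shapes as a discriminated union.
// Interface names and the "bank_statement" discriminant are assumptions.
interface BankStatement {
  documentType: "bank_statement";
  accountType: string;
  openingBalance: number;
  closingBalance: number;
}

interface CreditCardStatement {
  documentType: "credit_card_statement";
  billingCycle: { from: string; to: string; dueDate: string };
  totalDue: number;
  minimumDue: number;
  creditLimit: number;
  availableCredit: number;
}

type StatementData = BankStatement | CreditCardStatement;

// Switching on documentType narrows `data` to one variant per branch.
function summarizeStatement(data: StatementData): string {
  switch (data.documentType) {
    case "credit_card_statement":
      return `Minimum due: ₹${data.minimumDue} by ${data.billingCycle.dueDate}`;
    case "bank_statement":
      return `${data.accountType} account, closing balance ₹${data.closingBalance}`;
    default: {
      // Compile-time exhaustiveness check: fails to build if a variant is added
      const unreachable: never = data;
      throw new Error(`Unknown documentType: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch means that adding a third statement type to the union produces a compile error until it is handled.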
The credit card response includes:
- `billingCycle.from`, `billingCycle.to`, `billingCycle.dueDate`
- `totalDue`, `minimumDue`, `creditLimit`, `availableCredit`
- `rewardPoints.opening`, `rewardPoints.earned`, `rewardPoints.redeemed`, `rewardPoints.closing`
- `merchant`, `category`, `emi` (if an EMI transaction)
Building a Cash Flow Analyzer
Here is a real-world use case: a cash flow analysis function that takes an ICICI statement and returns monthly income and expense summaries, the kind of insight a lending or personal finance app needs.
```typescript
import Lekha from "lekha-sdk";

const lekha = new Lekha({ apiKey: process.env.LEKHA_API_KEY });

// Mirrors the transaction fields in the sample response above
// (illustrative; not necessarily the SDK's exported type)
interface Transaction {
  date: string; // ISO 8601, "YYYY-MM-DD"
  type: "credit" | "debit";
  amount: number;
  category: string;
}

interface MonthlySummary {
  month: string; // "YYYY-MM"
  income: number;
  expenses: number;
  netCashFlow: number;
  topExpenseCategory: string;
}

async function analyzeCashFlow(pdfBuffer: Buffer): Promise<MonthlySummary[]> {
  const result = await lekha.extract({
    document: pdfBuffer,
    mimeType: "application/pdf",
    documentType: "bank_statement",
  });
  const transactions: Transaction[] = result.data.transactions;

  // Group transactions by month
  const byMonth: Record<string, Transaction[]> = {};
  for (const tx of transactions) {
    const month = tx.date.slice(0, 7); // "YYYY-MM"
    byMonth[month] = byMonth[month] ?? [];
    byMonth[month].push(tx);
  }

  return Object.entries(byMonth)
    .sort(([a], [b]) => a.localeCompare(b)) // chronological order
    .map(([month, txs]) => {
      // Income excludes refunds; every debit counts as an expense
      const income = txs
        .filter((t) => t.type === "credit" && t.category !== "refund")
        .reduce((sum, t) => sum + t.amount, 0);
      const expenses = txs
        .filter((t) => t.type === "debit")
        .reduce((sum, t) => sum + t.amount, 0);

      // Find the top expense category by total debit amount
      const categoryTotals: Record<string, number> = {};
      for (const tx of txs.filter((t) => t.type === "debit")) {
        categoryTotals[tx.category] =
          (categoryTotals[tx.category] ?? 0) + tx.amount;
      }
      const topExpenseCategory =
        Object.entries(categoryTotals).sort(([, a], [, b]) => b - a)[0]?.[0] ??
        "unknown";

      // Round to paise (two decimal places)
      return {
        month,
        income: Math.round(income * 100) / 100,
        expenses: Math.round(expenses * 100) / 100,
        netCashFlow: Math.round((income - expenses) * 100) / 100,
        topExpenseCategory,
      };
    });
}
```
This works regardless of whether the statement spans 3 months or 24 months. Because Lekha normalizes all dates to ISO 8601 and amounts to numbers, the downstream logic stays clean.
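A lender usually wants a handful of figures rather than a month-by-month table. As one illustration, the sketch below reduces `MonthlySummary` rows (redeclared so the block stands alone) into average monthly income, average net cash flow, the number of deficit months, and the weakest month. `profileCashFlow` and these metric definitions are hypothetical choices, not a Lekha feature or an underwriting standard.

```typescript
// Same shape as MonthlySummary in the analyzer above.
interface MonthlySummary {
  month: string;
  income: number;
  expenses: number;
  netCashFlow: number;
  topExpenseCategory: string;
}

// Hypothetical lender-style profile; the metrics are illustrative.
interface CashFlowProfile {
  months: number;
  averageMonthlyIncome: number;
  averageNetCashFlow: number;
  negativeMonths: number; // months where expenses exceeded income
  lowestMonth: string | null; // month with the smallest net cash flow
}

function profileCashFlow(summaries: MonthlySummary[]): CashFlowProfile {
  const n = summaries.length;
  if (n === 0) {
    return {
      months: 0,
      averageMonthlyIncome: 0,
      averageNetCashFlow: 0,
      negativeMonths: 0,
      lowestMonth: null,
    };
  }
  const round2 = (x: number): number => Math.round(x * 100) / 100;
  const totalIncome = summaries.reduce((sum, m) => sum + m.income, 0);
  const totalNet = summaries.reduce((sum, m) => sum + m.netCashFlow, 0);
  const lowest = summaries.reduce((lo, m) => (m.netCashFlow < lo.netCashFlow ? m : lo));
  return {
    months: n,
    averageMonthlyIncome: round2(totalIncome / n),
    averageNetCashFlow: round2(totalNet / n),
    negativeMonths: summaries.filter((m) => m.netCashFlow < 0).length,
    lowestMonth: lowest.month,
  };
}
```

You would call it as `profileCashFlow(await analyzeCashFlow(pdfBuffer))`. Keeping the reduction separate from extraction makes it easy to unit-test with hand-written summaries.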
Processing Multiple Statements in Parallel
When a user uploads 12 months of statements (one PDF per month, as ICICI net banking allows), process them in parallel:
```typescript
async function extractAll(buffers: Buffer[]) {
  // One extraction request per PDF, all started at once
  const results = await Promise.all(
    buffers.map((buf) =>
      lekha.extract({
        document: buf,
        mimeType: "application/pdf",
        documentType: "bank_statement",
      }),
    ),
  );

  // Merge transactions across all statements and drop duplicates from
  // overlapping periods. The running balance is part of the key so that two
  // genuine same-day payments of the same amount are both kept.
  const seen = new Set<string>();
  const allTransactions = results
    .flatMap((r) => r.data.transactions)
    .filter((tx) => {
      const key = `${tx.date}|${tx.amount}|${tx.balance}|${tx.narration}`;
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    })
    .sort((a, b) => a.date.localeCompare(b.date)); // stable sort keeps same-day order

  return allTransactions;
}
```
Lekha processes each PDF in memory and writes no files to disk, so this pattern is suitable for serverless environments such as Vercel or AWS Lambda.
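`Promise.all` in `extractAll` starts every request at once. If you run into rate limits, or want to bound memory when a user uploads many PDFs, one option is to cap how many extractions are in flight. The `mapWithConcurrency` helper below is a hypothetical, dependency-free sketch (libraries such as `p-limit` do the same job); whether and how Lekha rate-limits requests is not covered here.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight and
// returns results in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index. The claim (next++) is
  // synchronous, so no two runners ever take the same index.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

To use it, replace the `Promise.all(buffers.map(...))` call with `mapWithConcurrency(buffers, 3, (buf) => lekha.extract({ document: buf, mimeType: "application/pdf", documentType: "bank_statement" }))`. The limit of 3 is an arbitrary example; the merge and deduplication step stays the same.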
Accuracy on ICICI-Specific Quirks
Here is how Lekha handles the edge cases that break other parsers:
| Quirk | Generic OCR | Lekha |
| ------------------------------ | ----------------------- | ---------------------------- |
| Net banking PDF layout | ~80% accuracy | 99%+ |
| iMobile app PDF layout | Often fails | Supported |
| Password-protected PDF | Blocked | Pass password param |
| Credit card EMI entries | Misclassified | Correct with EMI flag |
| Reward points narrations | Treated as transactions | Filtered out |
| Cheque bounce / return entries | Missed | Captured with returnReason |
| Multi-page (50+ pages) | Truncated | Full extraction |
You can test any ICICI Bank statement in the Lekha Playground — no code required.
What to Build Next
Once you have structured JSON from ICICI statements, common next steps include:
All of these are much easier when the raw PDF is already normalized JSON.
FAQ
Does Lekha support all ICICI account types?
Yes. Savings, salary, current, NRE/NRO, and credit card statements are all supported, and the response schema adapts to the account type automatically.
How does Lekha handle statements with Hindi text?
ICICI Bank statements are primarily in English, but some older formats include Hindi labels. Lekha's vision AI reads both scripts, so mixed-language PDFs work correctly.
Can I extract statements from the ICICI iMobile app PDF?
Yes. iMobile exports use a slightly different column order than net banking PDFs, but Lekha handles both. No configuration needed.
Is processing ICICI Bank statements DPDP compliant?
Lekha processes documents in memory only; nothing is persisted to disk or stored in a database. See our DPDP compliance guide for the full architecture.
Ready to start extracting? Get your API key and run your first extraction in under 5 minutes at lekhadev.com. The free tier includes 50 extractions per month, no credit card required.