Kotak Mahindra Bank Statement Parser: Extract Structured Data with AI
Parse any Kotak Mahindra bank statement—811, savings, salary, or current—into structured JSON with AI. Handles net banking PDFs, multi-page exports, and all format variations.
Kotak Mahindra Bank is India's fourth-largest private sector bank and one of the most popular among tech-savvy users, largely because of its 811 digital savings account — the zero-balance account that can be opened in under 5 minutes. For developers building lending, KYC, or personal finance applications, Kotak statements show up constantly in document queues.
The problem: Kotak's PDF formats vary significantly depending on whether the statement comes from net banking, the mobile app, a branch request, or a salary account portal. A parser that works for one format silently fails on another.
This guide shows you how to reliably parse any Kotak Mahindra bank statement into structured JSON using Lekha — including the edge cases that trip up most approaches.
Why Kotak Statements Are Hard to Parse
Kotak issues statements in at least four distinct PDF layouts:
| Source                        | Format characteristics                                |
| ----------------------------- | ----------------------------------------------------- |
| Net banking portal            | Tabular, multi-page, paginated balance                |
| Mobile app (811)              | Condensed, single or dual column                      |
| Branch-generated              | Header-heavy, sometimes scanned                       |
| Salary account (via employer) | Employer branding overlaid, alternate table structure |
Beyond layout, Kotak statements commonly include:
Narration strings such as UPI/CR/426784531/PHONEPE/919812345678@ybl

Traditional OCR tools fail on these because they extract text but cannot understand the semantic structure. Lekha uses vision AI to read the statement the way a human would, understanding context rather than just characters.
What You Can Extract from a Kotak Statement
A successful extraction returns a structured object with:
interface KotakStatementResult {
  bank: "Kotak Mahindra Bank";
  account: {
    number: string; // masked, e.g. "XXXXXXXX4521"
    holder: string;
    type: string; // "Savings" | "811 Digital" | "Salary" | "Current"
    ifsc: string;
    branch: string;
  };
  period: {
    from: string; // ISO 8601, e.g. "2025-04-01"
    to: string;
  };
  summary: {
    openingBalance: number;
    closingBalance: number;
    totalCredits: number;
    totalDebits: number;
    creditCount: number;
    debitCount: number;
  };
  transactions: Array<{
    date: string; // ISO 8601
    narration: string;
    chequeRef: string | null;
    debit: number | null;
    credit: number | null;
    balance: number | null;
    mode: string; // "UPI" | "NEFT" | "IMPS" | "ATM" | "NACH" | "SI" | "CHQ" | "INT"
  }>;
}
Lekha automatically classifies each transaction mode from the narration text — so UPI/DR/21348976 becomes mode: "UPI" and NEFT/INWARD/CR/HDFC becomes mode: "NEFT", even when the source PDF has no dedicated mode column.
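To make the idea of narration-based classification concrete, here is a minimal sketch of prefix matching. It is not Lekha's implementation, which the article describes as vision-AI based. The `classifyMode` function and the prefix table are hypothetical, and real Kotak narrations vary more than these patterns cover.

```typescript
type Mode = "UPI" | "NEFT" | "IMPS" | "ATM" | "NACH" | "SI" | "CHQ" | "INT";

// Hypothetical prefix table: these narration prefixes are assumptions for illustration.
const MODE_PREFIXES: Array<[RegExp, Mode]> = [
  [/^UPI\b/i, "UPI"],
  [/^NEFT\b/i, "NEFT"],
  [/^IMPS\b/i, "IMPS"],
  [/^ATM\b/i, "ATM"],
  [/^(NACH|ACH)\b/i, "NACH"],
  [/^SI\b/i, "SI"],
  [/^(CHQ|CHEQUE)\b/i, "CHQ"],
  [/^INT(EREST)?\b/i, "INT"],
];

// Returns the first mode whose prefix matches, or null when nothing matches.
function classifyMode(narration: string): Mode | null {
  const text = narration.trim();
  for (const [pattern, mode] of MODE_PREFIXES) {
    if (pattern.test(text)) return mode;
  }
  return null;
}
```

Returning `null` for unknown narrations, rather than guessing, keeps unclassified rows visible so they can be reviewed.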
Quick Start: Parse a Kotak Statement
Install the Lekha SDK and get your API key from lekhadev.com:
npm install @lekhadev/sdk
or
bun add @lekhadev/sdk
Parse from a URL
import { Lekha } from "@lekhadev/sdk";
const lekha = new Lekha({ apiKey: process.env.LEKHA_API_KEY });
const result = await lekha.extract({
  url: "https://your-storage.com/kotak-statement.pdf",
  documentType: "bank_statement",
});
if (result.success) {
  // `period` is a top-level field of the result, not part of `summary`
  const { account, period, summary, transactions } = result.data;
  console.log(`${account.holder} - ${account.type}`);
  console.log(`Period: ${period.from} to ${period.to}`);
  console.log(
    `Closing balance: ₹${summary.closingBalance.toLocaleString("en-IN")}`,
  );
  console.log(`Transactions: ${transactions.length}`);
}
Parse from a file upload
import { Lekha } from "@lekhadev/sdk";
import { readFileSync } from "fs";
const lekha = new Lekha({ apiKey: process.env.LEKHA_API_KEY });
const fileBuffer = readFileSync("./kotak-statement.pdf");
const base64 = fileBuffer.toString("base64");
const result = await lekha.extract({
  file: { data: base64, mimeType: "application/pdf" },
  documentType: "bank_statement",
});
Lekha processes the document in memory and never writes the file to disk, which supports the data-minimisation and security safeguards expected under India's DPDP Act.
Handling Kotak-Specific Edge Cases
1. Classifying 811 vs Regular Savings Accounts
The 811 account has its own statement layout and different fee structures. Lekha extracts account.type automatically, so you can branch your business logic:
const result = await lekha.extract({
  url: statementUrl,
  documentType: "bank_statement",
});
if (result.success) {
  const { type } = result.data.account;
  if (type === "811 Digital") {
    // Zero-balance account: no minimum balance charges expected
    validateNoMinBalancePenalty(result.data.transactions);
  } else if (type === "Salary") {
    // Salary accounts: look for employer credit each month
    const salaryCredits = result.data.transactions.filter(
      (t) => t.mode === "NEFT" && t.credit !== null && t.credit > 10000,
    );
    console.log(`Salary credits found: ${salaryCredits.length}`);
  }
}
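The snippet above calls `validateNoMinBalancePenalty` without defining it. One possible shape for that helper is sketched below; the keyword patterns are assumptions for illustration, since the exact wording of charge narrations on Kotak statements is not given here, and `findMinBalancePenalties` is a hypothetical companion function.

```typescript
interface Txn {
  date: string;
  narration: string;
  debit: number | null;
}

// Assumed keyword patterns; actual charge narrations may be worded differently.
const MIN_BALANCE_PATTERNS: RegExp[] = [
  /\bMIN(IMUM)?\s*BAL/i,
  /\bAMB\b/i,
  /NON[- ]?MAINT/i,
];

// Returns every debit whose narration looks like a minimum-balance charge.
function findMinBalancePenalties(transactions: Txn[]): Txn[] {
  return transactions.filter(
    (t) =>
      t.debit !== null &&
      t.debit > 0 &&
      MIN_BALANCE_PATTERNS.some((p) => p.test(t.narration)),
  );
}

// True when no such charge is present.
function validateNoMinBalancePenalty(transactions: Txn[]): boolean {
  return findMinBalancePenalties(transactions).length === 0;
}
```

Keeping the matching rows available through `findMinBalancePenalties` lets a reviewer see which charge triggered the failure rather than only a boolean.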
2. UPI Transaction Parsing
Kotak narrations for UPI are verbose and unstructured. Lekha normalises them into a clean narration field while preserving the UPI reference:
// Raw in PDF: "UPI/DR/426784531012/ZOMATO PAYMENTS/ZOMATO@ICICI/FOOD ORDER 04MAY"
// After extraction:
{
  date: "2026-05-04",
  narration: "ZOMATO PAYMENTS via UPI",
  mode: "UPI",
  debit: 347,
  credit: null,
  chequeRef: "426784531012",
  balance: 18432.50
}
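If you also keep the raw narration, the slash-delimited layout in the example above can be split into parts directly. The sketch below assumes the segment order `UPI/<DR|CR>/<reference>/<payee>/<VPA>/<remark>` seen in that example; `parseUpiNarration` is a hypothetical helper, not part of the Lekha SDK, and other layouts return `null`.

```typescript
interface UpiNarration {
  direction: "DR" | "CR";
  reference: string;
  payee: string;
  vpa: string | null;
  remark: string | null;
}

// Splits a raw UPI narration into its parts, assuming the segment order
// UPI/<DR|CR>/<reference>/<payee>/<VPA>/<remark>.
function parseUpiNarration(raw: string): UpiNarration | null {
  const parts = raw.split("/").map((p) => p.trim());
  if (parts[0] !== "UPI") return null;
  const direction = parts[1] === "DR" ? "DR" : parts[1] === "CR" ? "CR" : null;
  const reference = parts[2];
  const payee = parts[3];
  if (direction === null || !reference || !/^\d+$/.test(reference) || !payee) {
    return null;
  }
  const vpa = parts[4];
  const rest = parts.slice(5);
  return {
    direction,
    reference,
    payee,
    vpa: vpa && vpa.includes("@") ? vpa : null,
    remark: rest.length > 0 ? rest.join("/") : null,
  };
}
```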
3. Multi-Page Statements with Mid-Month Downloads
When a customer downloads a 6-month statement mid-month, Kotak's net banking portal adds a partial-month header that confuses naive parsers. Lekha handles the de-duplication automatically and returns a single clean transaction list sorted chronologically.
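For readers curious what de-duplication involves, the following is a minimal sketch of one approach: drop rows that repeat exactly and sort by date. It is an illustration under simple assumptions, not a description of how Lekha does it, and `dedupeAndSort` is a hypothetical name.

```typescript
interface Txn {
  date: string; // ISO 8601, so lexicographic order equals chronological order
  narration: string;
  debit: number | null;
  credit: number | null;
  balance: number | null;
}

// Removes exact repeats (for example rows duplicated across a page break)
// and returns the remaining rows in chronological order.
function dedupeAndSort(transactions: Txn[]): Txn[] {
  const seen = new Set<string>();
  const unique: Txn[] = [];
  for (const t of transactions) {
    const key = [t.date, t.narration, t.debit, t.credit, t.balance].join("|");
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(t);
  }
  // Array.prototype.sort is stable, so same-day rows keep their statement order.
  return unique.sort((a, b) => a.date.localeCompare(b.date));
}
```

Including the running balance in the key is a deliberate choice: two genuine same-day payments of the same amount to the same payee leave different balances, so they are not collapsed into one.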
4. Detecting Salary Deposits for Income Verification
A common use case is verifying regular salary inflow — used in lending decisions. Here's a complete income verification snippet:
import { Lekha } from "@lekhadev/sdk";
interface SalaryVerification {
  verified: boolean;
  averageMonthlySalary: number;
  salaryCredits: Array<{ date: string; amount: number; employer: string }>;
  consistency: "high" | "medium" | "low";
}
async function verifySalaryFromKotak(
  statementUrl: string,
): Promise<SalaryVerification> {
  const lekha = new Lekha({ apiKey: process.env.LEKHA_API_KEY });
  const result = await lekha.extract({
    url: statementUrl,
    documentType: "bank_statement",
  });
  if (!result.success) throw new Error(result.error.message);
  const { transactions, period } = result.data;
  // Identify likely salary credits: large NEFT inflows around month start
  const salaryCredits = transactions
    .filter((t) => {
      if (!t.credit || t.mode !== "NEFT") return false;
      // Date-only ISO strings parse as UTC, so read the day in UTC as well
      const day = new Date(t.date).getUTCDate();
      return t.credit > 15000 && day <= 10; // within first 10 days
    })
    .map((t) => ({
      date: t.date,
      amount: t.credit!,
      employer: extractEmployerName(t.narration),
    }));
  // getMonthsBetween is a helper you supply; it should return the number of
  // calendar months the statement period covers
  const months = getMonthsBetween(period.from, period.to);
  const consistency =
    salaryCredits.length >= months * 0.9
      ? "high"
      : salaryCredits.length >= months * 0.7
        ? "medium"
        : "low";
  const averageMonthlySalary =
    salaryCredits.reduce((sum, c) => sum + c.amount, 0) /
    Math.max(salaryCredits.length, 1);
  return {
    verified: salaryCredits.length > 0,
    averageMonthlySalary,
    salaryCredits,
    consistency,
  };
}
function extractEmployerName(narration: string): string {
  // "NEFT/INWARD/CR-HDFC/INFOSYS BPO LTD/SAL MAY26" → "INFOSYS BPO LTD"
  const match = narration.match(/CR[- ]+[A-Z]+\/([^/]+)\//);
  return match ? match[1].trim() : "Unknown";
}
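The snippet relies on a `getMonthsBetween` helper that is not shown. A minimal sketch follows, assuming it should count every calendar month the period touches, inclusive of both ends; that counting rule is an assumption, not something the article specifies.

```typescript
// Counts the calendar months touched by [from, to], inclusive of both ends.
// Inputs are ISO 8601 date strings such as "2025-04-01".
function getMonthsBetween(from: string, to: string): number {
  const start = new Date(from);
  const end = new Date(to);
  if (
    Number.isNaN(start.getTime()) ||
    Number.isNaN(end.getTime()) ||
    end.getTime() < start.getTime()
  ) {
    throw new RangeError(`Invalid statement period: ${from} to ${to}`);
  }
  // Date-only ISO strings are parsed as UTC, so use the UTC accessors.
  return (
    (end.getUTCFullYear() - start.getUTCFullYear()) * 12 +
    (end.getUTCMonth() - start.getUTCMonth()) +
    1
  );
}
```

With this rule a statement that starts or ends mid-month counts the partial month as a full one, which can push the consistency rating down slightly. If that matters for your lending policy, a stricter rule that only counts complete months may be a better fit.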
Try this interactively at lekhadev.com/playground.
Building a Complete Kotak Statement Analysis API
Here's a minimal Next.js API route that accepts a Kotak statement and returns a financial summary suitable for a lending or account aggregation use case:
// app/api/analyze-kotak/route.ts
import { NextRequest, NextResponse } from "next/server";
import { Lekha } from "@lekhadev/sdk";
const lekha = new Lekha({ apiKey: process.env.LEKHA_API_KEY! });
export async function POST(req: NextRequest) {
  const formData = await req.formData();
  const file = formData.get("statement");
  // formData.get returns null for a missing field and a string for text fields
  if (!file || typeof file === "string") {
    return NextResponse.json({ error: "No file provided" }, { status: 400 });
  }
  const buffer = await file.arrayBuffer();
  const base64 = Buffer.from(buffer).toString("base64");
  const result = await lekha.extract({
    file: { data: base64, mimeType: "application/pdf" },
    documentType: "bank_statement",
  });
  if (!result.success) {
    return NextResponse.json({ error: result.error.message }, { status: 422 });
  }
  const { account, summary, transactions } = result.data;
  // Compute derived metrics
  const upiSpend = transactions
    .filter((t) => t.mode === "UPI" && t.debit)
    .reduce((sum, t) => sum + (t.debit ?? 0), 0);
  const inwardNEFT = transactions
    .filter((t) => t.mode === "NEFT" && t.credit)
    .reduce((sum, t) => sum + (t.credit ?? 0), 0);
  const monthlyBurnRate = summary.totalDebits / 6; // assumes 6-month statement
  return NextResponse.json({
    account: {
      holder: account.holder,
      type: account.type,
      masked: account.number,
    },
    summary: {
      closingBalance: summary.closingBalance,
      totalCredits: summary.totalCredits,
      totalDebits: summary.totalDebits,
      upiSpend,
      inwardNEFT,
      monthlyBurnRate,
      transactionCount: transactions.length,
    },
  });
}
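The route hard-codes a 6-month period for the burn rate. When statements of other lengths arrive, one option is to derive the figure from the transactions themselves. The sketch below groups debits by calendar month and averages them; `debitsByMonth` and `averageMonthlyDebit` are hypothetical helpers, not part of the Lekha SDK.

```typescript
interface Txn {
  date: string; // ISO 8601, e.g. "2025-04-03"
  debit: number | null;
}

// Sums debits per calendar month, keyed by "YYYY-MM".
function debitsByMonth(transactions: Txn[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of transactions) {
    if (t.debit === null) continue;
    const month = t.date.slice(0, 7);
    totals.set(month, (totals.get(month) ?? 0) + t.debit);
  }
  return totals;
}

// Averages the monthly totals; returns 0 when there are no debits at all.
function averageMonthlyDebit(transactions: Txn[]): number {
  const totals = Array.from(debitsByMonth(transactions).values());
  if (totals.length === 0) return 0;
  return totals.reduce((sum, v) => sum + v, 0) / totals.length;
}
```

Months with no debits at all do not appear in the map, so they are excluded from the average. For sparse accounts, dividing by the number of months in the statement period may give a more representative figure.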
Accuracy and Confidence
Lekha returns a confidence score (0–1) alongside the extraction result. You can use it to route low-confidence extractions to manual review instead of processing them automatically:
const result = await lekha.extract({
  url: statementUrl,
  documentType: "bank_statement",
});
if (!result.success) {
  // A failed extraction has no data, so it must not reach either branch below
  throw new Error(result.error.message);
}
if (result.data.confidence < 0.85) {
  // Flag for manual review rather than automated processing
  await flagForReview(statementUrl, result.data.confidence);
} else {
  await processAutomatically(result.data);
}
See the full confidence schema in Lekha's documentation.
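Independently of the confidence score, the summary fields can be cross-checked with simple arithmetic: opening balance plus credits minus debits should equal the closing balance. The sketch below is a hypothetical helper, `reconcileSummary`, and assumes the summary totals cover exactly the statement period.

```typescript
interface Summary {
  openingBalance: number;
  closingBalance: number;
  totalCredits: number;
  totalDebits: number;
}

// Checks opening + credits - debits against the closing balance.
// `difference` is closing minus expected, rounded to paise.
function reconcileSummary(
  summary: Summary,
  toleranceRupees = 0.01,
): { ok: boolean; difference: number } {
  const expected =
    summary.openingBalance + summary.totalCredits - summary.totalDebits;
  const difference =
    Math.round((summary.closingBalance - expected) * 100) / 100;
  return { ok: Math.abs(difference) <= toleranceRupees, difference };
}
```

A non-zero difference usually points to a missed or misread row, so it is a reasonable second trigger for manual review.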
FAQ
Q: Can Lekha parse password-protected Kotak PDFs?
Kotak statements are often password-protected with the customer's date of birth (DDMMYYYY format). You can decrypt the PDF yourself before sending it to Lekha, for example with a tool such as qpdf, or pass the password directly in the API request using the password field if your customers provide it.
Q: Does Lekha support all Kotak account types?
Yes — savings (regular and 811 Digital), salary, current, and NRI accounts are all supported. Credit card statements are also handled but return a different schema optimised for credit data.
Q: How does Lekha handle Kotak statements with Hindi text?
Some branch-generated statements include Hindi headers or annotations. Lekha's vision model is multilingual and handles Devanagari script natively without any configuration.
Q: What is the processing time for a 6-month Kotak statement?
Typical processing time is 3–8 seconds for a 6-month, 10–15 page PDF. For batch scenarios where you need to process many statements simultaneously, see our batch processing guide.
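For a small batch, a bounded-concurrency loop is often enough. The sketch below is a generic helper, not part of the Lekha SDK and not a substitute for the batch processing guide; `mapWithConcurrency` is a hypothetical name, and a suitable limit depends on your plan's rate limits, which are not stated here.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

In this setting the worker would wrap a call such as `lekha.extract({ url, documentType: "bank_statement" })`, with `items` being the list of statement URLs. If any worker rejects, the returned promise rejects too, so wrap the call in a try/catch inside the worker if one failure should not abort the batch.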
Kotak Mahindra statements are among the most common documents developers encounter when building Indian fintech products. With Lekha, parsing them accurately — across all account types and export formats — takes fewer than 10 lines of code.
Ready to integrate? Get your free API key and process your first Kotak statement at lekhadev.com. The free tier includes 50 extractions per month, with no credit card required.

Explore the full API reference and supported document types at lekhadev.com/docs, or try parsing a sample statement live at lekhadev.com/playground.