How to Give Your AI Agent the Ability to Read Indian Financial Documents
A practical guide to wiring financial document extraction into your AI agent using Claude tool_use, OpenAI function calling, and LangChain — with full code examples.

Your Agent Is Blind to Documents
You have an AI agent that can chat, reason, and call APIs. A user uploads a bank statement and asks "what's my average monthly spend?" Your agent stares at a base64 blob and shrugs.
This is the gap between a chatbot and a useful financial agent. The agent needs to _read_ the document — not just receive it. It needs structured data: transactions as arrays, amounts as numbers, dates as ISO strings, balances that reconcile.
This guide shows you how to close that gap. We will wire document extraction as a tool that your agent can call autonomously, using three popular frameworks: the Anthropic SDK (Claude tool_use), OpenAI function calling, and LangChain.
What "Tool Use" Means for Document Extraction
Modern LLMs support tool use (also called function calling). You define a tool with a name, description, and input schema. The model decides when to call the tool based on the conversation. Your code executes the tool and returns the result. The model continues reasoning with the structured output.
For document extraction, the flow looks like this:
1. The user uploads a document (PDF or image).
2. The agent calls the extract_financial_document tool with the base64-encoded PDF.
3. The tool returns structured JSON.
4. The agent reasons over the result and answers the user.

The agent never tries to parse the PDF itself. It delegates to a specialized extraction tool and focuses on what it does best: reasoning.
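Concretely, a tool definition in this flow is just a named schema the model can match on. A hand-rolled sketch in the Anthropic tool shape (name, description, input_schema) might look like this — the SDK's lekhaToolDefinition is the canonical version, and the field values here are illustrative:

```typescript
// Illustrative sketch of a document-extraction tool definition.
// The model reads the name and description to decide when to call it;
// input_schema tells it what arguments to supply.
const extractTool = {
  name: "extract_financial_document",
  description:
    "Extract structured data from an Indian financial document " +
    "(bank statement, CAS, salary slip, ITR, CIBIL, GST invoice, ...).",
  input_schema: {
    type: "object" as const,
    properties: {
      document: { type: "string", description: "Base64-encoded PDF/PNG/JPEG" },
      type: { type: "string", description: "Document type hint, or 'auto'" },
    },
    required: ["document"],
  },
};

// The model matches on this name when it decides to delegate
console.log(extractTool.name);
```

Note that `document` is the only required parameter; the type hint defaults to auto-detection, which keeps the agent's decision simple.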
Method 1: Claude Tool Use (Anthropic SDK)
Claude's tool_use is the most natural fit. You define the tool, pass it in the API call, and Claude decides when to invoke it.
Install
bun add @anthropic-ai/sdk @lekha-dev/sdk
Define the Tool and Run the Agent
import Anthropic from "@anthropic-ai/sdk";
import { Lekha } from "@lekha-dev/sdk";
import { lekhaToolDefinition } from "@lekha-dev/sdk/tool-definition";
import fs from "fs";

const anthropic = new Anthropic();
const lekha = new Lekha("lk_live_...");

// The document the user uploaded
const document = fs.readFileSync("hdfc-statement.pdf");
const documentBase64 = document.toString("base64");

// Start the agent loop
const messages: Anthropic.MessageParam[] = [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: "Here is my bank statement. What's my average monthly spend and top 3 expense categories?",
      },
    ],
  },
];

// First call — Claude will request the tool
let response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  tools: [lekhaToolDefinition],
  messages,
});

// Handle tool use
while (response.stop_reason === "tool_use") {
  const toolBlock = response.content.find(
    (b) => b.type === "tool_use"
  ) as Anthropic.ToolUseBlock;

  // Execute the extraction
  const result = await lekha.extract({
    document: documentBase64,
    type: (toolBlock.input as { type?: string }).type ?? "auto",
  });

  // Feed the result back to Claude
  messages.push({ role: "assistant", content: response.content });
  messages.push({
    role: "user",
    content: [
      {
        type: "tool_result",
        tool_use_id: toolBlock.id,
        content: JSON.stringify(result),
      },
    ],
  });

  response = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 4096,
    tools: [lekhaToolDefinition],
    messages,
  });
}

// Final response with analysis
const textBlock = response.content.find((b) => b.type === "text");
console.log(textBlock?.text);
Claude sees the tool definition, understands it can extract financial documents, and calls it when the user provides one. The SDK handles the extraction and returns clean JSON that Claude can reason over.
What Claude Receives Back
The tool result is not raw text. It is structured data:
{
  "success": true,
  "data": {
    "document_type": "bank_statement",
    "account_holder": "Priya Sharma",
    "account_number": "XXXX4521",
    "bank": "HDFC Bank",
    "period": {
      "from": "2026-01-01",
      "to": "2026-01-31"
    },
    "summary": {
      "opening_balance": 145000,
      "closing_balance": 98320,
      "total_credits": 85000,
      "total_debits": 131680
    },
    "transactions": [
      {
        "date": "2026-01-02",
        "description": "NEFT-ACME CORP-SALARY",
        "amount": 85000,
        "type": "credit",
        "balance": 230000,
        "category": "salary",
        "confidence": 0.97
      },
      {
        "date": "2026-01-05",
        "description": "UPI-SWIGGY",
        "amount": 450,
        "type": "debit",
        "balance": 229550,
        "category": "food_and_dining",
        "confidence": 0.94
      }
    ]
  },
  "validation": {
    "balance_reconciled": true,
    "transaction_sum_matches": true
  }
}
Every amount is a number. Every date is ISO 8601. Every field has a confidence score. The agent can immediately compute averages, group by category, and flag anomalies — no parsing required.
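To see why this matters, here is the kind of analysis the agent (or your own post-processing code) can run directly on the transactions array — plain array arithmetic, no regexes or OCR cleanup. The transactions below are illustrative, in the shape of the sample response above:

```typescript
// Minimal sketch: group debits by category and rank them,
// using illustrative transactions shaped like the sample response.
type Txn = { date: string; amount: number; type: "credit" | "debit"; category: string };

const transactions: Txn[] = [
  { date: "2026-01-02", amount: 85000, type: "credit", category: "salary" },
  { date: "2026-01-05", amount: 450, type: "debit", category: "food_and_dining" },
  { date: "2026-01-09", amount: 1200, type: "debit", category: "shopping" },
];

// Sum debits per category
const byCategory = transactions
  .filter((t) => t.type === "debit")
  .reduce<Record<string, number>>((acc, t) => {
    acc[t.category] = (acc[t.category] ?? 0) + t.amount;
    return acc;
  }, {});

// Rank categories by total spend, highest first
const topCategories = Object.entries(byCategory).sort((a, b) => b[1] - a[1]);

console.log(topCategories); // → [["shopping", 1200], ["food_and_dining", 450]]
```

Because amounts arrive as numbers with consistent signs (the `type` field, not a minus prefix), none of this needs defensive parsing.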
Method 2: OpenAI Function Calling
The same pattern works with OpenAI's function calling. The tool schema is nearly identical.
import OpenAI from "openai";
import { Lekha } from "@lekha-dev/sdk";
import fs from "fs";

const openai = new OpenAI();
const lekha = new Lekha("lk_live_...");

const documentBase64 = fs.readFileSync("sbi-statement.pdf", "base64");

const tools: OpenAI.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "extract_financial_document",
      description:
        "Extract structured data from an Indian financial document (bank statement, CAS, salary slip, ITR, CIBIL, GST invoice, Form 16, Form 26AS, GST return).",
      parameters: {
        type: "object",
        properties: {
          document: {
            type: "string",
            description: "Base64-encoded document (PDF, PNG, JPEG)",
          },
          type: {
            type: "string",
            enum: [
              "auto", "bank_statement", "cas", "salary_slip",
              "itr", "cibil", "gst_invoice", "form_16",
              "form_26as", "gst_return",
            ],
            default: "auto",
          },
        },
        required: ["document"],
      },
    },
  },
];

const messages: OpenAI.ChatCompletionMessageParam[] = [
  {
    role: "user",
    content: "Analyze this bank statement and tell me if I can afford a Rs 25,000/month EMI.",
  },
];

let response = await openai.chat.completions.create({
  model: "gpt-4o",
  tools,
  messages,
});

// Handle tool call
const toolCall = response.choices[0].message.tool_calls?.[0];
if (toolCall) {
  const result = await lekha.extract({
    document: documentBase64,
    type: "auto",
  });

  messages.push(response.choices[0].message);
  messages.push({
    role: "tool",
    tool_call_id: toolCall.id,
    content: JSON.stringify(result),
  });

  response = await openai.chat.completions.create({
    model: "gpt-4o",
    tools,
    messages,
  });
}

console.log(response.choices[0].message.content);
Same result: the agent receives structured JSON and reasons over it.
Method 3: LangChain (One Import)
If you are using LangChain, the SDK ships a ready-made tool:
import { ChatAnthropic } from "@langchain/anthropic";
import { AgentExecutor, createToolCallingAgent } from "langchain/agents";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { DynamicStructuredTool } from "@langchain/core/tools";
import { createLekhaLangChainTool } from "@lekha-dev/sdk/integrations/langchain";
import { z } from "zod";

const lekhaTool = createLekhaLangChainTool({ apiKey: "lk_live_..." });

const tool = new DynamicStructuredTool({
  name: lekhaTool.name,
  description: lekhaTool.description,
  schema: z.object({
    document: z.string(),
    type: z.string().optional(),
  }),
  func: lekhaTool.func,
});

const model = new ChatAnthropic({ model: "claude-sonnet-4-20250514" });

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a financial analyst agent. Use the extract tool to read documents."],
  ["human", "{input}"],
  ["placeholder", "{agent_scratchpad}"],
]);

const agent = createToolCallingAgent({ llm: model, tools: [tool], prompt });
const executor = new AgentExecutor({ agent, tools: [tool] });

const result = await executor.invoke({
  input: "Read this salary slip and tell me my net take-home pay.",
});

console.log(result.output);
Method 4: Vercel AI SDK
For Next.js apps using the Vercel AI SDK:
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { createLekhaVercelTool } from "@lekha-dev/sdk/integrations/vercel-ai";

const lekhaTool = createLekhaVercelTool({ apiKey: "lk_live_..." });

const { text } = await generateText({
  model: anthropic("claude-sonnet-4-20250514"),
  tools: {
    extract_financial_document: lekhaTool,
  },
  maxSteps: 3,
  prompt: "Here is my CIBIL report. What is my credit score and should I apply for a home loan?",
});

console.log(text);
Four lines to give your agent document intelligence.
Why Structured Data Matters More Than Raw Text
You might wonder: why not just OCR the document and pass the raw text to the LLM? Three reasons.
**Accuracy.** Raw OCR text from Indian documents is noisy. Mixed Hindi and English scripts, the Indian numbering system (1,23,456 vs 123,456), bank-specific column orderings — a general-purpose LLM will misinterpret amounts, swap debits for credits, or hallucinate transaction dates. Specialized extraction with validation catches these errors before they reach your agent.
**Reliability.** Every field in the Lekha response includes a confidence score between 0 and 1. Your agent can make decisions based on trust levels:
// In your agent's reasoning
if (result.data.summary.closing_balance_confidence > 0.9) {
  // Act on the data automatically
} else {
  // Flag for human review
}
Without confidence scores, your agent has no way to know when it is working with unreliable data.
**Validation.** Lekha runs cross-field validation on every extraction: balance reconciliation, transaction sum checks, sequential date consistency. If the opening balance plus credits minus debits does not equal the closing balance, the response tells you. An agent working with raw OCR text has no such safety net.
The 10 Document Types Your Agent Can Read
Lekha supports every major Indian financial document type:
| Document | What Your Agent Gets |
|---|---|
| Bank Statement | Transactions, balances, categories, account details |
| CAS Report | Mutual fund holdings, NAVs, folios, valuations |
| Salary Slip | Gross/net pay, all components, deductions, YTD |
| ITR Form | Income details, 80C/80D/80G deductions, tax computation |
| CIBIL Report | Credit score, account history, enquiries, DPD |
| GST Invoice | Line items, CGST/SGST/IGST, HSN codes, totals |
| Form 16 | Part A (employer TDS) + Part B (income computation) |
| Form 26AS | TDS entries, advance tax, refunds, AIS data |
| GST Return | GSTR-3B outward supplies, ITC, net tax liability |
| Balance Sheet | Assets, liabilities, equity, P&L summary |
One tool definition. One API call. Your agent handles all ten.
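In practice, the agent branches on the document_type field in the response rather than knowing the type up front. A minimal routing sketch — the returned strings are placeholders for whatever analysis your agent actually performs:

```typescript
// Hypothetical dispatch helper: route an extraction result to the right
// analysis based on document_type (field name follows the sample response).
type ExtractionResult = { data: { document_type: string } };

function routeExtraction(result: ExtractionResult): string {
  switch (result.data.document_type) {
    case "bank_statement":
      return "analyze cash flow";
    case "salary_slip":
      return "compute take-home pay";
    case "cibil":
      return "assess creditworthiness";
    default:
      return "generic summary";
  }
}

console.log(routeExtraction({ data: { document_type: "bank_statement" } }));
// → "analyze cash flow"
```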
Common Pitfalls
**Do not send the raw PDF to the LLM.** Claude and GPT-4o can process images, but they are general-purpose models. They will approximate amounts and miss the difference between 1,23,456 and 123,456. Use a specialized extractor, then let the LLM reason over clean data.
**Handle the async path for large documents.** If a document is more than a few pages, use the async extraction endpoint instead:
// For large documents (multi-page bank statements)
const job = await fetch("https://lekhadev.com/api/v1/extract/async", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": "lk_live_...",
  },
  body: JSON.stringify({ document: documentBase64, type: "auto" }),
});
const { job_id } = await job.json();

// Poll for result
const result = await fetch(`https://lekhadev.com/api/v1/jobs/${job_id}`, {
  headers: { "x-api-key": "lk_live_..." },
});
**Always check validation in the response.** If balance_reconciled is false, the document may be partially corrupted or span multiple statements. Your agent should surface this to the user rather than silently working with bad data.
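A minimal guard, assuming the validation shape shown in the sample response earlier (balance_reconciled, transaction_sum_matches):

```typescript
// Gate agent actions on the validation block of the extraction response.
// Field names follow the sample response in this guide.
type Validation = { balance_reconciled: boolean; transaction_sum_matches: boolean };

function isTrustworthy(v: Validation): boolean {
  return v.balance_reconciled && v.transaction_sum_matches;
}

const validation: Validation = { balance_reconciled: false, transaction_sum_matches: true };

if (!isTrustworthy(validation)) {
  // Surface the problem instead of silently acting on bad data
  console.log("Warning: statement balances do not reconcile — ask the user to re-upload.");
}
```

Wiring this check in before the tool result is handed back to the model means the agent can tell the user exactly why it is hesitating.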
Build Something
You now have four ways to give your AI agent document intelligence. Pick the one that matches your stack: the Anthropic SDK for Claude tool_use, OpenAI function calling, LangChain, or the Vercel AI SDK.
Get an API key at lekhadev.com and start extracting. The free tier gives you 100 documents per month — enough to build and test your agent before going to production.
Your agent just learned to read.