CrewAI Loan Eligibility Agent: Automate with Lekha
Build a multi-agent loan eligibility pipeline using CrewAI and Lekha to parse Indian bank statements and salary slips automatically.
Loan eligibility checks are one of the most document-heavy workflows in Indian fintech. A typical personal loan application involves a bank statement (3–6 months), a salary slip, and sometimes a Form 16 or ITR. Manually reviewing these is slow — and scaling that review with basic OCR is fragile.
This guide shows you how to build a fully automated loan eligibility agent using CrewAI and Lekha. The agent will extract structured data from uploaded documents, run eligibility logic, and return a decision — all without a human in the loop.
What You'll Build
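A multi-agent CrewAI pipeline with three specialised agents:
Document Analyst: extracts structured JSON from the bank statement and salary slip using the Lekha tools.
Financial Analyst: computes average monthly inflow and outflow, visible EMI obligations, closing balance trend, and net monthly income.
Eligibility Decider: applies the lending policy and returns an approve / refer / reject decision as JSON.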
Prerequisites
crewai, httpx, and python-dotenv installed:
pip install crewai httpx python-dotenv
Step 1: Lekha Tools for CrewAI
CrewAI agents call tools: Python functions registered with CrewAI's tool decorator, whose docstrings tell the LLM when to use them. Wrap Lekha's extraction endpoint as two tools, one for bank statements and one for salary slips. Each takes a file path and returns the extracted JSON as a string.
# tools.py
import base64
import json
import os

import httpx
# The import path for `tool` has moved between CrewAI releases
# (older versions exposed it from crewai_tools); adjust to your installed version.
from crewai.tools import tool
from dotenv import load_dotenv

load_dotenv()  # pick up LEKHA_API_KEY from a local .env file, if present

LEKHA_API_KEY = os.environ["LEKHA_API_KEY"]
LEKHA_BASE_URL = "https://api.lekhadev.com/v1"


def _extract(file_path: str, document_type: str) -> str:
    """Send a base64-encoded PDF to Lekha and return the extracted data as a JSON string."""
    with open(file_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    response = httpx.post(
        f"{LEKHA_BASE_URL}/extract",
        headers={"Authorization": f"Bearer {LEKHA_API_KEY}"},
        json={
            "document": encoded,
            "document_type": document_type,
            "mime_type": "application/pdf",
        },
        timeout=60,
    )
    response.raise_for_status()
    return json.dumps(response.json()["data"], indent=2)


@tool("Parse bank statement")
def parse_bank_statement(file_path: str) -> str:
    """Extract structured JSON from an Indian bank statement PDF."""
    return _extract(file_path, "bank_statement")


@tool("Parse salary slip")
def parse_salary_slip(file_path: str) -> str:
    """Extract structured JSON from an Indian salary slip PDF."""
    return _extract(file_path, "salary_slip")
Lekha handles the heavy lifting — it detects the bank format automatically (HDFC, ICICI, SBI, Kotak, Axis, and 20+ others), normalises amounts to numbers, and returns dates in ISO 8601. Your tools stay clean.
Step 2: Define the Agents
Each agent has a role, a goal, and a backstory. CrewAI uses these to guide the LLM's behaviour.
# agents.py
from crewai import Agent

from tools import parse_bank_statement, parse_salary_slip

document_analyst = Agent(
    role="Document Analyst",
    goal=(
        "Extract complete and accurate structured data from financial documents "
        "using the provided tools. Return the raw JSON without modification."
    ),
    backstory=(
        "You are an expert in Indian financial document formats. "
        "You use Lekha to extract structured data and pass it to the next agent faithfully."
    ),
    tools=[parse_bank_statement, parse_salary_slip],
    verbose=True,
)

financial_analyst = Agent(
    role="Financial Analyst",
    goal=(
        "Analyse extracted financial data to compute: average monthly inflow, "
        "average monthly outflow, closing balance trend, EMI obligations visible "
        "in the statement, and net monthly income from the salary slip."
    ),
    backstory=(
        "You are a credit analyst at an Indian NBFC. You understand the patterns "
        "in bank statements that signal repayment capacity and financial stress."
    ),
    verbose=True,
)

eligibility_decider = Agent(
    role="Eligibility Decider",
    goal=(
        "Apply the lending policy and return a JSON decision with keys: "
        "'decision' (approve | refer | reject), 'reason', 'eligible_amount', "
        "and 'risk_flags'. Base the decision solely on the financial analysis provided."
    ),
    backstory=(
        "You are a senior credit officer. You apply the lender's policy rules "
        "consistently and explain every decision in plain language."
    ),
    verbose=True,
)
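The metrics named in the Financial Analyst's goal are plain aggregations, so they can also be computed deterministically and handed to the agent as facts rather than left to the LLM's arithmetic. Below is a minimal sketch under a loud assumption: the transaction shape (`date`, `amount`, `type`, `balance`) is hypothetical and not taken from Lekha's documentation, so check the field names against a real extraction. The ±10% band used to call a trend "stable" is likewise an illustrative choice, not part of any stated policy.

```python
from collections import defaultdict
from statistics import mean


def monthly_metrics(transactions: list[dict]) -> dict:
    """Summarise inflows, outflows, and closing-balance trend per calendar month.

    Assumed (hypothetical) transaction shape:
    {"date": "2024-04-05", "amount": 82400.0, "type": "credit", "balance": 91000.0}
    """
    if not transactions:
        raise ValueError("no transactions to analyse")

    credits: dict[str, float] = defaultdict(float)
    debits: dict[str, float] = defaultdict(float)
    closing: dict[str, float] = {}

    # ISO 8601 dates sort correctly as strings.
    for txn in sorted(transactions, key=lambda t: t["date"]):
        month = txn["date"][:7]  # "YYYY-MM"
        if txn["type"] == "credit":
            credits[month] += txn["amount"]
        else:
            debits[month] += txn["amount"]
        closing[month] = txn["balance"]  # last transaction in the month wins

    months = sorted(closing)
    first, last = closing[months[0]], closing[months[-1]]
    band = 0.10 * abs(first)  # assumed tolerance for "stable"
    if last - first > band:
        trend = "improving"
    elif first - last > band:
        trend = "declining"
    else:
        trend = "stable"

    return {
        "avg_monthly_inflow": mean(credits[m] for m in months),
        "avg_monthly_outflow": mean(debits[m] for m in months),
        "balance_trend": trend,
    }
```

The resulting dictionary could be serialised into the analysis task's description so the agent only has to interpret the numbers.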
Step 3: Define the Tasks
Tasks connect agents to their inputs and expected outputs.
# tasks.py
from crewai import Task

from agents import document_analyst, financial_analyst, eligibility_decider


def build_tasks(bank_statement_path: str, salary_slip_path: str, loan_amount: int):
    extract_task = Task(
        description=(
            f"Extract data from the bank statement at '{bank_statement_path}' "
            f"and the salary slip at '{salary_slip_path}'. "
            "Return both JSON payloads in a single response, clearly labelled."
        ),
        expected_output="Two JSON blocks: one for the bank statement, one for the salary slip.",
        agent=document_analyst,
    )

    analyse_task = Task(
        description=(
            "Using the extracted financial data, compute:\n"
            "1. Average monthly credit (inflows) over the statement period\n"
            "2. Average monthly debit (outflows)\n"
            "3. Net monthly income from the salary slip\n"
            "4. Any existing EMI deductions visible in the statement\n"
            "5. Closing balance trend (improving / stable / declining)\n\n"
            "Return a structured summary of these metrics."
        ),
        expected_output="A structured financial summary with all five metrics.",
        agent=financial_analyst,
        context=[extract_task],
    )

    decide_task = Task(
        description=(
            f"The applicant has requested a loan of ₹{loan_amount:,}. "
            "Apply the following policy:\n"
            "- Approve if: net monthly income ≥ 3× requested EMI, "
            "closing balance trend is stable or improving, and no salary bounces\n"
            "- Refer if: income borderline (2–3×) or trend is flat with minor stress\n"
            "- Reject if: income < 2× EMI, declining trend, or repeated bounces\n\n"
            "Assume a 24-month tenure at 14% p.a. for EMI calculation. "
            "Return your decision as JSON."
        ),
        expected_output=(
            'JSON with keys: decision, reason, eligible_amount, risk_flags. '
            'Example: {"decision": "approve", "reason": "...", '
            '"eligible_amount": 200000, "risk_flags": []}'
        ),
        agent=eligibility_decider,
        context=[analyse_task],
    )

    return [extract_task, analyse_task, decide_task]
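The policy in the decision task depends on an EMI figure that the LLM has to calculate itself. As a cross-check, the same rule can be sketched in plain Python. The EMI function uses the standard reducing-balance formula. The policy function is only one possible reading of the prompt: treating "repeated bounces" as two or more, and sending every case that is neither a clear approve nor a clear reject to "refer", are assumptions made for illustration.

```python
def monthly_emi(principal: float, annual_rate_pct: float, months: int) -> float:
    """Reducing-balance EMI: P * r * (1 + r)^n / ((1 + r)^n - 1)."""
    r = annual_rate_pct / 12 / 100  # monthly interest rate as a fraction
    if r == 0:
        return principal / months
    growth = (1 + r) ** months
    return principal * r * growth / (growth - 1)


def apply_policy(net_monthly_income: float, emi: float, trend: str, bounces: int) -> str:
    """Return 'approve', 'refer', or 'reject' under the assumed reading of the policy."""
    ratio = net_monthly_income / emi
    # Any single hard failure rejects (assumption: "repeated" means >= 2 bounces).
    if ratio < 2 or trend == "declining" or bounces >= 2:
        return "reject"
    if ratio >= 3 and trend in ("stable", "improving") and bounces == 0:
        return "approve"
    # Everything in between is borderline and goes to a human.
    return "refer"


emi = monthly_emi(300_000, 14, 24)  # roughly ₹14,404 per month
print(round(emi), apply_policy(82_400, emi, "stable", 0))
```

Running the deterministic check next to the agent's answer makes it easy to flag decisions where the LLM's arithmetic drifted from the stated tenure and rate.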
Step 4: Assemble and Run the Crew
# main.py
import json

from crewai import Crew, Process

from agents import document_analyst, financial_analyst, eligibility_decider
from tasks import build_tasks


def run_loan_eligibility(
    bank_statement_path: str,
    salary_slip_path: str,
    loan_amount: int,
) -> dict:
    tasks = build_tasks(bank_statement_path, salary_slip_path, loan_amount)
    crew = Crew(
        agents=[document_analyst, financial_analyst, eligibility_decider],
        tasks=tasks,
        process=Process.sequential,
        verbose=True,
    )
    result = crew.kickoff()

    # The final task returns JSON, possibly wrapped in extra text: slice it out.
    raw = result.raw if hasattr(result, "raw") else str(result)
    start = raw.find("{")
    end = raw.rfind("}") + 1
    if start == -1 or end <= start:
        raise ValueError("Crew output did not contain a JSON object")
    return json.loads(raw[start:end])


if __name__ == "__main__":
    decision = run_loan_eligibility(
        bank_statement_path="samples/hdfc_statement.pdf",
        salary_slip_path="samples/salary_slip_april.pdf",
        loan_amount=300000,
    )
    print(json.dumps(decision, indent=2))
Running this produces output like:
{
  "decision": "approve",
  "reason": "Net monthly income of ₹82,400 is 5.7× the required EMI of ₹14,404. Closing balance trend is stable with no bounces in 6 months.",
  "eligible_amount": 300000,
  "risk_flags": []
}
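Because the decision comes from an LLM, it is worth checking its shape before acting on it. The sketch below is one way to do that with the standard library only: it scans the raw text for the first parseable JSON object and checks the four keys requested in the decision task. The function name `parse_decision` and the specific checks are illustrative, not part of CrewAI or Lekha.

```python
import json

ALLOWED_DECISIONS = {"approve", "refer", "reject"}
REQUIRED_KEYS = {"decision", "reason", "eligible_amount", "risk_flags"}


def parse_decision(raw: str) -> dict:
    """Extract the first JSON object from raw model text and validate its shape."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            payload, _ = decoder.raw_decode(raw, start)
            break
        except json.JSONDecodeError:
            start = raw.find("{", start + 1)
    else:
        raise ValueError("no JSON object found in model output")

    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"decision is missing keys: {sorted(missing)}")
    if payload["decision"] not in ALLOWED_DECISIONS:
        raise ValueError(f"unexpected decision value: {payload['decision']!r}")
    if not isinstance(payload["risk_flags"], list):
        raise ValueError("risk_flags must be a list")
    return payload
```

Unlike slicing from the first `{` to the last `}`, this tolerates stray braces in any commentary the model appends after the JSON.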
Why Lekha Instead of Direct Vision API Calls?
You could call Claude or GPT-4o directly with the PDF. Here's what you trade away:
| | Direct Vision API | Lekha |
| ------------------------- | -------------------- | --------------------------- |
| Bank format normalisation | You build it | Built-in (30+ banks) |
| Date format | Varies by bank | Always ISO 8601 |
| Amount format | "₹1,45,000" strings | Always number |
| Multi-page statements | You handle chunking | Handled automatically |
| In-memory processing | Depends on your code | Guaranteed (DPDP compliant) |
| Latency | 8–15s per doc | 4–8s per doc |
For a production lending pipeline, those normalisations are months of edge-case handling. Lekha encapsulates them so your CrewAI agents can focus on the business logic.
Extending the Pipeline
Once the basic pipeline works, common extensions include:
Add a Form 16 or ITR agent: useful for self-employed applicants or annual income verification. Swap in Lekha's itr or form_16 document type.
Parallel extraction: for high-volume applications, split extraction into one task per document and set async_execution=True on each, so parse_bank_statement and parse_salary_slip can run concurrently; the analysis task then lists both in its context. Process.hierarchical adds a manager LLM that delegates work, but it does not parallelise tasks by itself.
Webhook integration: wrap run_loan_eligibility in a FastAPI route and trigger it from your loan origination system via webhook when documents are uploaded.
Audit trail: store the intermediate task outputs alongside the final decision. Lekha never persists documents, so attach the extracted JSON (not the PDF) to your loan record for audit purposes.
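For the audit trail, one simple option is an append-only JSON Lines file with a checksum per record. The sketch below assumes you already have the intermediate outputs as strings. Recent CrewAI versions appear to expose them on the kickoff result (for example via a tasks_output attribute), but verify that against your installed version. The function names, the record layout, and the file-based storage are all illustrative choices.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(application_id: str, task_outputs: list[str], decision: dict) -> dict:
    """Bundle intermediate outputs and the decision with a SHA-256 checksum."""
    body = {
        "application_id": application_id,
        "task_outputs": task_outputs,  # extracted JSON and analysis text, never the PDF
        "decision": decision,
    }
    # Canonical serialisation so the same content always yields the same hash.
    canonical = json.dumps(body, sort_keys=True, ensure_ascii=False)
    return {
        **body,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }


def append_audit_record(path: str, record: dict) -> None:
    """Append one record per line (JSON Lines) so earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The checksum covers the application ID, outputs, and decision but not the timestamp, so a later reviewer can recompute it from the stored content to detect edits.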
FAQ
Can this pipeline handle scanned or photographed documents?
Yes. Lekha accepts native PDFs as well as scanned or image-based documents (JPEG, PNG). It uses vision AI to handle handwriting, stamps, and low-resolution scans common in older salary slips. The tools in this guide send mime_type as application/pdf, so adjust it when submitting images.
Which banks does Lekha support for statement parsing?
Lekha supports 30+ Indian banks including HDFC, ICICI, SBI, Axis, Kotak, PNB, Canara, Yes Bank, IndusInd, Federal Bank, and more. See the full list at lekhadev.com/docs.
How do I handle applicants who submit statements from multiple accounts?
Call parse_bank_statement once per PDF and aggregate the extracted transactions in the Financial Analyst's task. Pass all extracted JSONs as context so it can compute combined inflows across accounts.
Is this pipeline production-ready?
The core extraction and decision logic is solid. For production, add input validation (file size, MIME type checks), retry logic on Lekha API calls, and store decision audit logs. The Lekha playground lets you test extraction on real documents before deploying.
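The two hardening steps mentioned above, input validation and retries, can be sketched with the standard library alone. The 10 MB limit, the function names, and the backoff schedule are assumptions for illustration. Lekha's actual limits and error semantics are not covered in this guide, so tune them against the API documentation.

```python
import time
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # assumed upload limit, not a documented Lekha value


def validate_upload(file_path: str) -> Path:
    """Reject missing, oversized, or non-PDF files before calling the API."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(file_path)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got {path.suffix!r}")
    if path.stat().st_size > MAX_BYTES:
        raise ValueError("file exceeds the size limit")
    with path.open("rb") as f:
        if f.read(5) != b"%PDF-":  # PDF magic bytes
            raise ValueError("file does not look like a PDF")
    return path


def with_retries(call, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Run call(); on a listed exception, wait base_delay * 2^k and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

With httpx, a reasonable starting point is to pass retry_on=(httpx.TransportError,) so that only network-level failures are retried, and to decide separately which HTTP status codes deserve another attempt.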
The complete working example is in the Lekha examples repository. Try it out with your own documents in the Lekha playground, or sign up for a free API key to start building today.