>_ Hi, my name is

Sonu Kumar.

I build intelligent software.

role: Python Developer_

Python engineer with 8 years shipping production systems across fintech and AI research, from LLM-powered annotator services and NLP pipelines to agentic-coding benchmarks on SWE-bench and Terminal-bench. I care deeply about reproducible, well-tested, container-first engineering.


8 years in Python
7 engineering roles
3 LLM families shipped
∞ bug finding & resolving

About me

I'm a Python developer based in Bangalore with eight years of experience spanning the entire project life cycle: planning, evaluation, requirements, design, development, testing, and deployment. I'm equally at home debugging a production incident, refactoring a legacy service, or designing an evaluation harness for an agentic coding system.

At Morgan Stanley I work on wealth- and investment-management tech, shipping microservices for Fixed Income, hedging, and portfolio data. Before that, I spent a year and a half at Wolters Kluwer building an annotator template on top of Flask, LangChain, and GPT-4 / GPT-4 32K, with custom SpaCy and fine-tuned BERT models powering named-entity recognition and POS tagging over real business datasets.

In parallel, I contribute to AI benchmarking: evaluating agentic coding systems on SWE-bench and Terminal-bench, generating structured trajectories and patches, and validating fixes through Docker-based oracle verification. It's the kind of quiet, detail-obsessed work that makes LLM evaluation actually trustworthy.

Currently open to roles that push the frontier of LLM systems, agentic evaluation, and reproducible ML infrastructure.

C:\> type about.json

{
  "name": "Sonu Kumar",
  "based_in": "Bangalore, IN",
  "role": "Python / NLP / LLM Engineer",
  "focus": [
    "LLM pipelines",
    "NLP systems",
    "agentic benchmarking",
    "backend microservices"
  ],
  "recent": {
    "benchmarks": ["SWE-bench", "Terminal-bench"],
    "tools": ["Claude Code", "Cursor", "Docker"]
  },
  "status": "open to new challenges"
}

Expertise

A blend of applied LLM/NLP engineering, benchmarking discipline, and production backend work.

AI Benchmarking & Evaluation

Evaluating agentic coding systems across SWE-bench and Terminal-bench. Structured trajectories (trajectory.md, trajectory.jsonl), Docker-based oracle verification, and evaluation rubrics for OpenCode baselines.

SWE-bench
Terminal-bench
Claude Code
Cursor
SpecStory
Oracle verification

LLM Engineering

Production annotator template on top of Flask, LangChain, and Cookiecutter, shipping prompts for summarization (stuff / refine / map-reduce), QA, and translation using GPT-3.5 Turbo, GPT-4, and GPT-4 32K.

GPT-4 / 32K
LangChain
Prompt pipelines
Map-reduce
Cookiecutter

NLP Systems

Custom SpaCy NER models trained on domain data, fine-tuned BERT for POS tagging, and Recognizers-Text integrations for temporal/date extraction. Bug fixes and workflows for RDF-based semantic content enrichment.

SpaCy
BERT fine-tune
Recognizers-Text
Transformers
RDF / SCE

Backend & Microservices

Flask → FastAPI migrations, Kerberos → OAuth, near-real-time position polling from Aladdin into DB2, Muni/Corp bond order-submission apps, and HTTP-REST APIs for Timeseries ingestion.

FastAPI
Flask
Sanic
OAuth
DB2 / Postgres

Cloud & Containerization

Containerized, reproducible workflows with Docker. Cloud-native deployments across AWS (MQ, EC2, CloudFormation, S3) and Azure (Queue, Container Registry), integrated with Bitbucket and Git-based CI.

Docker
AWS
Azure
CloudFormation
Git / Bitbucket

Observability & Data

API and asset-health monitoring with Prometheus + Prometheus Pushgateway, dashboards in Grafana, log analysis in Kibana. Built reporting pipelines over Oracle/SQL/MySQL/PostgreSQL for risk and regulatory reporting.

Prometheus
Grafana
Kibana
Pandas / NumPy
SQL

Experience

A timeline of roles, from intern to manager, across fintech and AI research.

AI Benchmarking Contributor (Freelance / Contract)

SWE-bench · Terminal-bench · Claude Code · Cursor

2024 - Present

Evaluating and improving agentic coding systems across SWE-bench and Terminal-bench environments.
- Generated structured trajectories (trajectory.md, trajectory.jsonl) and patches by solving real repository-level issues using Claude Code and Cursor with SpecStory tracking.
- Performed OpenCode baselining: reviewing agent reasoning, validating fixes via Docker-based oracle verification, and refining evaluation rubrics.
- Contributed to autonomous repo-review workflows that enforce structured iteration, correctness validation, and benchmark compliance.
- Strengthened reliability of LLM-assisted software engineering pipelines through reproducible verification workflows and artifact-driven submissions.

Manager Python Developer @ Morgan Stanley

Fixed Income · FastAPI · Aladdin · DB2

Apr 2024 - Present

Wealth and investment management technology: microservices for Fixed Income, interest-rate hedging, and currency repatriation.
- Built an app to poll near real-time positions from Aladdin and persist into DB2.
- Contributed to the Flask → FastAPI migration and Kerberos → OAuth authentication move.
- Shipped apps that streamline order submission for Muni/Corp bonds with rich reference-data filters.

Senior Product Software Engineer @ Wolters Kluwer

Flask · LangChain · GPT-4 · SpaCy · BERT

Dec 2022 - Apr 2024

Annotator template on Flask/LangChain for semantic analysis, summarization, and extraction across diverse business domains.
- Generic annotator framework with Cookiecutter + LangChain, prompts for stuff / refine / map-reduce summarization, QA, translation.
- Context-aware summarization service built on GPT-3.5 Turbo, GPT-4, and GPT-4 32K.
- Integrated the Recognizers-Text library for accurate date/temporal entity extraction.
- Migrated an annotator service from Sanic to Flask for better performance and scalability.
- Trained a custom SpaCy NER model and fine-tuned BERT for POS tagging with tailored datasets.
- Fixed bugs in an RDF-based Semantic Content Enrichment app and built annotator workflows to compare extracted dates against RDF data.

Software Engineer @ SymphonyAI

Prometheus · Flask · Timeseries

May 2022 - Dec 2022

Monitoring utility modules with Prometheus for industrial asset performance: anomaly and error detection via counter and gauge metrics.
- Built and maintained the Asset Template microservice that provisions machinery and sensor data.
- Enhanced the Measurement microservice to assimilate sensor readings across diverse APIs.
- Instituted status-code monitoring across APIs and modules to reduce call-failure rates.
- Constructed an HTTP-REST API with Flask for Timeseries data intake.

Senior Associate Python Backend @ Macquarie

QRM · Prometheus Pushgateway · Grafana

Sep 2020 - May 2022

Central services optimizing Transaction and Static Reference Data caches for Middle/Back Office Risk, Finance, Operations.
- Designed and sustained the QRM (Quantitative Risk Management) application, producing risk-exposure reports.
- Built deal and rate retrieval and CSV report generation, with oversight via Prometheus Pushgateway + Grafana.
- Automated and scheduled macros to email attached reports on a cadence.
- Managed regulatory reporting applications for daily trades across all operational regions.

Software Developer (C++/Python) @ ALTRAN

RTCC · CAMEL/INAP · Diameter · CDR

Oct 2018 - Aug 2020

SIBs (Service Independent Building Block APIs) for Real-Time Charging Control (RTCC), an online charging system over the IP Multimedia Subsystem (IMS) core.
- Created and analysed CDRs for accuracy; release prep using Pandas and NumPy.
- Designed and tested apps that administered call logic and automated scenario testing.
- Built GUIs for subscriber data provisioning and mass-provisioning via a text-oriented client.
- Planned database-capacity expansion to keep the app scalable.
- Maintained INAP, MAP1, MAP2, and Diameter protocol standards in the design.
- Troubleshooting, bug reproduction, and bug fixing across the stack.

Intern (C++/Python) @ ALTRAN

Open Services Platform · Intelligent Networks

Jan 2018 - Jun 2018

Development, maintenance, and bug-fixing on the Alcatel-Lucent (Nokia) Open Services Platform, the substrate for flexible deployment of Intelligent Networks services.
- Designed, built, and maintained efficient Python and C++ code.
- Extended and improved existing product modules.
- Planned, developed, documented, tested, and deployed new modules.
- Worked with basic DB systems: indexes and filters.

Technical stack

The tools I reach for most often, battle-tested across fintech, enterprise NLP, and AI research.

Languages

Python
C++
JavaScript
SQL
HTML5
CSS3

NLP & ML

GPT-3.5 / 4 / 4 32K
LangChain
SpaCy
BERT fine-tune
Transformers
Recognizers-Text
scikit-learn

Backend & Web

FastAPI
Flask
Sanic
Pytest
Paramiko
Prometheus client
smtplib
TkInter

Data & Libraries

pandas
NumPy
matplotlib
Cookiecutter

Cloud & Infra

Docker
AWS EC2 / S3 / MQ
CloudFormation
Azure Queue
Azure Container Registry

Observability

Prometheus
Prometheus Pushgateway
Grafana
Kibana

Databases

PostgreSQL
MySQL
Oracle
DB2

Tools & Collab

Git
Bitbucket
Jira
Atlassian Suite
PyCharm
Jupyter
VS Code
Cursor

Domains

BFSI
Fixed Income
Equities
Hedging
Private Investment
AI Benchmarking
Industrial IoT

Education

Bachelor of Technology, Computer Science & Engineering

KIIT University

8.01 / 10 · 2018

12th (Senior Secondary)

CBSE

67.5% · 2014

10th (Secondary)

CBSE

9.6 / 10 · 2011

Get in touch

I'm open to conversations about LLM systems, agentic evaluation, and reproducible ML infrastructure. The fastest way to reach me is email; I read everything.

sonuk.kumar@yahoo.com