>_ Hi, my name is

Sonu Kumar.

I build intelligent software.

role: Python Developer_

Python engineer with 8 years shipping production systems across fintech and AI research — from LLM-powered annotator services and NLP pipelines to agentic-coding benchmarks on SWE-bench and Terminal-bench. I care deeply about reproducible, well-tested, container-first engineering.

  • 8 years in Python
  • 7 engineering roles
  • 3 LLM families shipped
  • bug finding & resolving

About me

I'm a Python developer based in Bangalore with eight years of experience spanning the entire project life cycle — planning, evaluation, requirements, design, development, testing, and deployment. I'm equally at home debugging a production incident, refactoring a legacy service, or designing an evaluation harness for an agentic coding system.

At Morgan Stanley I work on wealth- and investment-management tech, shipping microservices for Fixed Income, hedging and portfolio data. Before that, I spent a year and a half at Wolters Kluwer building an annotator template on top of Flask, LangChain, and GPT-4 / GPT-4 32K — with custom SpaCy and fine-tuned BERT models powering named-entity recognition and POS tagging over real business datasets.

In parallel, I contribute to AI benchmarking — evaluating agentic coding systems on SWE-bench and Terminal-bench, generating structured trajectories and patches, and validating fixes through Docker-based oracle verification. It's the kind of quiet, detail-obsessed work that makes LLM evaluation actually trustworthy.

Currently open to roles that push the frontier of LLM systems, agentic evaluation, and reproducible ML infrastructure.

Expertise

A blend of applied LLM/NLP engineering, benchmarking discipline, and production backend work.

AI Benchmarking & Evaluation

Evaluating agentic coding systems across SWE-bench and Terminal-bench. Structured trajectories (trajectory.md, trajectory.jsonl), Docker-based oracle verification, and evaluation rubrics for OpenCode baselines.

  • SWE-bench
  • Terminal-bench
  • Claude Code
  • Cursor
  • SpecStory
  • Oracle verification

LLM Engineering

Production annotator template on top of Flask, LangChain, and Cookiecutter — shipping prompts for summarization (stuff / refine / map-reduce), QA, and translation using GPT-3.5 Turbo, GPT-4, and GPT-4 32K.

  • GPT-4 / 32K
  • LangChain
  • Prompt pipelines
  • Map-reduce
  • Cookiecutter

NLP Systems

Custom SpaCy NER models trained on domain data, fine-tuned BERT for POS tagging, and Recognizers-Text integrations for temporal/date extraction. Bug fixes and workflows for RDF-based semantic content enrichment.

  • SpaCy
  • BERT fine-tune
  • Recognizers-Text
  • Transformers
  • RDF / SCE

Backend & Microservices

Flask → FastAPI migrations, Kerberos → OAuth, near-real-time position polling from Aladdin into DB2, Muni/Corp bond order-submission apps, and HTTP-REST APIs for Timeseries ingestion.

  • FastAPI
  • Flask
  • Sanic
  • OAuth
  • DB2 / Postgres

Cloud & Containerization

Containerized, reproducible workflows with Docker. Cloud-native deployments across AWS (MQ, EC2, CloudFormation, S3) and Azure (Queue, Container Registry), integrated with Bitbucket and Git-based CI.

  • Docker
  • AWS
  • Azure
  • CloudFormation
  • Git / Bitbucket

Observability & Data

API and asset-health monitoring with Prometheus + Prometheus Pushgateway, dashboards in Grafana, log analysis in Kibana. Built reporting pipelines over Oracle/SQL/MySQL/PostgreSQL for risk and regulatory reporting.

  • Prometheus
  • Grafana
  • Kibana
  • pandas / NumPy
  • SQL

Experience

A timeline of roles, most recent first, from manager back to intern, across fintech, enterprise NLP, industrial IoT, telecom, and AI research.

  1. AI Benchmarking Contributor (Freelance / Contract)

    SWE-bench · Terminal-bench · Claude Code · Cursor

    Evaluating and improving agentic coding systems across SWE-bench and Terminal-bench environments.

    • Generated structured trajectories (trajectory.md, trajectory.jsonl) and patches by solving real repository-level issues using Claude Code and Cursor with SpecStory tracking.
    • Performed OpenCode baselining — reviewing agent reasoning, validating fixes via Docker-based oracle verification, and refining evaluation rubrics.
    • Contributed to autonomous repo-review workflows enforcing structured iteration, correctness validation, and benchmark compliance.
    • Strengthened reliability of LLM-assisted software engineering pipelines through reproducible verification workflows and artifact-driven submissions.
  2. Manager — Python Developer @ Morgan Stanley

    Fixed Income · FastAPI · Aladdin · DB2

    Wealth and investment management technology — microservices for Fixed Income, interest-rate hedging, and currency repatriation.

    • Built an app to poll near real-time positions from Aladdin and persist into DB2.
    • Contributed to the Flask → FastAPI migration and Kerberos → OAuth authentication move.
    • Shipped apps that streamline order submission for Muni/Corp bonds with rich reference-data filters.
  3. Senior Product Software Engineer @ Wolters Kluwer

    Flask · LangChain · GPT-4 · SpaCy · BERT

    Annotator template on Flask/LangChain for semantic analysis, summarization, and extraction across diverse business domains.

    • Generic annotator framework with Cookiecutter + LangChain, prompts for stuff / refine / map-reduce summarization, QA, translation.
    • Context-aware summarization service built on GPT-3.5 Turbo, GPT-4, and GPT-4 32K.
    • Integrated the Recognizers-Text library for accurate date/temporal entity extraction.
    • Migrated an annotator service from Sanic to Flask for better performance and scalability.
    • Trained a custom SpaCy NER model and fine-tuned BERT for POS tagging with tailored datasets.
    • Fixed bugs in an RDF-based Semantic Content Enrichment app and built annotator workflows to compare extracted dates against RDF data.
  4. Software Engineer @ SymphonyAI

    Prometheus · Flask · Timeseries

    Monitoring utility modules with Prometheus for industrial asset performance — anomaly and error detection via counter and gauge metrics.

    • Built and maintained the Asset Template microservice that provisions machinery and sensor data.
    • Enhanced the Measurement microservice to assimilate sensor readings across diverse APIs.
    • Instituted status-code monitoring across APIs and modules to reduce call-failure rates.
    • Constructed an HTTP-REST API with Flask for Timeseries data intake.
  5. Senior Associate — Python Backend @ Macquarie

    QRM · Prometheus Pushgateway · Grafana

    Central services optimizing Transaction and Static Reference Data caches for Middle/Back Office — Risk, Finance, Operations.

    • Designed and maintained the QRM (Quantitative Risk Management) application, producing risk-exposure reports.
    • Handled deal and rate retrieval and CSV report generation, with monitoring via Prometheus Pushgateway and Grafana.
    • Automated scheduled macros that email reports as attachments on a regular cadence.
    • Managed regulatory reporting applications for daily trades across all operational regions.
  6. Software Developer (C++/Python) @ ALTRAN

    RTCC · CAMEL/INAP · Diameter · CDR

    SIBs (Service Independent Building Block APIs) for Real-Time Charging Control (RTCC), an online charging system running over the IP Multimedia Subsystem (IMS) core.

    • Created and analysed call detail records (CDRs) for accuracy; release prep using pandas and NumPy.
    • Designed and tested apps that administered call logic and automated scenario testing.
    • Built GUIs for subscriber data provisioning and mass-provisioning via a text-oriented client.
    • Planned database-capacity expansion to keep the app scalable.
    • Kept the design aligned with the INAP, MAP1, MAP2, and Diameter protocol standards.
    • Handled troubleshooting, bug reproduction, and bug fixing across the stack.
  7. Intern (C++/Python) @ ALTRAN

    Open Services Platform · Intelligent Networks

    Development, maintenance, and bug-fixing on Alcatel-Lucent (Nokia) Open Services Platform — the substrate for flexible deployment of Intelligent Networks services.

    • Designed, built, and maintained efficient Python and C++ code.
    • Extended and improved existing product modules.
    • Planned, developed, documented, tested, and deployed new modules.
    • Worked with basic DB systems — indexes and filters.

Technical stack

The tools I reach for most often — battle-tested across fintech, enterprise NLP, and AI research.

Languages

  • Python
  • C++
  • JavaScript
  • SQL
  • HTML5
  • CSS3

NLP & ML

  • GPT-3.5 / 4 / 4 32K
  • LangChain
  • SpaCy
  • BERT fine-tune
  • Transformers
  • Recognizers-Text
  • scikit-learn

Backend & Web

  • FastAPI
  • Flask
  • Sanic
  • Pytest
  • Paramiko
  • Prometheus client
  • smtplib
  • Tkinter

Data & Libraries

  • pandas
  • NumPy
  • matplotlib
  • Cookiecutter

Cloud & Infra

  • Docker
  • AWS EC2 / S3 / MQ
  • CloudFormation
  • Azure Queue
  • Azure Container Registry

Observability

  • Prometheus
  • Prometheus Pushgateway
  • Grafana
  • Kibana

Databases

  • PostgreSQL
  • MySQL
  • Oracle
  • DB2

Tools & Collab

  • Git
  • Bitbucket
  • Jira
  • Atlassian Suite
  • PyCharm
  • Jupyter
  • VS Code
  • Cursor

Domains

  • BFSI
  • Fixed Income
  • Equities
  • Hedging
  • Private Investment
  • AI Benchmarking
  • Industrial IoT

Education

  • Bachelor of Technology — Computer Science & Engineering

    KIIT University

    8.01 / 10
  • 12th — Senior Secondary

    CBSE

    67.5%
  • 10th — Secondary

    CBSE

    9.6 / 10

Get in touch

I'm open to conversations about LLM systems, agentic evaluation, and reproducible ML infrastructure. The fastest way to reach me is email — I read everything.