Blog

Technical case studies on production ML/AI infrastructure, data platforms, observability, and LLM operations.

| platform-engineering / python / data-engineering

Building a battery analytics platform for 100k+ IoT devices, alone

One engineer, eight repositories: a tested capacity/runtime library on an internal PyPI, Dagster ingestion, FastAPI services, dashboards, and monitoring. How a battery prediction problem became a full production platform.

| llm / infrastructure / kubernetes

What I learned contributing to llm-d, a production inference router

Ten-plus merged PRs into llm-d, the Kubernetes inference routing layer maintained by Red Hat, IBM, and Google. Flow-control priority bands, CI hardening, and what production inference infrastructure actually looks like up close.

| ai / llm / rag

The missing deployment gate for AI applications

Normal software has CI gates, smoke tests, canaries, and SLOs. AI apps need the same discipline for eval quality, token cost, LLM/RAG behavior, observability, and rollback readiness.

| llm / infrastructure / prometheus

Verdict, not dashboard: readiness gates for LLM inference deployments

Dashboards help you inspect LLM inference systems. They do not decide whether a new endpoint is safe to route traffic to. I built aipreflight's inference profile to turn external probes and Prometheus metrics into deployment verdicts.

| llm / python / devtools

OpenAI's own cookbook costs $1,884/month to run. One model swap fixes most of it.

I scanned OpenAI's cookbook for LLM API calls and estimated its monthly cost, assuming 1,000 calls per call site. Four gpt-5 call sites account for 68% of the total spend.

| llm / go / devops

TTFT varied 13x in my LLM provider benchmark snapshot

Hourly probes across 15 frontier models from OpenAI, Anthropic, Google, DeepSeek, and xAI via OpenRouter. In this snapshot, median TTFT ranged from 321 ms to 4,226 ms. Raw data included.

| data-engineering / kubernetes / python

Building a data platform with dbt, Dagster, and ArgoCD

How I built an ELT data platform for 100k+ IoT devices: Dagster for orchestration, dbt for transforms, Sqitch for migrations, ArgoCD for GitOps deployment, and PII-safe extraction from five API shards.

| mlops / python / data-engineering

Evaluating ML algorithms in production: from field data to fleet deployment

How I built an evaluation pipeline for battery prediction algorithms serving 100k+ IoT devices: Dagster-orchestrated dataset creation from field data, human-in-the-loop review, isolated venv testing across algorithm versions, MLflow tracking, and fleet-wide rollout.

| llm / go / devops

I monitored 6 LLM APIs for 7 days. Here's what I found.

60,000 probes across GPT-4o-mini, Claude 3.5 Haiku, Gemini 2.0 Flash, Llama 3.3 70B, DeepSeek Chat, and Mistral Small. Real latency numbers from continuous monitoring.

| llm / python / devtools

How I built Infracost for LLM spend in a day

Building tokentoll, an Infracost-style cost-impact tool for LLM API spend, in a single day. Architecture, model-name resolution, multi-pass constant propagation, and validation across twenty real codebases.
