Technical case studies on production ML/AI infrastructure, data platforms, observability, and LLM operations.
One engineer, eight repositories: a tested capacity/runtime library on an internal PyPI, Dagster ingestion, FastAPI services, dashboards, and monitoring. How a battery prediction problem became a full production platform.
Ten-plus merged PRs into llm-d, the Kubernetes inference routing layer maintained by Red Hat, IBM, and Google. Flow-control priority bands, CI hardening, and what production inference infrastructure actually looks like up close.
Normal software has CI gates, smoke tests, canaries, and SLOs. AI apps need the same discipline for eval quality, token cost, LLM/RAG behavior, observability, and rollback readiness.
Dashboards help you inspect LLM inference systems. They do not decide whether a new endpoint is safe to route traffic to. I built aipreflight's inference profile to turn external probes and Prometheus metrics into deployment verdicts.
I scanned OpenAI's cookbook for LLM API calls and estimated the monthly cost assuming 1,000 calls per call site. Four gpt-5 call sites account for 68% of the total spend.
Hourly probes across 15 frontier models from OpenAI, Anthropic, Google, DeepSeek, and xAI via OpenRouter. In this snapshot, median TTFT ranged from 321 ms to 4,226 ms. Raw data included.
How I built an ELT data platform for 100k+ IoT devices: Dagster for orchestration, dbt for transforms, Sqitch for migrations, ArgoCD for GitOps deployment, and PII-safe extraction from five API shards.
How I built an evaluation pipeline for battery prediction algorithms serving 100k+ IoT devices: Dagster-orchestrated dataset creation from field data, human-in-the-loop review, isolated venv testing across algorithm versions, MLflow tracking, and fleet-wide rollout.
60,000 probes across GPT-4o-mini, Claude 3.5 Haiku, Gemini 2.0 Flash, Llama 3.3 70B, DeepSeek Chat, and Mistral Small. Real latency numbers from continuous monitoring.
Building tokentoll, an Infracost-style cost-impact tool for LLM API spend, in a single day. Architecture, model-name resolution, multi-pass constant propagation, and validation across twenty real codebases.