Notes on ML engineering, data platforms, and the developer tools I build along the way.
Real latency numbers from continuous monitoring: 60,000 probes across GPT-4o-mini, Claude 3.5 Haiku, Gemini 2.0 Flash, Llama 3.3 70B, DeepSeek Chat, and Mistral Small.
Building tokentoll, an Infracost-style cost-impact tool for LLM API spend, in a single day. Architecture, model-name resolution, multi-pass constant propagation, and validation across twenty real codebases.