Notes
Lessons learned, written down.
Shorter than courses. No prerequisites, no order. Mistakes, realizations, frameworks, and things I didn't want to forget.
LLM-as-Judge: Making Models Evaluate Models
How I built a rubric-based evaluation framework at Amazon, calibrated its scoring against human audits, and learned what actually makes LLM evaluation trustworthy.
Multimodal Evaluation Pipelines
Ingesting images and HTML, extracting structured signals, and measuring quality across proprietary KPIs — what I learned building this at Amazon scale.
Semantic Search at Scale: Brand Standardization
Using FAISS and embeddings to map 300K noisy brand strings to a canonical taxonomy — the decisions that mattered and the ones that didn't.
Prompt Engineering That Holds in Production
What actually works when you're scoring 32K+ products, not just in demo notebooks. The patterns that survived and the ones that fell apart.
Benchmarking GenAI: Beyond Vibes
Designing evaluation systems that give you launch confidence — not just high scores. The hard lessons from building this inside Amazon.
Context Engineering Is the New Prompt Engineering
The shift from 'write better prompts' to 'design better context' — and why this reframe changes everything about how you build with LLMs.
Reasoning Models Aren't Always Worth the Cost
I ran the numbers on when o3 actually beats Claude Sonnet, and the answer surprised me. Spoiler: not on most tasks.
Stay in the loop
New notes, straight to your inbox.
No cadence, no noise. Just a note when something is worth writing down.