A Mental Model of TPUs for Performance Engineering
A visual mental model for understanding TPU architecture and how it relates to ML workloads.
Pretraining SmolLM-360M on a single A100 GPU within a 30-hour window, focusing on feasibility analysis, throughput measurement, and hardware efficiency optimization.
Exploring whether language model agents can enhance the performance of other LLM agents through a meta-benchmark approach.
A walkthrough of the roofline model — compute vs memory bounds, arithmetic intensity, and how different kernels land on the plot — with two interactive widgets.