AI systems performance

Training a 360M-Parameter Model with Performance Discipline

Pretraining SmolLM-360M on a single A100 GPU within a 30-hour window, with a focus on feasibility analysis, throughput measurement, and hardware efficiency.

Featured
AI systems performance

A walkthrough of the roofline model, with two interactive widgets: compute vs. memory bounds, arithmetic intensity, and how different kernels land on the plot.

Recent