Training a 360M Parameter Model with Performance Discipline
Pretraining SmolLM-360M on a single A100 GPU within a 30-hour window, focusing on feasibility analysis, throughput measurement, and hardware efficiency optimization.
Analyzing the dot product operation through the roofline model on NVIDIA H100 GPU hardware.
A visual mental model for understanding TPU architecture and how it relates to ML workloads.