Training a 360M Parameter Model with Performance Discipline
Pretraining SmolLM-360M on a single A100 GPU within a 30-hour window, focusing on feasibility analysis, throughput measurement, and hardware efficiency optimization.
Establishing single-core FP32 compute, DRAM bandwidth, and cache hierarchy ceilings on Apple M4 Pro as denominators for kernel optimization.
Apr 19, 2026 · 7 min
Extending Lox beyond the original spec with len(), explicit initialization checks, and break statements.
Dec 13, 2025 · 6 min
Adding single inheritance with method overriding and super keyword support to complete the Lox class system.
Oct 25, 2025 · 5 min
Implementing object-oriented programming in Lox with classes, instances, methods, and the this keyword.
Oct 18, 2025 · 5 min
Fixing closure scoping bugs with a static resolver pass that computes variable binding distances.
Oct 05, 2025 · 7 min
Adding functions as first-class values, closures, and native functions to the Lox interpreter.
Sep 03, 2025 · 5 min
Adding control flow structures that make Lox Turing complete.
Aug 17, 2025 · 7 min
Extending the Lox interpreter with statements, variables, environments, and lexical scoping.
Aug 16, 2025 · 3 min
Implementing expression evaluation by converting AST nodes into runtime values in the Lox interpreter.
Aug 03, 2025 · 2 min