Sai Sasank Y

Featured deep-dive · Feb 08, 2026

Feb 08, 2026 · 12 min read

Training a 360M Parameter Model with Performance Discipline

Pretraining SmolLM-360M on a single A100 GPU within a 30-hour window, focusing on feasibility analysis, throughput measurement, and hardware efficiency optimization.


Recent writing · 17 posts · sorted by date

↳ The archive, latest first

All (17) · compilers (10) · AI systems performance (4) · hardware (3) · GPUs (2)

Machine Baseline for CPU Performance Engineering on an M4 Pro
Establishing single-core FP32 compute, DRAM bandwidth, and cache hierarchy ceilings on Apple M4 Pro as denominators for kernel optimization.
AI systems performance
Apr 19, 2026 · 7 min
Extending Lox Language with Custom Features!
Extending Lox beyond the original spec with len(), explicit initialization checks, and break statements.
compilers
Dec 13, 2025 · 6 min
Writing my own interpreter for Lox, Part 9 - Inheritance
Adding single inheritance with method overriding and super keyword support to complete the Lox class system.
compilers
Oct 25, 2025 · 5 min
Writing my own interpreter for Lox, Part 8 - Classes
Implementing object-oriented programming in Lox with classes, instances, methods, and the this keyword.
compilers
Oct 18, 2025 · 5 min
Writing my own interpreter for Lox, Part 7 - Resolving and Binding
Fixing closure scoping bugs with a static resolver pass that computes variable binding distances.
compilers
Oct 05, 2025 · 7 min
Writing my own interpreter for Lox, Part 6 - Functions
Adding functions as first-class values, closures, and native functions to the Lox interpreter.
compilers
Sep 03, 2025 · 5 min
Writing my own interpreter for Lox, Part 5 - Control Flow
Adding control flow structures that make Lox Turing complete.
compilers
Aug 17, 2025 · 7 min
Writing my own interpreter for Lox, Part 4 - Statements
Extending the Lox interpreter with statements, variables, environments, and lexical scoping.
compilers
Aug 16, 2025 · 3 min
Writing my own interpreter for Lox, Part 3 - Evaluating Expressions
Implementing expression evaluation by converting AST nodes into runtime values in the Lox interpreter.
compilers
Aug 03, 2025 · 2 min

