Writing my own interpreter for Lox, Part 10 - Bonus!
This is the final part of my series on writing an interpreter for Lox in Python. Previously, we implemented inheritance, completing the full Lox language specification from Crafting Interpreters. In this post, we add some bonus features inspired by the challenges in the book: small but meaningful improvements that make the language more practical.
After implementing the complete Lox spec, I wanted to flex the interpreter a bit and try out some extensions. These aren't totally random features - I gave them some thought and feel they would make nice additions:
- Native len() function: A practical utility for working with strings
- Explicit initialization: Catching uninitialized variable bugs at runtime
- Break statement: Early exit from loops without awkward flag variables
Let's see how each one works and what it took to add them.
Native Function: len()
Every language needs basic string operations, and getting the length of a string is about as basic as it gets. We already have a clock() native function, so adding len() follows the same pattern.
The Implementation
I added a Len class in app/native_fns.py that implements the LoxCallable interface:
class Len(LoxCallable):
    def arity(self):
        return 1
    def call(self, interpreter, arguments):
        value = arguments[0]
        if not isinstance(value, str):
            token = Token(TokenType.IDENTIFIER, 0, "len", "null")
            raise LoxRuntimeError(token, "Argument to len() must be a string.")
        return float(len(value))
    def __str__(self):
        return "<native_fn_len>"
The function:
- Takes exactly one argument (arity of 1)
- Validates that the argument is a string at runtime
- Returns the length as a float (Lox only has one number type)
- Raises a clear error for non-string arguments
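To see these four properties outside the interpreter, here is a minimal, self-contained sketch. LoxCallable, Token, TokenType, and LoxRuntimeError below are simplified stand-ins, not the project's actual classes. The Token argument order mirrors the call above and is an assumption. call_native is a hypothetical helper that mimics the arity check a call expression performs before invoking call().

```python
from abc import ABC, abstractmethod
from enum import Enum, auto


class TokenType(Enum):
    IDENTIFIER = auto()


class Token:
    # Stand-in for the project's Token; field order assumed from the call in Len.
    def __init__(self, type_, line, lexeme, literal):
        self.type = type_
        self.line = line
        self.lexeme = lexeme
        self.literal = literal


class LoxRuntimeError(RuntimeError):
    def __init__(self, token, message):
        super().__init__(message)
        self.token = token


class LoxCallable(ABC):
    # Stand-in for the project's callable interface.
    @abstractmethod
    def arity(self) -> int: ...

    @abstractmethod
    def call(self, interpreter, arguments): ...


class Len(LoxCallable):
    def arity(self):
        return 1

    def call(self, interpreter, arguments):
        value = arguments[0]
        if not isinstance(value, str):
            token = Token(TokenType.IDENTIFIER, 0, "len", None)
            raise LoxRuntimeError(token, "Argument to len() must be a string.")
        # Lox has a single number type, so the length comes back as a float.
        return float(len(value))

    def __str__(self):
        return "<native_fn_len>"


def call_native(callee, arguments, interpreter=None):
    """Hypothetical helper: check arity first, as a call expression would."""
    if len(arguments) != callee.arity():
        token = Token(TokenType.IDENTIFIER, 0, str(callee), None)
        raise LoxRuntimeError(
            token, f"Expected {callee.arity()} arguments but got {len(arguments)}."
        )
    return callee.call(interpreter, arguments)
```

The two failure modes stay separate. The wrong argument count is caught generically before call() runs. The wrong argument type is caught inside Len itself, because only the function knows what types it accepts.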
Then in app/interpreter.py, I registered it in the global environment:
def __init__(self):
    self.had_runtime_error = False
    self.globals = Environment()
    self.environment = self.globals
    self.globals.define("clock", Clock())
    self.globals.define("len", Len())  # New!
Usage
print len("hello"); // 5.0
print len(""); // 0.0
print len("hello world"); // 11.0
var message = "Lox is fun!";
if (len(message) > 10) {
  print "Long message";
}
Error handling works as expected:
len(42); // Runtime error: Argument to len() must be a string.
len(nil); // Runtime error: Argument to len() must be a string.
len(true); // Runtime error: Argument to len() must be a string.
That's it! Native functions can expose platform capabilities without reimplementing everything in the interpreter. len() is trivial, but the same pattern works for file I/O, network requests, or any other host functionality you want to expose. The cool thing is we didn't touch the scanner, parser, or AST at all. Native functions are just callable values sitting in the global environment.
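One way to take that pattern further is a generic wrapper, so each new host capability becomes a single registration line. This is a minimal sketch under stated assumptions. NativeFn is a hypothetical class, not part of the interpreter. A plain dict stands in for the global Environment. The upper native is a made-up example, not something the post adds.

```python
import time
from abc import ABC, abstractmethod


class LoxCallable(ABC):
    # Simplified stand-in for the project's LoxCallable interface.
    @abstractmethod
    def arity(self) -> int: ...

    @abstractmethod
    def call(self, interpreter, arguments): ...


class NativeFn(LoxCallable):
    """Hypothetical wrapper: adapts any fixed-arity Python function to LoxCallable."""

    def __init__(self, name, arity, fn):
        self.name = name
        self._arity = arity
        self.fn = fn

    def arity(self):
        return self._arity

    def call(self, interpreter, arguments):
        # Unpack the already-evaluated Lox arguments into the host function.
        return self.fn(*arguments)

    def __str__(self):
        return f"<native_fn_{self.name}>"


# A plain dict standing in for the global Environment.
globals_env = {}
globals_env["clock"] = NativeFn("clock", 0, lambda: time.time())
globals_env["upper"] = NativeFn("upper", 1, lambda s: s.upper())  # hypothetical native
```

Argument validation, such as len()'s string check, would live inside the wrapped Python function. The wrapper itself only supplies the arity and the name.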
Explicit Initialization
By default, Lox implicitly initializes variables to nil:
var x;
print x; // nil
This is convenient but hides bugs. If you forget to initialize a variable, your code runs but does the wrong thing. No error, no warning, just silent failures.
I changed this to make uninitialized variable access a runtime error.
The Sentinel Approach
The trick is distinguishing three states:
- Undefined: Variable doesn't exist (already errors)
- Declared but uninitialized: Variable exists but has no value
- Initialized: Variable has a value (including explicit nil)
I used a sentinel object in app/environment.py:
class _Uninitialized:
    """Sentinel value to mark uninitialized variables."""
    def __repr__(self):
        return "<uninitialized>"
UNINITIALIZED = _Uninitialized()
This sentinel is a special marker that means "no value yet" - distinct from None (which represents Lox's nil).
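Here is how the three states can play out in a single environment. This is a minimal sketch, not the project's environment.py. Names are plain strings instead of Tokens, and the "Undefined variable" message follows the book's wording as an assumption.

```python
class LoxRuntimeError(RuntimeError):
    def __init__(self, name, message):
        super().__init__(message)
        self.name = name


class _Uninitialized:
    """Sentinel value to mark uninitialized variables."""

    def __repr__(self):
        return "<uninitialized>"


UNINITIALIZED = _Uninitialized()


class Environment:
    # Simplified stand-in: variable names are plain strings here.
    def __init__(self, enclosing=None):
        self.values = {}
        self.enclosing = enclosing

    def define(self, name, value):
        self.values[name] = value

    def get(self, name):
        if name in self.values:
            value = self.values[name]
            # State 2: declared but never assigned.
            if value is UNINITIALIZED:
                raise LoxRuntimeError(name, f"Variable '{name}' has not been initialized.")
            # State 3: initialized, possibly to None (Lox's nil).
            return value
        if self.enclosing is not None:
            return self.enclosing.get(name)
        # State 1: never declared in any enclosing scope.
        raise LoxRuntimeError(name, f"Undefined variable '{name}'.")

    def assign(self, name, value):
        # Assignment simply overwrites the sentinel, moving state 2 to state 3.
        if name in self.values:
            self.values[name] = value
            return
        if self.enclosing is not None:
            self.enclosing.assign(name, value)
            return
        raise LoxRuntimeError(name, f"Undefined variable '{name}'.")
```

Identity comparison (`is UNINITIALIZED`) is what keeps the sentinel distinct from None. No Lox value can ever be that particular object.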
Variable Declaration
When declaring a variable without an initializer, we now store the sentinel instead of None:
def visit_var_stmt(self, stmt: VarStmt):
    value = UNINITIALIZED  # Changed from None
    if stmt.initializer is not None:
        value = self.evaluate(stmt.initializer)
    self.environment.define(stmt.name.lexeme, value)
Variable Access
When accessing a variable, we check for the sentinel and error if found:
def get(self, name: Token):
    if name.lexeme in self.values:
        value = self.values[name.lexeme]
        if value is UNINITIALIZED:
            raise LoxRuntimeError(name, f"Variable '{name.lexeme}' has not been initialized.")
        return value
    # ... continue with enclosing scope lookup
The same check happens in get_at() for local variables resolved at compile time.
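For resolved locals, the resolver has already computed how many scopes away the variable lives, so get_at() jumps straight there. It still needs the sentinel check, and it now takes the name token so it can report the error. This is a minimal sketch that assumes ancestor()/get_at() shapes similar to the book's. Token and LoxRuntimeError are simplified stand-ins.

```python
class Token:
    # Simplified stand-in: only the lexeme matters here.
    def __init__(self, lexeme):
        self.lexeme = lexeme


class LoxRuntimeError(RuntimeError):
    def __init__(self, token, message):
        super().__init__(message)
        self.token = token


class _Uninitialized:
    def __repr__(self):
        return "<uninitialized>"


UNINITIALIZED = _Uninitialized()


class Environment:
    def __init__(self, enclosing=None):
        self.values = {}
        self.enclosing = enclosing

    def define(self, name, value):
        self.values[name] = value

    def ancestor(self, distance):
        # Walk exactly `distance` hops up the chain, as computed by the resolver.
        environment = self
        for _ in range(distance):
            environment = environment.enclosing
        return environment

    def get_at(self, distance, name: Token):
        value = self.ancestor(distance).values[name.lexeme]
        # Same sentinel check as get(); the token supplies the name for the error.
        if value is UNINITIALIZED:
            raise LoxRuntimeError(name, f"Variable '{name.lexeme}' has not been initialized.")
        return value
```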
New Behavior
var a;
print a; // Runtime error: Variable 'a' has not been initialized.
var b = nil;
print b; // nil (explicitly initialized)
var c;
c = "assigned";
print c; // "assigned" (assigned before access)
This catches a whole class of bugs:
var total; // Forgot to initialize
for (var i = 0; i < 10; i = i + 1) {
  total = total + i; // Error! Can't use uninitialized 'total'
}
The error tells you exactly what's wrong. No more debugging mysterious nil values that propagate through your program.
Edge Cases
Global vs Local: The change applies equally to both. The sentinel lives in the environment, so scoping doesn't matter.
Class Names: When defining classes, we temporarily bind nil to allow methods to reference the class. This still works fine - we use None, not UNINITIALIZED.
Function Parameters: Parameters are initialized by the arguments passed when calling the function, so they're never uninitialized.
Native Functions: clock and len are defined with actual callable objects, not the sentinel.
The implementation is clean - a handful of changes to environment.py and interpreter.py, plus updating call sites that use get_at() to pass the token for error reporting.
Why not check for explicit initialization at compile time?
Since the resolver already does static analysis, can't we also check for explicit initialization of variables in that pass? The trouble is that whether a variable gets initialized often depends on runtime conditions:
var x;
if (someCondition()) {
  x = 1;
}
print x; // initialized only if someCondition() returned true
Handling that statically takes something far more involved. At this point in my compiler journey, I don't fully understand what it requires, or even whether it's feasible here. After some digging, I found that this is called definite assignment analysis, and languages like Java and Rust perform it to reject reads of possibly uninitialized variables at compile time.
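The bottom line:
- Compile-time checking would require sophisticated control flow analysis
- It would still reject some valid programs as unsafe, because the analysis has to be conservative when it can't predict which branch runs
- The runtime check catches every uninitialized read that actually happens during execution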
Since this is a learning project, the runtime approach is justified as it's simpler, cleaner and still effective.
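To make the "conservative" point concrete, here is a toy illustration of the core idea behind definite assignment. It is not how Java or Rust actually implement it, and the tuple-based statement forms are hypothetical. The key rule: at the point where an if statement's branches merge, a variable counts as assigned only if both branches assigned it. Loops are not modeled.

```python
def analyze(stmts, assigned=frozenset()):
    """Return (definitely_assigned_after, errors) for a toy statement list.

    Statement forms (hypothetical):
      ("assign", name)
      ("read", name)
      ("if", then_stmts, else_stmts)   # the condition is never evaluated
    """
    errors = []
    assigned = set(assigned)
    for stmt in stmts:
        kind = stmt[0]
        if kind == "assign":
            assigned.add(stmt[1])
        elif kind == "read":
            if stmt[1] not in assigned:
                errors.append(f"'{stmt[1]}' may be used before assignment")
        elif kind == "if":
            _, then_branch, else_branch = stmt
            then_out, then_errs = analyze(then_branch, assigned)
            else_out, else_errs = analyze(else_branch, assigned)
            errors += then_errs + else_errs
            # Only names assigned on *both* paths are definitely assigned.
            assigned = then_out & else_out
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return assigned, errors


# Both branches assign x, so the read is accepted.
both = [("if", [("assign", "x")], [("assign", "x")]), ("read", "x")]
# Only one branch assigns x. This is rejected even if the condition is always
# true at runtime, which is exactly the kind of valid program a static check refuses.
one_branch = [("if", [("assign", "x")], []), ("read", "x")]
```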
Break Statement
Loops without break are like functions without return: technically complete but practically annoying. Want to exit a search loop when you find something? You end up with flag variables and awkward logic:
// Without break - ugly
var found = false;
var i = 0;
while (i < 100 and !found) {
  if (someCondition(i)) {
    found = true;
  } else {
    i = i + 1;
  }
}
// With break - clean
var i = 0;
while (i < 100) {
  if (someCondition(i)) {
    break;
  }
  i = i + 1;
}
Grammar Change
Adding break requires one new production:
breakStmt → "break" ";" ;
That's it. It's a statement, it's a keyword, it ends with a semicolon.
Parsing
I added BREAK to the token types in app/token.py:
TokenType = Enum(
    'TokenType',
    [
        # ... other tokens
        'VAR', 'WHILE', 'BREAK',
        # ...
    ]
)
The scanner already handles keywords generically, so break just works.
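The scanner code isn't shown in this post, so here is one hedged guess at what handling keywords "generically" can look like. The keyword table is built from a list of reserved words, and each word maps to the token type with the same upper-cased name. KEYWORD_NAMES and identifier_type are hypothetical names. If the scanner instead uses a hand-written keywords dict, as the book does, that dict needs a "break" entry too.

```python
from enum import Enum

TokenType = Enum(
    "TokenType",
    [
        # A few non-keyword types, so the keyword table can't just use every name.
        "LEFT_PAREN", "RIGHT_PAREN", "SEMICOLON", "IDENTIFIER", "STRING", "NUMBER",
        # Keyword types.
        "AND", "CLASS", "ELSE", "FALSE", "FUN", "FOR", "IF", "NIL", "OR",
        "PRINT", "RETURN", "SUPER", "THIS", "TRUE", "VAR", "WHILE", "BREAK",
    ],
)

# Hypothetical: each reserved word maps to the token type with the same upper-cased name.
KEYWORD_NAMES = [
    "and", "class", "else", "false", "fun", "for", "if", "nil", "or",
    "print", "return", "super", "this", "true", "var", "while", "break",
]
KEYWORDS = {word: TokenType[word.upper()] for word in KEYWORD_NAMES}


def identifier_type(lexeme: str):
    """Called after scanning an identifier: reserved words get their own type."""
    return KEYWORDS.get(lexeme, TokenType.IDENTIFIER)
```

Because lookup happens on the whole lexeme after the identifier is scanned, "breakfast" stays an identifier rather than being split into a keyword plus leftovers.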
In app/parser.py, I added a case to the statement parser:
def statement(self):
    # ... other statement types
    if self.match(TokenType.BREAK):
        return self.break_statement()
    # ...
def break_statement(self):
    keyword = self.previous()
    self.consume(TokenType.SEMICOLON, "Expect ';' after break.")
    return BreakStmt(keyword)
And the AST node in app/stmt.py:
class BreakStmt(Stmt):
    def __init__(self, keyword: Token):
        self.keyword = keyword
    def accept(self, visitor: StmtVisitor):
        return visitor.visit_break_stmt(self)
Exception-Based Control Flow
Like return, I implemented break using exceptions. When you hit a break statement, it throws a Break exception that unwinds the stack until a loop catches it.
I created app/break_error.py:
class Break(RuntimeError):
    def __init__(self):
        pass
In app/interpreter.py, break statements just raise the exception:
def visit_break_stmt(self, stmt: BreakStmt):
    raise Break()
And while loops catch it:
def visit_while_stmt(self, stmt: WhileStmt):
    try:
        while self.is_truthy(self.evaluate(stmt.condition)):
            self.execute(stmt.body)
    except Break:
        pass  # Break out of loop
    return None
That's it. The exception unwinds through nested blocks, if statements, and anything else in between until it reaches the while loop's try/except.
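The unwinding is easy to watch in a toy interpreter. This is a hedged sketch: ToyInterpreter and its tuple-based statements are hypothetical, and block_depth stands in for the environment that a block swaps in. As in the book's block execution, the restore happens in a finally clause, so it runs even while Break is propagating. Each while catches only the Break raised inside its own body, which is why a break exits just the innermost loop.

```python
class Break(RuntimeError):
    """Raised by a break statement; caught by the nearest enclosing loop."""


class ToyInterpreter:
    """Hypothetical toy: statements are tuples, expressions are callables taking
    the interpreter, so only the control flow is modeled."""

    def __init__(self):
        self.env = {}         # flat variable store (no scoping in this toy)
        self.block_depth = 0  # stands in for the environment a block swaps in
        self.output = []      # collects printed values

    def execute(self, stmt):
        kind = stmt[0]
        if kind == "print":
            self.output.append(stmt[1](self))
        elif kind == "set":
            self.env[stmt[1]] = stmt[2](self)
        elif kind == "if":
            if stmt[1](self):
                self.execute(stmt[2])
        elif kind == "block":
            self.block_depth += 1
            try:
                for inner in stmt[1]:
                    self.execute(inner)
            finally:
                # Runs even while Break propagates, so state is restored on the way out.
                self.block_depth -= 1
        elif kind == "while":
            try:
                while stmt[1](self):
                    self.execute(stmt[2])
            except Break:
                pass  # the innermost loop absorbs the break
        elif kind == "break":
            raise Break()
        else:
            raise ValueError(f"unknown statement: {kind}")

    def run(self, program):
        for stmt in program:
            self.execute(stmt)
        return self.output


# Lox equivalent: var i = 0; while (true) { if (i == 3) { break; } print i; i = i + 1; }
program = [
    ("set", "i", lambda it: 0.0),
    ("while", lambda it: True, ("block", [
        ("if", lambda it: it.env["i"] == 3.0, ("block", [("break",)])),
        ("print", lambda it: it.env["i"]),
        ("set", "i", lambda it: it.env["i"] + 1.0),
    ])),
]
```

Running program prints 0.0, 1.0, and 2.0, then leaves block_depth back at 0. The break escaped two nested blocks and an if statement on its way to the loop.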
Since for loops desugar to while loops in our parser, they get break support for free.
Compile-Time Validation
You shouldn't be able to break outside a loop. That's a compile error in most languages, and we can catch it during static analysis.
I added a loop_depth counter to the resolver in app/resolver.py:
class Resolver(ExprVisitor, StmtVisitor):
    def __init__(self, interpreter):
        # ... other fields
        self.loop_depth = 0  # Track nesting depth of loops
When resolving while statements, increment depth before resolving the body, then decrement after:
def visit_while_stmt(self, stmt):
    self.resolve(stmt.condition)
    self.loop_depth += 1
    self.resolve(stmt.body)
    self.loop_depth -= 1
And check the depth when resolving break statements:
def visit_break_stmt(self, stmt):
    if self.loop_depth == 0:
        ResolveError.error(self, stmt.keyword, "Can't use 'break' outside of a loop.")
This catches errors before execution:
break; // Error: Can't use 'break' outside of a loop.
fun foo() {
  break; // Error: Can't use 'break' outside of a loop.
}
while (true) {
  break; // OK!
}
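The fun foo() case errors because loop_depth is 0 at the top level. A function declared inside a loop is a different story. The resolver code above never resets the counter when it enters a function body. If resolve_function doesn't do that either (it isn't shown here), then `while (true) { fun f() { break; } f(); }` would likely pass the check. At runtime, the Break exception would escape f() and terminate the surrounding loop. Here is a toy sketch of one fix: save loop_depth, zero it for the function body, and restore it afterward. The tuple-based statements and this Resolver class are hypothetical stand-ins for the real AST and resolver.

```python
class Resolver:
    """Toy resolver that only tracks loop nesting over tuple-based statements."""

    def __init__(self):
        self.loop_depth = 0
        self.errors = []

    def resolve(self, stmts):
        for stmt in stmts:
            self.resolve_stmt(stmt)

    def resolve_stmt(self, stmt):
        kind = stmt[0]
        if kind == "while":
            # ("while", body_stmts); condition resolution omitted in this toy.
            self.loop_depth += 1
            try:
                self.resolve(stmt[1])
            finally:
                self.loop_depth -= 1
        elif kind == "fun":
            # ("fun", body_stmts): a function body starts a fresh loop context,
            # so a break inside it cannot target a loop around the declaration.
            enclosing_depth = self.loop_depth
            self.loop_depth = 0
            try:
                self.resolve(stmt[1])
            finally:
                self.loop_depth = enclosing_depth
        elif kind == "break":
            if self.loop_depth == 0:
                self.errors.append("Can't use 'break' outside of a loop.")
        else:
            raise ValueError(f"unknown statement: {kind}")
```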
Usage Examples
Infinite loops with escape hatches:
while (true) {
  print "Enter loop";
  if (shouldExit()) {
    break;
  }
  print "Still looping";
}
print "Exited";
Search loops:
var found = nil;
var i = 0;
while (i < 100) {
  var item = getItem(i); // Lox has no indexing syntax, so fetch items via a function
  if (matches(item)) {
    found = item;
    break;
  }
  i = i + 1;
}
Nested loops work correctly - break only exits the innermost loop:
var i = 0;
while (i < 10) {
  var j = 0;
  while (j < 10) {
    if (i == j and i == 5) {
      break; // Only breaks inner loop
    }
    j = j + 1;
  }
  i = i + 1;
}
Putting It Together
These three features might seem small, but they make Lox noticeably more pleasant to use:
- len() demonstrates how easy it is to extend the language with native functions.
- Explicit initialization catches bugs that would otherwise be silent. The sentinel pattern is simple and required minimal changes to the existing code.
- Break eliminates awkward loop patterns and makes intent clear. The exception-based implementation mirrors how return works, and the compile-time check prevents misuse.
None of these required major surgery on the interpreter. The architecture we built over the previous nine posts handles these extensions gracefully.
Reflections on Building an Interpreter
Building PyLox from scratch was a fun trip. I started with a scanner that just recognizes tokens, and ten posts later I've got classes, inheritance, closures, and compile-time error checking.
- The visitor pattern made it easy to add new operations (evaluation, resolution, pretty-printing if you wanted) without modifying the AST
- Environment chains gave us lexical scoping and closures almost for free
- Two-pass execution (resolver then interpreter) caught errors early and optimized variable lookup
- Exception-based control flow handled both errors and non-local jumps elegantly
The bonus features show how robust the foundation is. Want a new native function? Implement the interface and register it. Want better error checking? Add a pass to the resolver. Want a new control flow construct? Wire up an exception and catch it where appropriate.
This concludes the Crafting an Interpreter from Scratch series!