Sai Sasank Y

Writing my own interpreter for Lox, Part 10 - Bonus!

This is the final part in writing my own interpreter for Lox in Python. Previously, we implemented inheritance, completing the full Lox language specification from Crafting Interpreters. In this post, we add some bonus features inspired by the challenges in the book - small but meaningful improvements that make the language more practical.

After implementing the complete Lox spec, I wanted to flex the interpreter a bit and try out some extensions. These aren't totally random features - I gave them some thought and feel they would make nice additions: a len() native function, explicit variable initialization, and a break statement.

Let's see how each one works and what it took to add them.

Native Function: len()

Every language needs basic string operations, and getting the length of a string is about as basic as it gets. We already have a clock() native function, so adding len() follows the same pattern.

The Implementation

I added a Len class in app/native_fns.py that implements the LoxCallable interface:

class Len(LoxCallable):
    def arity(self):
        return 1

    def call(self, interpreter, arguments):
        value = arguments[0]
        if not isinstance(value, str):
            token = Token(TokenType.IDENTIFIER, 0, "len", "null")
            raise LoxRuntimeError(token, "Argument to len() must be a string.")
        return float(len(value))

    def __str__(self):
        return "<native_fn_len>"

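The function takes exactly one argument, raises a runtime error if that argument isn't a string, and returns the length as a float, since all Lox numbers are doubles.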

Then in app/interpreter.py, I registered it in the global environment:

def __init__(self):
    self.had_runtime_error = False
    self.globals = Environment()
    self.environment = self.globals
    self.globals.define("clock", Clock())
    self.globals.define("len", Len())  # New!

Usage

print len("hello");         // 5.0
print len("");              // 0.0
print len("hello world");   // 11.0

var message = "Lox is fun!";
if (len(message) > 10) {
    print "Long message";
}

Error handling works as expected:

len(42);      // Runtime error: Argument to len() must be a string.
len(nil);     // Runtime error: Argument to len() must be a string.
len(true);    // Runtime error: Argument to len() must be a string.

That's it! Native functions can expose platform capabilities without reimplementing everything in the interpreter. len() is trivial, but the same pattern works for file I/O, network requests, or any other host functionality you want to expose. The cool thing is we didn't touch the scanner, parser, or AST at all. Native functions are just callable values sitting in the global environment.
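To see the pattern in isolation, here's a minimal, self-contained sketch. The LoxCallable base class, the message-only LoxRuntimeError, the plain dict standing in for the global environment, and the call_native helper are all simplified assumptions for illustration, not the project's actual code.

```python
from abc import ABC, abstractmethod


class LoxRuntimeError(Exception):
    """Simplified stand-in: the real error also carries a token."""


class LoxCallable(ABC):
    """Assumed shape of the callable interface."""

    @abstractmethod
    def arity(self) -> int: ...

    @abstractmethod
    def call(self, interpreter, arguments): ...


class Len(LoxCallable):
    def arity(self) -> int:
        return 1

    def call(self, interpreter, arguments):
        value = arguments[0]
        if not isinstance(value, str):
            raise LoxRuntimeError("Argument to len() must be a string.")
        # Lox numbers are doubles, so return a float.
        return float(len(value))


# A plain dict stands in for the global environment (assumption).
globals_table = {"len": Len()}


def call_native(name, *args):
    """Hypothetical helper: look up a native, check arity, call it."""
    fn = globals_table[name]
    if len(args) != fn.arity():
        raise LoxRuntimeError(
            f"Expected {fn.arity()} arguments but got {len(args)}."
        )
    return fn.call(None, list(args))


print(call_native("len", "hello"))  # 5.0
```

Nothing here touches scanning or parsing: the native is just a value that the call machinery looks up by name.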

Explicit Initialization

By default, Lox implicitly initializes variables to nil:

var x;
print x;  // nil

This is convenient but hides bugs. If you forget to initialize a variable, your code runs but does the wrong thing. No error, no warning, just silent failures.

I changed this to make uninitialized variable access a runtime error.

The Sentinel Approach

The trick is distinguishing three states:

  1. Undefined: Variable doesn't exist (already errors)
  2. Declared but uninitialized: Variable exists but has no value
  3. Initialized: Variable has a value (including explicit nil)

I used a sentinel object in app/environment.py:

class _Uninitialized:
    """Sentinel value to mark uninitialized variables."""
    def __repr__(self):
        return "<uninitialized>"

UNINITIALIZED = _Uninitialized()

This sentinel is a special marker that means "no value yet" - distinct from None (which represents Lox's nil).
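As a quick illustration of why identity matters here, the sketch below classifies names into the three states using a plain dict in place of the environment. The values dict and the state helper are hypothetical names used only for this example.

```python
class _Uninitialized:
    """Sentinel value to mark uninitialized variables."""

    def __repr__(self):
        return "<uninitialized>"


UNINITIALIZED = _Uninitialized()

# Plain dict standing in for an environment's storage (assumption).
values = {
    "declared": UNINITIALIZED,  # var declared;
    "explicit_nil": None,       # var explicit_nil = nil;
}


def state(name):
    """Hypothetical helper: report which of the three states a name is in."""
    if name not in values:
        return "undefined"
    # Identity check: only the one sentinel object counts, never None.
    if values[name] is UNINITIALIZED:
        return "uninitialized"
    return "initialized"


print(state("missing"), state("declared"), state("explicit_nil"))
```

Because the check uses `is` against a single module-level object, no Lox value, including nil, can ever be mistaken for "no value yet".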

Variable Declaration

When declaring a variable without an initializer, we now store the sentinel instead of None:

def visit_var_stmt(self, stmt: VarStmt):
    value = UNINITIALIZED  # Changed from None
    if stmt.initializer is not None:
        value = self.evaluate(stmt.initializer)
    self.environment.define(stmt.name.lexeme, value)

Variable Access

When accessing a variable, we check for the sentinel and error if found:

def get(self, name: Token):
    if name.lexeme in self.values:
        value = self.values[name.lexeme]
        if value is UNINITIALIZED:
            raise LoxRuntimeError(name, f"Variable '{name.lexeme}' has not been initialized.")
        return value
    # ... continue with enclosing scope lookup

The same check happens in get_at() for local variables resolved at compile time.
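Here's a rough sketch of what that check could look like in get_at(). The Token and LoxRuntimeError classes are stripped-down stand-ins, and the ancestor() helper follows the book's design; the real signatures in the project may differ.

```python
from dataclasses import dataclass


@dataclass
class Token:
    lexeme: str  # simplified: the real token carries more fields


class LoxRuntimeError(Exception):
    def __init__(self, token, message):
        super().__init__(message)
        self.token = token


class _Uninitialized:
    def __repr__(self):
        return "<uninitialized>"


UNINITIALIZED = _Uninitialized()


class Environment:
    def __init__(self, enclosing=None):
        self.values = {}
        self.enclosing = enclosing

    def define(self, name, value):
        self.values[name] = value

    def ancestor(self, distance):
        # Walk up exactly `distance` enclosing scopes.
        env = self
        for _ in range(distance):
            env = env.enclosing
        return env

    def get_at(self, distance, name: Token):
        value = self.ancestor(distance).values[name.lexeme]
        if value is UNINITIALIZED:
            raise LoxRuntimeError(
                name, f"Variable '{name.lexeme}' has not been initialized."
            )
        return value


outer = Environment()
outer.define("a", UNINITIALIZED)  # var a;
outer.define("b", None)           # var b = nil;
inner = Environment(outer)
print(inner.get_at(1, Token("b")))  # None, i.e. Lox nil
```

Passing the whole token rather than just the name is what lets the error point at the offending variable.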

New Behavior

var a;
print a;  // Runtime error: Variable 'a' has not been initialized.

var b = nil;
print b;  // nil (explicitly initialized)

var c;
c = "assigned";
print c;  // "assigned" (assigned before access)

This catches a whole class of bugs:

var total;  // Forgot to initialize
for (var i = 0; i < 10; i = i + 1) {
    total = total + i;  // Error! Can't use uninitialized 'total'
}

The error tells you exactly what's wrong. No more debugging mysterious nil values that propagate through your program.

Edge Cases

Global vs Local: The change applies equally to both. The sentinel lives in the environment, so scoping doesn't matter.

Class Names: When defining classes, we temporarily bind nil to allow methods to reference the class. This still works fine - we use None, not UNINITIALIZED.

Function Parameters: Parameters are initialized by the arguments passed when calling the function, so they're never uninitialized.

Native Functions: clock and len are defined with actual callable objects, not the sentinel.

The implementation is clean - a handful of changes to environment.py and interpreter.py, plus updating call sites that use get_at() to pass the token for error reporting.

Why not check for explicit initialization at compile time? Since the resolver already does static analysis, can't it also check that variables are initialized before use? The catch is that whether a variable has been assigned by the time it is read often depends on runtime conditions:
var x;
if (someFunction()) x = 1; // assigned only when someFunction() returns something truthy
print x;

So we'd probably need something far more complex, and at this point in my compiler journey I don't fully understand what it takes or even whether it's feasible. I dug a little, and it turns out this is called definite assignment analysis. As I understand it, Java and Rust do this sort of analysis to reject reads of uninitialized variables.

So, bottom-line:

  1. Compile time analysis would require sophisticated control flow analysis
  2. It would still reject some valid programs as unsafe
  3. The runtime check still catches every uninitialized read that actually executes

Since this is a learning project, the runtime approach is justified as it's simpler, cleaner and still effective.
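To get a feel for what the compile-time version involves, here's a toy sketch of the core idea over a made-up mini-AST of tuples. Everything in it is an assumption for illustration: the statement shapes, the single flat namespace with no scoping, and the check function itself. It is nowhere near what a real compiler does.

```python
def check(stmts, assigned=frozenset()):
    """Return (names definitely assigned afterwards, list of errors).

    Statement shapes (hypothetical):
      ("declare", name)        var name;
      ("assign", name)         name = ...;
      ("use", name)            ... name ...
      ("if", then_stmts, else_stmts)
    """
    errors = []
    for stmt in stmts:
        kind = stmt[0]
        if kind == "declare":
            assigned = assigned - {stmt[1]}
        elif kind == "assign":
            assigned = assigned | {stmt[1]}
        elif kind == "use":
            if stmt[1] not in assigned:
                errors.append(f"'{stmt[1]}' may be uninitialized")
        elif kind == "if":
            then_set, then_errs = check(stmt[1], assigned)
            else_set, else_errs = check(stmt[2], assigned)
            errors += then_errs + else_errs
            # Conservative join: assigned only if BOTH branches assign it.
            assigned = then_set & else_set
    return assigned, errors


one_branch = [("declare", "x"), ("if", [("assign", "x")], []), ("use", "x")]
both_branches = [
    ("declare", "x"),
    ("if", [("assign", "x")], [("assign", "x")]),
    ("use", "x"),
]

print(check(one_branch)[1])     # ["'x' may be uninitialized"]
print(check(both_branches)[1])  # []
```

The first program is rejected even if its condition happens to be true on every run, which is the "rejects some valid programs" trade-off. Loops, early returns, and closures would each need their own rules on top of this.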

Break Statement

Loops without break are like functions without return: technically complete but practically annoying. Want to exit a search loop when you find something? You end up with flag variables and awkward logic:

// Without break - ugly
var found = false;
var i = 0;
while (i < 100 and !found) {
    if (someCondition(i)) {
        found = true;
    } else {
        i = i + 1;
    }
}

// With break - clean
var i = 0;
while (i < 100) {
    if (someCondition(i)) {
        break;
    }
    i = i + 1;
}

Grammar Change

Adding break requires one new production:

breakStmt → "break" ";" ;

That's it. It's a statement, it's a keyword, it ends with a semicolon.

Parsing

I added BREAK to the token types in app/token.py:

TokenType = Enum(
    'TokenType',
    [
        # ... other tokens
        'VAR', 'WHILE', 'BREAK',
        # ...
    ]
)

The scanner already handles keywords generically, so break just works.

In app/parser.py, I added a case to the statement parser:

def statement(self):
    # ... other statement types
    if (self.match(TokenType.BREAK)):
        return self.break_statement()
    # ...

def break_statement(self):
    keyword = self.previous()
    self.consume(TokenType.SEMICOLON, "Expect ';' after break.")
    return BreakStmt(keyword)

And the AST node in app/stmt.py:

class BreakStmt(Stmt):
    def __init__(self, keyword: Token):
        self.keyword = keyword

    def accept(self, visitor: StmtVisitor):
        return visitor.visit_break_stmt(self)

Exception-Based Control Flow

Like return, I implemented break using exceptions. When the interpreter hits a break statement, it raises a Break exception that unwinds the stack until a loop catches it.

I created app/break_error.py:

class Break(RuntimeError):
    def __init__(self):
        pass

In app/interpreter.py, break statements just raise the exception:

def visit_break_stmt(self, stmt: BreakStmt):
    raise Break()

And while loops catch it:

def visit_while_stmt(self, stmt: WhileStmt):
    try:
        while (self.is_truthy(self.evaluate(stmt.condition))):
            self.execute(stmt.body)
    except Break:
        pass  # Break out of loop
    return None

That's it. The exception unwinds through nested blocks, if statements, whatever, until it hits the while loop's try/except.
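The unwinding is easy to watch in a toy interpreter. The sketch below uses a made-up tuple-based AST with a single counter variable i; the statement kinds and the execute function are hypothetical and only mimic the shape of the real visitor methods.

```python
class Break(Exception):
    """Control-flow signal rather than an error."""


def execute(stmt, env, out):
    kind = stmt[0]
    if kind == "block":
        for inner in stmt[1]:
            execute(inner, env, out)
    elif kind == "print_i":
        out.append(env["i"])
    elif kind == "incr_i":
        env["i"] += 1
    elif kind == "break_if_i_is":
        if env["i"] == stmt[1]:
            raise Break()
    elif kind == "while_i_below":
        try:
            while env["i"] < stmt[1]:
                execute(stmt[2], env, out)
        except Break:
            pass  # the loop is where the unwinding stops


# while (i < 10) { { if (i == 3) break; } print i; i = i + 1; }
program = (
    "while_i_below",
    10,
    ("block", [("block", [("break_if_i_is", 3)]), ("print_i",), ("incr_i",)]),
)

env, out = {"i": 0}, []
execute(program, env, out)
print(out)  # [0, 1, 2]
```

The break sits two blocks deep, yet no block needs to know about it: Python's own call stack carries the exception up to the loop.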

Since for loops desugar to while loops in our parser, they get break support for free.

Compile-Time Validation

You shouldn't be able to break outside a loop. That's a compile error in most languages, and we can catch it during static analysis.

I added a loop_depth counter to the resolver in app/resolver.py:

class Resolver(ExprVisitor, StmtVisitor):
    def __init__(self, interpreter):
        # ... other fields
        self.loop_depth = 0  # Track nesting depth of loops

When resolving while statements, increment depth before resolving the body, then decrement after:

def visit_while_stmt(self, stmt):
    self.resolve(stmt.condition)
    self.loop_depth += 1
    self.resolve(stmt.body)
    self.loop_depth -= 1

And check the depth when resolving break statements:

def visit_break_stmt(self, stmt):
    if self.loop_depth == 0:
        ResolveError.error(self, stmt.keyword, "Can't use 'break' outside of a loop.")

This catches errors before execution:

break;  // Error: Can't use 'break' outside of a loop.

fun foo() {
    break;  // Error: Can't use 'break' outside of a loop.
}

while (true) {
    break;  // OK!
}
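The depth counter can be sketched on its own, again over a hypothetical tuple-based AST. MiniResolver and its errors list are illustrative names; the real resolver reports through its own error mechanism.

```python
class MiniResolver:
    def __init__(self):
        self.loop_depth = 0  # how many loops enclose the current statement
        self.errors = []

    def resolve(self, stmt):
        kind = stmt[0]
        if kind == "while":
            self.loop_depth += 1
            try:
                for inner in stmt[1]:
                    self.resolve(inner)
            finally:
                # Always restore the depth, even if resolving the body fails.
                self.loop_depth -= 1
        elif kind == "block":
            for inner in stmt[1]:
                self.resolve(inner)
        elif kind == "break":
            if self.loop_depth == 0:
                self.errors.append("Can't use 'break' outside of a loop.")


# break; while (...) { { break; } } break;
program = [
    ("break",),
    ("while", [("block", [("break",)])]),
    ("break",),
]

resolver = MiniResolver()
for statement in program:
    resolver.resolve(statement)
print(len(resolver.errors))  # 2
```

A counter rather than a boolean is what keeps nested loops correct: leaving the inner loop drops the depth to 1, not to "outside a loop".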

Usage Examples

Infinite loops with escape hatches:

while (true) {
    print "Enter loop";
    if (shouldExit()) {
        break;
    }
    print "Still looping";
}
print "Exited";

Search loops:

var found = nil;
var i = 0;
while (i < 100) {
    if (matches(i)) {
        found = i;
        break;
    }
    i = i + 1;
}

Nested loops work correctly - break only exits the innermost loop:

var i = 0;
while (i < 10) {
    var j = 0;
    while (j < 10) {
        if (i == j and i == 5) {
            break;  // Only breaks inner loop
        }
        j = j + 1;
    }
    i = i + 1;
}

Putting It Together

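These three features might seem small, but they make Lox noticeably more pleasant to use: len() gives strings a basic operation, explicit initialization turns silent nil bugs into clear runtime errors, and break removes the need for flag variables in loops.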

None of these required major surgery on the interpreter. The architecture we built over the previous nine posts handles these extensions gracefully.

Reflections on Building an Interpreter

Building PyLox from scratch was a fun trip. I started with a scanner that just recognizes tokens, and ten posts later I've got classes, inheritance, closures, and compile-time error checking.

The bonus features show how robust the foundation is. Want a new native function? Implement the interface and register it. Want better error checking? Add a check to the resolver. Want a new control flow construct? Wire up an exception and catch it where appropriate.

This concludes the Crafting an Interpreter from Scratch series!

#compilers #lox-interpreter #programming-languages #software-engineering