Sai Sasank Y

Writing my own interpreter for Lox, Part 7 - Resolving and Binding

This is the next part in writing my own interpreter for Lox in Python. Previously, we implemented functions with closures, enabling powerful functional programming patterns. In this post, we tackle a subtle but critical bug in our closure implementation through static analysis and variable resolution. Here is the corresponding chapter from Crafting Interpreters.

While our interpreter works correctly for most programs, there's a corner case with closures that produces incorrect behavior. To fix it, we'll introduce a new semantic analysis pass that walks the syntax tree before execution, resolving variable references to their declarations.

The Problem with Closures

Our current implementation has a subtle bug related to how closures capture variables. Consider this pathological (but valid) program:

var a = "global";
{
  fun showA() {
    print a;
  }

  showA();
  var a = "block";
  showA();
}

What should this print? According to lexical scoping rules, a in showA() should always refer to the same variable throughout the program's execution. Since the function is declared before the local a exists, it should reference the global a and print "global" twice.

But if we run this with our current interpreter, we get:

global
block

The same print a statement mysteriously prints different values! This violates the fundamental principle of static scoping: a variable expression should always resolve to the same declaration.

Why This Happens

The bug arises from how we look up variables. When we evaluate a variable expression, we walk the current environment chain at runtime. Here's what happens:

  1. When we first call showA(), we create a function body environment that chains back to the block scope, which chains to global scope. The lookup finds a in the global scope.

  2. After declaring the local a, the block environment now contains a binding for a.

  3. When we call showA() again, we create another function body environment chaining to the same block scope. This time, the lookup finds a in the block scope first.

The problem is that variable resolution is dynamic when it should be static. The meaning of a changes based on when the function executes, not where it was defined.

Static Scope and Semantic Analysis

Lox uses lexical scoping (also called static scoping), meaning we can determine which declaration a variable refers to just by reading the source code. The scope rules are:

A variable usage refers to the preceding declaration with the same name in the innermost scope that encloses the expression where the variable is used.

Key aspects of this rule:

To implement this correctly, we need to resolve variables before running the program. This is a form of semantic analysis - analyzing the meaning of the program beyond just syntax.

The Solution: A Resolver Pass

We'll add a new pass that runs between parsing and interpretation:

  1. Parse: Build the syntax tree (existing)
  2. Resolve: Walk the tree, computing which declaration each variable refers to (new)
  3. Interpret: Execute the program using the resolution information (updated)

The resolver will compute, for each variable expression, how many scopes away its declaration is. We'll store this information and use it during interpretation to jump directly to the right environment.

The Resolver Pass

The resolver is implemented in app/resolver.py as a visitor that walks the syntax tree. Unlike the interpreter which executes code, the resolver only analyzes it.

The Resolver Class

class Resolver(ExprVisitor, StmtVisitor):
    def __init__(self, interpreter):
        self.interpreter = interpreter
        self.scopes = deque()
        self.current_function = FunctionType.NONE
        self.had_error = False

The resolver maintains:

The boolean values in each scope dictionary indicate whether a variable's initializer has been resolved:

This distinction helps us catch errors like:

var a = a; // Error: Can't read variable in its own initializer

Scope Management

The resolver tracks scopes using a stack:

def begin_scope(self):
    self.scopes.append({})

def end_scope(self):
    self.scopes.pop()

As we traverse the tree, we push a new scope when entering a block and pop it when leaving. This mirrors the runtime environment chain but exists only during analysis.

Resolving Blocks

Block statements create new scopes:

def visit_block_stmt(self, stmt: BlockStmt):
    self.begin_scope()
    self.resolve(stmt.statements)
    self.end_scope()
    return None

The resolve() method is a polymorphic helper that can handle statements, expressions, or lists:

def resolve(self, stmt: list[Stmt] | Stmt | Expr):
    if isinstance(stmt, list):
        for st in stmt:
            self.resolve(st)
    elif isinstance(stmt, (Stmt, Expr)):
        stmt.accept(self)

Implementing Variable Resolution

Variable resolution happens in two phases: declaration and resolution.

Variable Declarations

When we encounter a variable declaration, we add it to the current scope:

def visit_var_stmt(self, stmt):
    self.declare(stmt.name)
    if (stmt.initializer):
        self.resolve(stmt.initializer)
    self.define(stmt.name)
    return None

The process is split into two steps:

def declare(self, token):
    if (len(self.scopes) == 0):
        return
    if token.lexeme in self.scopes[-1]:
        ResolveError.error(self, token, 
            "Already a variable with this name in this scope.")
    self.scopes[-1][token.lexeme] = False

def define(self, token):
    if (len(self.scopes) == 0):
        return
    self.scopes[-1][token.lexeme] = True

The two-step process allows us to:

  1. Declare: Mark the variable name as "declared but not ready" (False)
  2. Resolve the initializer (if present)
  3. Define: Mark the variable as "ready to use" (True)

This prevents code like var a = a; from referring to itself.

Variable References

When we encounter a variable expression, we resolve it:

def visit_var(self, expr):
    if (len(self.scopes) != 0) and self.scopes[-1].get(expr.name.lexeme) is False:
        ResolveError.error(self, expr.name, 
            "Can't read local variable in its own initializer.")

    self.resolve_local(expr, expr.name)
    return None

First, we check if we're trying to read a variable in its own initializer. Then we resolve it:

def resolve_local(self, expr, token):
    for i in range(len(self.scopes)-1, -1, -1):
        if token.lexeme in self.scopes[i]:
            self.interpreter.resolve(expr, len(self.scopes) - 1 - i)
            return

The resolve_local() method walks the scope stack from innermost to outermost. When it finds the variable, it tells the interpreter how many scopes away it is. The distance is len(self.scopes) - 1 - i:

If we don't find the variable in any local scope, we assume it's global (distance is not recorded).

Assignment Resolution

Assignments need resolution too:

def visit_assign(self, expr: Assign):
    self.resolve(expr.value)
    self.resolve_local(expr, expr.name)
    return None

We resolve the right-hand side expression first, then resolve where the variable is declared.

Function Resolution

Functions need special handling because they introduce both a new scope and potentially create closures:

def visit_function_stmt(self, stmt: FunctionStmt):
    self.declare(stmt.name)
    self.define(stmt.name)
    self.resolve_function(stmt, FunctionType.FUNCTION)
    return None

def resolve_function(self, fn: FunctionStmt, type: FunctionType):
    enclosing_function = self.current_function
    self.current_function = type
    self.begin_scope()
    for param in fn.params:
        self.declare(param)
        self.define(param)
    self.resolve(fn.body)
    self.end_scope()
    self.current_function = enclosing_function

Key points:

  1. We declare and define the function name before resolving its body. This allows recursive functions to reference themselves.
  2. We create a new scope for the function body
  3. We declare and define each parameter in that scope
  4. We resolve the function body
  5. We restore the previous function context (for handling nested functions)

Other Statements and Expressions

The resolver visits all statement and expression types, resolving any nested constructs:

def visit_if_stmt(self, stmt: IfStmt):
    self.resolve(stmt.condition)
    self.resolve(stmt.then_branch)
    if stmt.else_branch:
        self.resolve(stmt.else_branch)

def visit_while_stmt(self, stmt):
    self.resolve(stmt.condition)
    self.resolve(stmt.body)

def visit_binary(self, expr):
    self.resolve(expr.left)
    self.resolve(expr.right)

# ... and so on for all node types

For expressions like literals that don't contain variables, we do nothing:

def visit_literal(self, expr):
    return None

Updating the Interpreter

Now we need to update the interpreter to use the resolution information.

Storing Resolution Results

In app/interpreter.py, we add a dictionary to store the resolution data:

class Interpreter(ExprVisitor, StmtVisitor):
    def __init__(self):
        self.had_runtime_error = False
        self.globals = Environment()
        self.environment = self.globals
        self.globals.define("clock", Clock())
        self.locals = {}  # New: Maps expressions to scope distances

The locals dictionary maps expression nodes to integers representing how many scopes away their declarations are.

We add a method for the resolver to call:

def resolve(self, expr: Expr, depth: int):
    self.locals[expr] = depth

Looking Up Variables

We update variable lookup to use the resolution information:

def visit_var(self, expr: Variable):
    return self.look_up_variable(expr.name, expr)

def look_up_variable(self, token, expr):
    if expr not in self.locals:
        return self.globals.get(token)
    else:
        distance = self.locals[expr]
        return self.environment.get_at(distance, token.lexeme)

If we have resolution information, we use it. Otherwise, we treat it as a global variable (backward compatibility for expressions not passed through the resolver).

Environment Distance Access

We add new methods to app/environment.py for accessing variables at a specific distance:

def get_at(self, distance: int, name: str):
    return self.ancestor(distance).values[name]

def assign_at(self, distance: int, name: Token, value):
    self.ancestor(distance).values[name.lexeme] = value

def ancestor(self, distance: int):
    env = self
    for i in range(0, distance):
        env = env.enclosing
    return env

The ancestor() method walks up the environment chain a specific number of times. Then we access the variable directly in that environment's dictionary - no searching needed!

Updating Assignment

We update assignment to use resolved distances:

def visit_assign(self, expr: Assign):
    value = self.evaluate(expr.value)
    if expr in self.locals:
        distance = self.locals[expr]
        self.environment.assign_at(distance, expr.name, value)
    else:
        self.globals.assign(expr.name, value)
    return value

Running the Resolver

In app/main.py, we insert the resolver between parsing and interpretation:

case Command.run:
    parser = Parser(scanner.tokens)
    try:
        statements = parser.parse()
        if statements and not parser.had_error:
            interpreter = Interpreter()
            resolver = Resolver(interpreter)
            resolver.resolve(statements)
            if resolver.had_error:
                exit(65)
            value = interpreter.interpret(statements)
            if not interpreter.had_runtime_error:
                exit(0)

We run the resolver after parsing succeeds but before interpretation. If resolution fails, we exit without running the program.

Resolution Errors

The resolver can catch certain errors statically, before the program runs. This is a significant advantage - we can give users better error messages earlier.

Duplicate Variable Declaration

We can detect when a variable is declared twice in the same scope:

fun bad() {
  var a = "first";
  var a = "second";  // Error!
}

This is caught in the declare() method:

def declare(self, token):
    if (len(self.scopes) == 0):
        return
    if token.lexeme in self.scopes[-1]:
        ResolveError.error(self, token, 
            "Already a variable with this name in this scope.")
    self.scopes[-1][token.lexeme] = False

Note: We allow duplicate declarations in the global scope (when len(self.scopes) == 0) for REPL convenience.

Self-Reference in Initializer

We prevent variables from referencing themselves in their own initializer:

var a = a;  // Error: Can't read local variable in its own initializer

This is caught in visit_var():

def visit_var(self, expr):
    if (len(self.scopes) != 0) and self.scopes[-1].get(expr.name.lexeme) is False:
        ResolveError.error(self, expr.name, 
            "Can't read local variable in its own initializer.")
    self.resolve_local(expr, expr.name)
    return None

Top-Level Return

We can detect return statements outside of functions:

return "at top level";  // Error: Can't return from top-level code

We track the current function context using an enum and we will later add other function types such as class methods here:

FunctionType = Enum('FunctionType', ['NONE', 'FUNCTION'])

And check it when visiting return statements:

def visit_return_stmt(self, stmt):
    if self.current_function == FunctionType.NONE:
        ResolveError.error(self, stmt.keyword, 
            "Can't return from top-level code.")
    if stmt.value:
        self.resolve(stmt.value)

Error Reporting

Resolution errors are reported as compile-time errors using the existing error infrastructure:

class ResolveError(Exception):
    def __init__(self, msg):
        super().__init__(msg)

    @staticmethod
    def error(resolver, token: Token, msg: str):
        resolver.had_error = True
        compile_error(token, msg)
        return ResolveError(msg)

Conclusion

With the resolver in place, our interpreter now correctly implements lexical scoping even in tricky edge cases. The key insights are:

  1. Variable resolution is separate from execution.
  2. Precomputing scope distance is key.
  3. Static analysis catches bugs early.

The resolver demonstrates an important principle in language implementation: separating concerns. Instead of mixing variable resolution with execution, we handle them in separate passes. This makes the code cleaner and enables optimizations.

Our Lox interpreter now has three passes: scanning, parsing, and resolution before interpretation. The bug we fixed might seem obscure, but it represents a category of subtle semantic issues that can appear in language implementations. By adding static analysis, we've made our interpreter more correct and more robust. The resolver can also serve as a foundation for future optimizations and error checking.

#compilers #lox-interpreter #programming-languages #software-engineering