Writing my own interpreter for Lox, Part 7 - Resolving and Binding
This is the next part in writing my own interpreter for Lox in Python. Previously, we implemented functions with closures, enabling powerful functional programming patterns. In this post, we tackle a subtle but critical bug in our closure implementation through static analysis and variable resolution. Here is the corresponding chapter from Crafting Interpreters.
While our interpreter works correctly for most programs, there's a corner case with closures that produces incorrect behavior. To fix it, we'll introduce a new semantic analysis pass that walks the syntax tree before execution, resolving variable references to their declarations.
The Problem with Closures
Our current implementation has a subtle bug related to how closures capture variables. Consider this pathological (but valid) program:
var a = "global";
{
fun showA() {
print a;
}
showA();
var a = "block";
showA();
}
What should this print? According to lexical scoping rules, a
in showA()
should always refer to the same variable throughout the program's execution. Since the function is declared before the local a
exists, it should reference the global a
and print "global"
twice.
But if we run this with our current interpreter, we get:
global
block
The same print a
statement mysteriously prints different values! This violates the fundamental principle of static scoping: a variable expression should always resolve to the same declaration.
Why This Happens
The bug arises from how we look up variables. When we evaluate a variable expression, we walk the current environment chain at runtime. Here's what happens:
When we first call
showA()
, we create a function body environment that chains back to the block scope, which chains to global scope. The lookup findsa
in the global scope.After declaring the local
a
, the block environment now contains a binding fora
.When we call
showA()
again, we create another function body environment chaining to the same block scope. This time, the lookup findsa
in the block scope first.
The problem is that variable resolution is dynamic when it should be static. The meaning of a
changes based on when the function executes, not where it was defined.
Static Scope and Semantic Analysis
Lox uses lexical scoping (also called static scoping), meaning we can determine which declaration a variable refers to just by reading the source code. The scope rules are:
A variable usage refers to the preceding declaration with the same name in the innermost scope that encloses the expression where the variable is used.
Key aspects of this rule:
- "Preceding" means appearing before in the program text
- "Innermost" handles shadowing - the closest enclosing scope wins
- "Encloses the expression" is determined by the structure of the code, not runtime execution order
To implement this correctly, we need to resolve variables before running the program. This is a form of semantic analysis - analyzing the meaning of the program beyond just syntax.
The Solution: A Resolver Pass
We'll add a new pass that runs between parsing and interpretation:
- Parse: Build the syntax tree (existing)
- Resolve: Walk the tree, computing which declaration each variable refers to (new)
- Interpret: Execute the program using the resolution information (updated)
The resolver will compute, for each variable expression, how many scopes away its declaration is. We'll store this information and use it during interpretation to jump directly to the right environment.
The Resolver Pass
The resolver is implemented in app/resolver.py
as a visitor that walks the syntax tree. Unlike the interpreter which executes code, the resolver only analyzes it.
The Resolver Class
class Resolver(ExprVisitor, StmtVisitor):
def __init__(self, interpreter):
self.interpreter = interpreter
self.scopes = deque()
self.current_function = FunctionType.NONE
self.had_error = False
The resolver maintains:
interpreter
: Reference to the interpreter where we'll store resolution resultsscopes
: Stack of scopes, each scope is a dictionary mapping variable names to boolean valuescurrent_function
: Tracks whether we're inside a function (for error checking)had_error
: Flag to track resolution errors
The boolean values in each scope dictionary indicate whether a variable's initializer has been resolved:
False
: Declared but not yet defined (initializer hasn't been resolved)True
: Fully defined (initializer has been resolved)
This distinction helps us catch errors like:
var a = a; // Error: Can't read variable in its own initializer
Scope Management
The resolver tracks scopes using a stack:
def begin_scope(self):
self.scopes.append({})
def end_scope(self):
self.scopes.pop()
As we traverse the tree, we push a new scope when entering a block and pop it when leaving. This mirrors the runtime environment chain but exists only during analysis.
Resolving Blocks
Block statements create new scopes:
def visit_block_stmt(self, stmt: BlockStmt):
self.begin_scope()
self.resolve(stmt.statements)
self.end_scope()
return None
The resolve()
method is a polymorphic helper that can handle statements, expressions, or lists:
def resolve(self, stmt: list[Stmt] | Stmt | Expr):
if isinstance(stmt, list):
for st in stmt:
self.resolve(st)
elif isinstance(stmt, (Stmt, Expr)):
stmt.accept(self)
Implementing Variable Resolution
Variable resolution happens in two phases: declaration and resolution.
Variable Declarations
When we encounter a variable declaration, we add it to the current scope:
def visit_var_stmt(self, stmt):
self.declare(stmt.name)
if (stmt.initializer):
self.resolve(stmt.initializer)
self.define(stmt.name)
return None
The process is split into two steps:
def declare(self, token):
if (len(self.scopes) == 0):
return
if token.lexeme in self.scopes[-1]:
ResolveError.error(self, token,
"Already a variable with this name in this scope.")
self.scopes[-1][token.lexeme] = False
def define(self, token):
if (len(self.scopes) == 0):
return
self.scopes[-1][token.lexeme] = True
The two-step process allows us to:
- Declare: Mark the variable name as "declared but not ready" (
False
) - Resolve the initializer (if present)
- Define: Mark the variable as "ready to use" (
True
)
This prevents code like var a = a;
from referring to itself.
Variable References
When we encounter a variable expression, we resolve it:
def visit_var(self, expr):
if (len(self.scopes) != 0) and self.scopes[-1].get(expr.name.lexeme) is False:
ResolveError.error(self, expr.name,
"Can't read local variable in its own initializer.")
self.resolve_local(expr, expr.name)
return None
First, we check if we're trying to read a variable in its own initializer. Then we resolve it:
def resolve_local(self, expr, token):
for i in range(len(self.scopes)-1, -1, -1):
if token.lexeme in self.scopes[i]:
self.interpreter.resolve(expr, len(self.scopes) - 1 - i)
return
The resolve_local()
method walks the scope stack from innermost to outermost. When it finds the variable, it tells the interpreter how many scopes away it is. The distance is len(self.scopes) - 1 - i
:
- Current scope: distance 0
- Parent scope: distance 1
- Grandparent scope: distance 2
- And so on...
If we don't find the variable in any local scope, we assume it's global (distance is not recorded).
Assignment Resolution
Assignments need resolution too:
def visit_assign(self, expr: Assign):
self.resolve(expr.value)
self.resolve_local(expr, expr.name)
return None
We resolve the right-hand side expression first, then resolve where the variable is declared.
Function Resolution
Functions need special handling because they introduce both a new scope and potentially create closures:
def visit_function_stmt(self, stmt: FunctionStmt):
self.declare(stmt.name)
self.define(stmt.name)
self.resolve_function(stmt, FunctionType.FUNCTION)
return None
def resolve_function(self, fn: FunctionStmt, type: FunctionType):
enclosing_function = self.current_function
self.current_function = type
self.begin_scope()
for param in fn.params:
self.declare(param)
self.define(param)
self.resolve(fn.body)
self.end_scope()
self.current_function = enclosing_function
Key points:
- We declare and define the function name before resolving its body. This allows recursive functions to reference themselves.
- We create a new scope for the function body
- We declare and define each parameter in that scope
- We resolve the function body
- We restore the previous function context (for handling nested functions)
Other Statements and Expressions
The resolver visits all statement and expression types, resolving any nested constructs:
def visit_if_stmt(self, stmt: IfStmt):
self.resolve(stmt.condition)
self.resolve(stmt.then_branch)
if stmt.else_branch:
self.resolve(stmt.else_branch)
def visit_while_stmt(self, stmt):
self.resolve(stmt.condition)
self.resolve(stmt.body)
def visit_binary(self, expr):
self.resolve(expr.left)
self.resolve(expr.right)
# ... and so on for all node types
For expressions like literals that don't contain variables, we do nothing:
def visit_literal(self, expr):
return None
Updating the Interpreter
Now we need to update the interpreter to use the resolution information.
Storing Resolution Results
In app/interpreter.py
, we add a dictionary to store the resolution data:
class Interpreter(ExprVisitor, StmtVisitor):
def __init__(self):
self.had_runtime_error = False
self.globals = Environment()
self.environment = self.globals
self.globals.define("clock", Clock())
self.locals = {} # New: Maps expressions to scope distances
The locals
dictionary maps expression nodes to integers representing how many scopes away their declarations are.
We add a method for the resolver to call:
def resolve(self, expr: Expr, depth: int):
self.locals[expr] = depth
Looking Up Variables
We update variable lookup to use the resolution information:
def visit_var(self, expr: Variable):
return self.look_up_variable(expr.name, expr)
def look_up_variable(self, token, expr):
if expr not in self.locals:
return self.globals.get(token)
else:
distance = self.locals[expr]
return self.environment.get_at(distance, token.lexeme)
If we have resolution information, we use it. Otherwise, we treat it as a global variable (backward compatibility for expressions not passed through the resolver).
Environment Distance Access
We add new methods to app/environment.py
for accessing variables at a specific distance:
def get_at(self, distance: int, name: str):
return self.ancestor(distance).values[name]
def assign_at(self, distance: int, name: Token, value):
self.ancestor(distance).values[name.lexeme] = value
def ancestor(self, distance: int):
env = self
for i in range(0, distance):
env = env.enclosing
return env
The ancestor()
method walks up the environment chain a specific number of times. Then we access the variable directly in that environment's dictionary - no searching needed!
Updating Assignment
We update assignment to use resolved distances:
def visit_assign(self, expr: Assign):
value = self.evaluate(expr.value)
if expr in self.locals:
distance = self.locals[expr]
self.environment.assign_at(distance, expr.name, value)
else:
self.globals.assign(expr.name, value)
return value
Running the Resolver
In app/main.py
, we insert the resolver between parsing and interpretation:
case Command.run:
parser = Parser(scanner.tokens)
try:
statements = parser.parse()
if statements and not parser.had_error:
interpreter = Interpreter()
resolver = Resolver(interpreter)
resolver.resolve(statements)
if resolver.had_error:
exit(65)
value = interpreter.interpret(statements)
if not interpreter.had_runtime_error:
exit(0)
We run the resolver after parsing succeeds but before interpretation. If resolution fails, we exit without running the program.
Resolution Errors
The resolver can catch certain errors statically, before the program runs. This is a significant advantage - we can give users better error messages earlier.
Duplicate Variable Declaration
We can detect when a variable is declared twice in the same scope:
fun bad() {
var a = "first";
var a = "second"; // Error!
}
This is caught in the declare()
method:
def declare(self, token):
if (len(self.scopes) == 0):
return
if token.lexeme in self.scopes[-1]:
ResolveError.error(self, token,
"Already a variable with this name in this scope.")
self.scopes[-1][token.lexeme] = False
Note: We allow duplicate declarations in the global scope (when len(self.scopes) == 0
) for REPL convenience.
Self-Reference in Initializer
We prevent variables from referencing themselves in their own initializer:
var a = a; // Error: Can't read local variable in its own initializer
This is caught in visit_var()
:
def visit_var(self, expr):
if (len(self.scopes) != 0) and self.scopes[-1].get(expr.name.lexeme) is False:
ResolveError.error(self, expr.name,
"Can't read local variable in its own initializer.")
self.resolve_local(expr, expr.name)
return None
Top-Level Return
We can detect return statements outside of functions:
return "at top level"; // Error: Can't return from top-level code
We track the current function context using an enum and we will later add other function types such as class methods here:
FunctionType = Enum('FunctionType', ['NONE', 'FUNCTION'])
And check it when visiting return statements:
def visit_return_stmt(self, stmt):
if self.current_function == FunctionType.NONE:
ResolveError.error(self, stmt.keyword,
"Can't return from top-level code.")
if stmt.value:
self.resolve(stmt.value)
Error Reporting
Resolution errors are reported as compile-time errors using the existing error infrastructure:
class ResolveError(Exception):
def __init__(self, msg):
super().__init__(msg)
@staticmethod
def error(resolver, token: Token, msg: str):
resolver.had_error = True
compile_error(token, msg)
return ResolveError(msg)
Conclusion
With the resolver in place, our interpreter now correctly implements lexical scoping even in tricky edge cases. The key insights are:
- Variable resolution is separate from execution.
- Precomputing scope distance is key.
- Static analysis catches bugs early.
The resolver demonstrates an important principle in language implementation: separating concerns. Instead of mixing variable resolution with execution, we handle them in separate passes. This makes the code cleaner and enables optimizations.
Our Lox interpreter now has three passes: scanning, parsing, and resolution before interpretation. The bug we fixed might seem obscure, but it represents a category of subtle semantic issues that can appear in language implementations. By adding static analysis, we've made our interpreter more correct and more robust. The resolver can also serve as a foundation for future optimizations and error checking.