Writing my own interpreter for Lox, Part 4 - Statements
This is the next part in writing my own interpreter for Lox in Python. Previously, we have implemented an interpreter for expressions while also providing some basic runtime error handling. In this post, we are going to work with statements which makes the programming language way more useful than it was with just expressions. In the latter case, it's just a calculator.
Now we're implementing the Statements and State chapter from Crafting Interpreters. This chapter introduces fundamental programming language features: statements, variables, assignment, and lexical scoping.
Updated Grammar
The grammar now includes statements and variable declarations. Here's the complete grammar we're working with:
program → declaration* EOF ;
declaration → varDecl
| statement ;
varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;
statement → exprStmt
| printStmt
| block ;
exprStmt → expression ";" ;
printStmt → "print" expression ";" ;
block → "{" declaration* "}" ;
expression → assignment ;
assignment → IDENTIFIER "=" assignment
| equality ;
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | primary ;
primary → "true" | "false" | "nil"
| NUMBER | STRING | IDENTIFIER
| "(" expression ")" ;
The key changes from our expression-only grammar are:
- Program structure: A program is now a series of one or more declaration rules
- Statements: Added expression statements, print statements, and block statements
- Variable declarations:
var
keyword with optional initialization - Assignment: Added as the lowest precedence expression
- Identifiers: Variables can now be referenced by name
Print and Expression Statements
Statement Syntax Trees
Just as we created expression syntax trees using the Visitor pattern, we need statement syntax trees. I created app/stmt.py
with the statement visitor interface and concrete statement classes:
class StmtVisitor(ABC):
def visit_expression_stmt(stmt):
raise NotImplementedError()
def visit_print_stmt(stmt):
raise NotImplementedError()
def visit_var_stmt(stmt):
raise NotImplementedError()
def visit_block_stmt(stmt):
raise NotImplementedError()
class Stmt(ABC):
def accept(self, visitor: StmtVisitor):
raise NotImplementedError()
class ExprStmt(Stmt):
def __init__(self, expression: Expr):
self.expression = expression
def accept(self, visitor: StmtVisitor):
return visitor.visit_expression_stmt(self)
class PrintStmt(Stmt):
def __init__(self, expression: Expr):
self.expression = expression
def accept(self, visitor: StmtVisitor):
return visitor.visit_print_stmt(self)
class VarStmt(Stmt):
def __init__(self, name: Token, initializer: Expr):
self.name = name
self.initializer = initializer
def accept(self, visitor: StmtVisitor):
return visitor.visit_var_stmt(self)
class BlockStmt(Stmt):
def __init__(self, statements: list[Stmt]):
self.statements = statements
def accept(self, visitor: StmtVisitor):
return visitor.visit_block_stmt(self)
The concrete statement types are:
ExprStmt
: Wraps an expression to execute it for side effectsPrintStmt
: Evaluates and prints an expressionVarStmt
: Declares a new variable with optional initializationBlockStmt
: Groups multiple statements with their own scope
Parsing and Executing Statements
I updated the Parser
class in app/parser.py
to handle statements. The entry point is now parse()
which returns a list of statements rather than a single expression:
def parse(self):
statements = []
while (not self.is_at_end()):
statements.append(self.declaration())
return statements
def declaration(self):
if (self.match(TokenType.VAR)):
return self.variable_declaration_statement()
return self.statement()
def statement(self):
if (self.match(TokenType.PRINT)):
return self.print_statement()
if (self.match(TokenType.LEFT_BRACE)):
return BlockStmt(self.block_statement())
return self.expression_statement()
def print_statement(self):
expr = self.expression()
self.consume(TokenType.SEMICOLON, "Expect ';' after value.")
return PrintStmt(expr)
def expression_statement(self):
expr = self.expression()
self.consume(TokenType.SEMICOLON, "Expect ';' after expression.")
return ExprStmt(expr)
def block_statement(self):
statements = []
while not self.check(TokenType.RIGHT_BRACE) and not self.is_at_end():
statements.append(self.declaration())
self.consume(TokenType.RIGHT_BRACE, "Expect '}' after block.")
return statements
The grammar now starts with program → declaration* EOF
where declarations can be variable declarations or statements.
The Interpreter
class now implements both ExprVisitor
and StmtVisitor
. I added methods to execute statements:
def interpret(self, statements: list[Stmt]):
try:
for stmt in statements:
self.execute(stmt)
except LoxRuntimeError as err:
self.had_runtime_error = True
runtime_error(err)
def visit_expression_stmt(self, stmt: ExprStmt):
self.evaluate(stmt.expression)
return None
def visit_print_stmt(self, stmt: PrintStmt):
value = self.evaluate(stmt.expression)
print(get_str_for_lox_value(value))
return None
def execute(self, stmt: Stmt):
stmt.accept(self)
Global Variables
Variable declarations follow the syntax var name = initializer;
where the initializer is optional. If no initializer is provided, the variable defaults to nil
.
In the parser, variable declarations are handled by:
def variable_declaration_statement(self):
var_name = self.consume(TokenType.IDENTIFIER, "Expect variable name.")
initializer = None
if self.match(TokenType.EQUAL):
initializer = self.expression()
self.consume(TokenType.SEMICOLON, "Expect ';' after variable declaration.")
return VarStmt(var_name, initializer)
For variable access, I added a new expression type Variable
to app/expr.py
that simply holds a variable name token.
Environments
Variables need to be stored somewhere. I created app/environment.py
with an Environment
class that manages variable bindings:
class Environment:
def __init__(self, enclosing=None):
self.values = {}
self.enclosing = enclosing
def define(self, name: str, value):
self.values[name] = value
def get(self, name: Token):
if name.lexeme in self.values:
return self.values[name.lexeme]
if self.enclosing:
return self.enclosing.get(name)
raise LoxRuntimeError(name, f"Undefined variable '{name.lexeme}'.")
def assign(self, name: Token, value):
if name.lexeme in self.values: # current scope
self.values[name.lexeme] = value
elif self.enclosing: # enclosing scope
self.enclosing.assign(name, value)
else: # nowhere found
raise LoxRuntimeError(name, f"Undefined variable '{name.lexeme}'.")
The environment uses a Python dictionary to store variable mappings. The enclosing
field supports nested scopes by chaining environments together. The assign
method traverses the environment chain to find an existing variable, ensuring that assignment doesn't accidentally create new variables.
In the interpreter, variables are handled with:
def visit_var_stmt(self, stmt: VarStmt):
value = None
if (stmt.initializer):
value = self.evaluate(stmt.initializer)
self.environment.define(stmt.name.lexeme, value)
return None
def visit_var(self, expr: Variable):
return self.environment.get(expr.name)
Assignment
Assignment is implemented as an expression that returns the assigned value. I added an Assign
expression class and updated the expression grammar to handle assignment at low precedence:
def assignment(self):
expr = self.equality()
if self.match(TokenType.EQUAL):
equals = self.previous()
rvalue = self.assignment()
if isinstance(expr, Variable):
name = expr.name
return Assign(name, rvalue)
ParseError.error(equals, "Invalid assignment target.")
return expr
Assignment is right-associative, so a = b = c
parses as a = (b = c)
. The interpreter executes assignment by:
def visit_assign(self, expr: Assign):
value = self.evaluate(expr.value)
self.environment.assign(expr.name, value)
return value
The Environment.assign()
method ensures the variable exists before allowing assignment, preventing accidental variable creation.
Scope
Block statements create new lexical scopes. When entering a block, we create a new environment that encloses the current one:
def visit_block_stmt(self, stmt: BlockStmt):
self.execute_block(stmt.statements, Environment(self.environment))
def execute_block(self, stmts: list[Stmt], current_env: Environment):
enclosing_env = self.environment
try:
self.environment = current_env
for stmt in stmts:
self.execute(stmt)
finally:
self.environment = enclosing_env
The execute_block
method temporarily switches the interpreter's environment to the new block scope, executes all statements within that scope, and then restores the previous environment using a finally
block. This ensures proper cleanup even if an exception occurs.
This implementation supports variable shadowing. Consider this example:
var a = "global";
{
var a = "local";
print a; // prints "local"
}
print a; // prints "global"
The inner a
shadows the outer one but doesn't affect it. When the block ends, the inner environment is discarded and the outer a
remains unchanged.
Here's a more complex example showing nested scopes:
var a = "global a";
var b = "global b";
var c = "global c";
{
var a = "outer a";
var b = "outer b";
{
var a = "inner a";
print a; // inner a
print b; // outer b
print c; // global c
}
print a; // outer a
print b; // outer b
print c; // global c
}
print a; // global a
print b; // global b
print c; // global c
I also refactored app/main.py
and addded a new run
command that parses and executes a complete Lox program:
case Command.run:
if scanner.had_error:
exit(65)
parser = Parser(scanner.tokens)
try:
statements = parser.parse()
if statements:
interpreter = Interpreter()
interpreter.interpret(statements)
if not interpreter.had_runtime_error:
exit(0)
else:
exit(70)
except Exception:
exit(65)
With these changes, our Lox interpreter now supports:
- Variable declarations and initialization
- Variable access and assignment
- Print statements for output
- Expression statements for side effects
- Block statements with lexical scoping
- Proper error handling for undefined variables
The language has evolved from a simple expression evaluator to a more complete programming language with certain statements and state management.