Sai Sasank Y

Writing my own interpreter for Lox, Part 4 - Statements

This is the next part in writing my own interpreter for Lox in Python. Previously, we have implemented an interpreter for expressions while also providing some basic runtime error handling. In this post, we are going to work with statements which makes the programming language way more useful than it was with just expressions. In the latter case, it's just a calculator.

Now we're implementing the Statements and State chapter from Crafting Interpreters. This chapter introduces fundamental programming language features: statements, variables, assignment, and lexical scoping.

Updated Grammar

The grammar now includes statements and variable declarations. Here's the complete grammar we're working with:

program        → declaration* EOF ;

declaration    → varDecl
               | statement ;

varDecl        → "var" IDENTIFIER ( "=" expression )? ";" ;

statement      → exprStmt
               | printStmt
               | block ;

exprStmt       → expression ";" ;
printStmt      → "print" expression ";" ;
block          → "{" declaration* "}" ;

expression     → assignment ;
assignment     → IDENTIFIER "=" assignment
               | equality ;

equality       → comparison ( ( "!=" | "==" ) comparison )* ;
comparison     → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term           → factor ( ( "-" | "+" ) factor )* ;
factor         → unary ( ( "/" | "*" ) unary )* ;
unary          → ( "!" | "-" ) unary | primary ;
primary        → "true" | "false" | "nil"
               | NUMBER | STRING | IDENTIFIER
               | "(" expression ")" ;

The key changes from our expression-only grammar are:

Statement Syntax Trees

Just as we created expression syntax trees using the Visitor pattern, we need statement syntax trees. I created app/stmt.py with the statement visitor interface and concrete statement classes:

class StmtVisitor(ABC):
    def visit_expression_stmt(stmt):
        raise NotImplementedError()
    
    def visit_print_stmt(stmt):
        raise NotImplementedError()
    
    def visit_var_stmt(stmt):
        raise NotImplementedError()
    
    def visit_block_stmt(stmt):
        raise NotImplementedError()

class Stmt(ABC):
    def accept(self, visitor: StmtVisitor):
        raise NotImplementedError()

class ExprStmt(Stmt):
    def __init__(self, expression: Expr):
        self.expression = expression
    
    def accept(self, visitor: StmtVisitor):
        return visitor.visit_expression_stmt(self)
    
class PrintStmt(Stmt):
    def __init__(self, expression: Expr):
        self.expression = expression
    
    def accept(self, visitor: StmtVisitor):
        return visitor.visit_print_stmt(self)

class VarStmt(Stmt):
    def __init__(self, name: Token, initializer: Expr):
        self.name = name
        self.initializer = initializer
        
    def accept(self, visitor: StmtVisitor):
        return visitor.visit_var_stmt(self)

class BlockStmt(Stmt):
    def __init__(self, statements: list[Stmt]):
        self.statements = statements
    
    def accept(self, visitor: StmtVisitor):
        return visitor.visit_block_stmt(self)

The concrete statement types are:

Parsing and Executing Statements

I updated the Parser class in app/parser.py to handle statements. The entry point is now parse() which returns a list of statements rather than a single expression:

def parse(self):
    statements = []
    while (not self.is_at_end()):
        statements.append(self.declaration())
    return statements

def declaration(self):
    if (self.match(TokenType.VAR)):
        return self.variable_declaration_statement()
    return self.statement()

def statement(self):
    if (self.match(TokenType.PRINT)):
        return self.print_statement()
    if (self.match(TokenType.LEFT_BRACE)):
        return BlockStmt(self.block_statement())
    return self.expression_statement()

def print_statement(self):
    expr = self.expression()
    self.consume(TokenType.SEMICOLON, "Expect ';' after value.")
    return PrintStmt(expr)

def expression_statement(self):
    expr = self.expression()
    self.consume(TokenType.SEMICOLON, "Expect ';' after expression.")
    return ExprStmt(expr)

def block_statement(self):
    statements = []
    while not self.check(TokenType.RIGHT_BRACE) and not self.is_at_end():
        statements.append(self.declaration())
    self.consume(TokenType.RIGHT_BRACE, "Expect '}' after block.")
    return statements

The grammar now starts with program → declaration* EOF where declarations can be variable declarations or statements.

The Interpreter class now implements both ExprVisitor and StmtVisitor. I added methods to execute statements:

def interpret(self, statements: list[Stmt]):
    try:
        for stmt in statements:
            self.execute(stmt)
    except LoxRuntimeError as err:
        self.had_runtime_error = True
        runtime_error(err)

def visit_expression_stmt(self, stmt: ExprStmt):
    self.evaluate(stmt.expression)
    return None

def visit_print_stmt(self, stmt: PrintStmt):
    value = self.evaluate(stmt.expression)
    print(get_str_for_lox_value(value))
    return None

def execute(self, stmt: Stmt):
    stmt.accept(self)

Global Variables

Variable declarations follow the syntax var name = initializer; where the initializer is optional. If no initializer is provided, the variable defaults to nil.

In the parser, variable declarations are handled by:

def variable_declaration_statement(self):
    var_name = self.consume(TokenType.IDENTIFIER, "Expect variable name.")
    initializer = None
    if self.match(TokenType.EQUAL):
        initializer = self.expression()
    
    self.consume(TokenType.SEMICOLON, "Expect ';' after variable declaration.")
    return VarStmt(var_name, initializer)

For variable access, I added a new expression type Variable to app/expr.py that simply holds a variable name token.

Environments

Variables need to be stored somewhere. I created app/environment.py with an Environment class that manages variable bindings:

class Environment:
    def __init__(self, enclosing=None):
        self.values = {}
        self.enclosing = enclosing
        
    def define(self, name: str, value):
        self.values[name] = value
    
    def get(self, name: Token):
        if name.lexeme in self.values:
            return self.values[name.lexeme]
        if self.enclosing:
            return self.enclosing.get(name)
        raise LoxRuntimeError(name, f"Undefined variable '{name.lexeme}'.")
    
    def assign(self, name: Token, value):
        if name.lexeme in self.values:  # current scope
            self.values[name.lexeme] = value
        elif self.enclosing:  # enclosing scope
            self.enclosing.assign(name, value)
        else:  # nowhere found
            raise LoxRuntimeError(name, f"Undefined variable '{name.lexeme}'.")

The environment uses a Python dictionary to store variable mappings. The enclosing field supports nested scopes by chaining environments together. The assign method traverses the environment chain to find an existing variable, ensuring that assignment doesn't accidentally create new variables.

In the interpreter, variables are handled with:

def visit_var_stmt(self, stmt: VarStmt):
    value = None
    if (stmt.initializer):
        value = self.evaluate(stmt.initializer)
    self.environment.define(stmt.name.lexeme, value)
    return None

def visit_var(self, expr: Variable):
    return self.environment.get(expr.name)

Assignment

Assignment is implemented as an expression that returns the assigned value. I added an Assign expression class and updated the expression grammar to handle assignment at low precedence:

def assignment(self):
    expr = self.equality()
    
    if self.match(TokenType.EQUAL):
        equals = self.previous()
        rvalue = self.assignment()
        
        if isinstance(expr, Variable):
            name = expr.name
            return Assign(name, rvalue)
        ParseError.error(equals, "Invalid assignment target.")
    
    return expr

Assignment is right-associative, so a = b = c parses as a = (b = c). The interpreter executes assignment by:

def visit_assign(self, expr: Assign):
    value = self.evaluate(expr.value)
    self.environment.assign(expr.name, value)
    return value

The Environment.assign() method ensures the variable exists before allowing assignment, preventing accidental variable creation.

Scope

Block statements create new lexical scopes. When entering a block, we create a new environment that encloses the current one:

def visit_block_stmt(self, stmt: BlockStmt):
    self.execute_block(stmt.statements, Environment(self.environment))
    
def execute_block(self, stmts: list[Stmt], current_env: Environment):
    enclosing_env = self.environment
    try:
        self.environment = current_env
        for stmt in stmts:
            self.execute(stmt)
    finally:
        self.environment = enclosing_env

The execute_block method temporarily switches the interpreter's environment to the new block scope, executes all statements within that scope, and then restores the previous environment using a finally block. This ensures proper cleanup even if an exception occurs.

This implementation supports variable shadowing. Consider this example:

var a = "global";
{
    var a = "local";
    print a; // prints "local"
}
print a; // prints "global"

The inner a shadows the outer one but doesn't affect it. When the block ends, the inner environment is discarded and the outer a remains unchanged.

Here's a more complex example showing nested scopes:

var a = "global a";
var b = "global b";
var c = "global c";
{
    var a = "outer a";
    var b = "outer b";
    {
        var a = "inner a";
        print a;  // inner a
        print b;  // outer b
        print c;  // global c
    }
    print a;  // outer a
    print b;  // outer b
    print c;  // global c
}
print a;  // global a
print b;  // global b
print c;  // global c

I also refactored app/main.py and addded a new run command that parses and executes a complete Lox program:

case Command.run:
    if scanner.had_error:
        exit(65)
    parser = Parser(scanner.tokens)
    try:
        statements = parser.parse()
        if statements:
            interpreter = Interpreter()
            interpreter.interpret(statements)
            if not interpreter.had_runtime_error:
                exit(0)
            else:
                exit(70)
    except Exception:
        exit(65)

With these changes, our Lox interpreter now supports:

The language has evolved from a simple expression evaluator to a more complete programming language with certain statements and state management.

#compilers #lox-interpreter #programming-languages #software-engineering