Sai Sasank Y

Writing my own interpreter for Lox, Part 5 - Control Flow

This is the next part in writing my own interpreter for Lox in Python. Previously, we implemented statements and state management, transforming our interpreter from a calculator into a language with variables, assignments, and lexical scoping. In this post, we add control flow - the feature that makes our language Turing complete. Here is the corresponding chapter from the book, Crafting Interpreters.

Control flow comes in two main flavors:

With the addition of these features, our Lox interpreter can now express any computation that a Turing machine can perform, making it theoretically as powerful as any programming language.

Updated Grammar

The grammar now includes conditional and looping statements. Here's the complete grammar we're working with:

program        → declaration* EOF ;

declaration    → varDecl
               | statement ;

varDecl        → "var" IDENTIFIER ( "=" expression )? ";" ;

statement      → exprStmt
               | ifStmt
               | printStmt
               | whileStmt
               | forStmt
               | block ;

ifStmt         → "if" "(" expression ")" statement
               ( "else" statement )? ;
whileStmt      → "while" "(" expression ")" statement ;
forStmt        → "for" "(" ( varDecl | exprStmt | ";" )
               expression? ";"
               expression? ")" statement ;

exprStmt       → expression ";" ;
printStmt      → "print" expression ";" ;
block          → "{" declaration* "}" ;

expression     → assignment ;
assignment     → IDENTIFIER "=" assignment
               | logic_or ;

logic_or       → logic_and ( "or" logic_and )* ;
logic_and      → equality ( "and" equality )* ;

equality       → comparison ( ( "!=" | "==" ) comparison )* ;
comparison     → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term           → factor ( ( "-" | "+" ) factor )* ;
factor         → unary ( ( "/" | "*" ) unary )* ;
unary          → ( "!" | "-" ) unary | primary ;
primary        → "true" | "false" | "nil"
               | NUMBER | STRING | IDENTIFIER
               | "(" expression ")" ;

The key additions from our previous grammar are:

Conditional Execution: If Statements

If statements allow our programs to make decisions. The syntax follows the familiar pattern from C-family languages:

if (condition) {
    // executed if condition is truthy
} else {
    // executed if condition is falsy
}

Statement Syntax Trees

I added a new IfStmt class to app/stmt.py:

class IfStmt(Stmt):
    def __init__(self, condition: Expr, then_branch: Stmt, else_branch: Stmt):
        self.condition = condition
        self.then_branch = then_branch
        self.else_branch = else_branch
    
    def accept(self, visitor: StmtVisitor):
        return visitor.visit_if_stmt(self)

The if statement holds three components:

Parsing If Statements

In app/parser.py, I added parsing logic for if statements:

def statement(self):
    if (self.match(TokenType.IF)):
        return self.if_statement()
    # ... other statement types
    
def if_statement(self):
    self.consume(TokenType.LEFT_PAREN, "Expect '(' after 'if'.")
    condition = self.expression()
    self.consume(TokenType.RIGHT_PAREN, "Expect ')' after if condition.")
    
    then_branch = self.statement()
    else_branch = None
    
    if (self.match(TokenType.ELSE)):
        else_branch = self.statement()
    
    return IfStmt(condition, then_branch, else_branch)

The parser enforces the C-style syntax with parentheses around the condition, even though they're somewhat redundant. This follows the original Lox design from Crafting Interpreters.

Executing If Statements

In app/interpreter.py, if statement execution is straightforward:

def visit_if_stmt(self, stmt: IfStmt):
    is_cond_true = self.is_truthy(self.evaluate(stmt.condition))
    if (is_cond_true):
        self.execute(stmt.then_branch)
    elif stmt.else_branch is not None:
        self.execute(stmt.else_branch)
    return None

The interpreter evaluates the condition expression and uses Lox's truthiness rules:

Logical Operators

Logical operators and and or provide a way to combine boolean expressions. Unlike arithmetic operators, they implement short-circuit evaluation - they don't always evaluate both operands.

Short-Circuit Behavior

This behavior is crucial for both performance and correctness. Consider:

var x = nil;
if (x != nil and x.length > 0) {
    // This won't crash even though x is nil
}

Logical Expression Syntax Trees

I added a Logical expression class to app/expr.py:

class Logical(Expr):
    def __init__(self, left: Expr, operator: Token, right: Expr):
        self.left = left
        self.operator = operator
        self.right = right
    
    def accept(self, visitor: ExprVisitor):
        return visitor.visit_logical(self)

Parsing Logical Operators

Logical operators have their own precedence level in the expression hierarchy. In app/parser.py:

def assignment(self):
    expr = self.or_()
    # ... assignment logic

def or_(self):
    expr = self.and_()
    while (self.match(TokenType.OR)):
        op = self.previous()
        right = self.and_()
        expr = Logical(expr, op, right)
    return expr
    
def and_(self):
    expr = self.equality()
    while (self.match(TokenType.AND)):
        op = self.previous()
        right = self.equality()
        expr = Logical(expr, op, right)
    return expr

The precedence hierarchy places or at the lowest precedence (just above assignment) and and at higher precedence, so a or b and c parses as a or (b and c).

Executing Logical Operators

The interpreter implements short-circuit evaluation:

def visit_logical(self, expr: Logical):
    left = self.evaluate(expr.left)
    
    # short-circuit
    if (expr.operator.token_type == TokenType.OR):
        if (self.is_truthy(left)):
            return left
    else:
        if (not self.is_truthy(left)):
            return left
    return self.evaluate(expr.right)

Note that logical operators return the actual values, not just true or false. This allows for patterns like:

var name = user_input or "default";

While Loops

While loops provide the fundamental looping construct. They repeatedly execute a statement as long as a condition remains truthy.

While Statement Syntax Tree

I added a WhileStmt class to app/stmt.py:

class WhileStmt(Stmt):
    def __init__(self, condition: Expr, body: Stmt):
        self.condition = condition
        self.body = body
    
    def accept(self, visitor: StmtVisitor):
        return visitor.visit_while_stmt(self)

Parsing While Loops

The parser handles while loops similarly to if statements:

def statement(self):
    if (self.match(TokenType.WHILE)):
        return self.while_statement()
    # ... other statement types

def while_statement(self):
    self.consume(TokenType.LEFT_PAREN, "Expect '(' after 'while'.")
    condition = self.expression()
    self.consume(TokenType.RIGHT_PAREN, "Expect ')' after condition.")
    body = self.statement()
    return WhileStmt(condition, body)

Executing While Loops

The interpreter evaluates the condition before each iteration:

def visit_while_stmt(self, stmt: WhileStmt):
    while (self.is_truthy(self.evaluate(stmt.condition))):
        self.execute(stmt.body)
    return None

This implementation means:

For Loops with Desugaring

For loops provide a convenient syntax for common looping patterns. Rather than implementing them as a new primitive, I used desugaring - transforming for loops into equivalent while loops during parsing.

C-Style For Loop Syntax

Lox supports C-style for loops with three clauses:

for (initializer; condition; increment) {
    // body
}

Each clause is optional:

Parsing and Desugaring

The for_statement() method in app/parser.py parses all three clauses and then transforms them into a while loop:

def for_statement(self):
    self.consume(TokenType.LEFT_PAREN, "Expect '(' after 'for'.")
    
    # initializer
    if self.match(TokenType.SEMICOLON):
        initializer = None
    elif (self.match(TokenType.VAR)):
        initializer = self.variable_declaration_statement()
    else:
        initializer = self.expression_statement()
    
    # condition
    condition = None
    if (not self.check(TokenType.SEMICOLON)):
        condition = self.expression()
    self.consume(TokenType.SEMICOLON, "Expect ';' after loop condition.")
    
    # increment
    increment = None
    if (not self.check(TokenType.RIGHT_PAREN)):
        increment = self.expression()
    self.consume(TokenType.RIGHT_PAREN, "Expect ')' after for clauses.")
    
    # loop body
    body = self.statement()
    
    # Desugar for loop into while loop and statements
    if (increment is not None):
        body = BlockStmt([body, ExprStmt(increment)])
    
    if (condition is None):
        condition = Literal(True)
    
    body = WhileStmt(condition, body)
    
    if initializer is not None:
        body = BlockStmt([initializer, body])
    
    return body

Desugaring Process

The desugaring happens in reverse order:

  1. Handle increment: Wrap the body and increment in a block so the increment runs after each iteration
  2. Handle condition: If no condition is provided, default to true (infinite loop)
  3. Create while loop: Use the condition and the (possibly modified) body
  4. Handle initializer: If present, wrap the initializer and while loop in a block

For example, this for loop:

for (var i = 0; i < 10; i = i + 1) {
    print i;
}

Gets desugared to:

{
    var i = 0;
    while (i < 10) {
        {
            print i;
            i = i + 1;
        }
    }
}

This approach means the interpreter doesn't need any special for loop handling - it already knows how to execute while loops and block statements.

Error Recovery with Synchronization

It's a fact that users make syntax errors, and a good parser should:

  1. Report the error clearly with location and context
  2. Continue parsing to find additional errors in the same compilation
  3. Avoid cascading errors where one mistake causes a flood of spurious error messages

Notice that points 2 and 3 compete against each other. The book already discussed an approach to error handling called panic mode recovery using a synchronization technique. I have not implemented it earlier but I will now.

The Problem: Cascading Errors

When the parser encounters a syntax error, there are two options: 1) Exit and report the error or 2)Report and continue parsing. We want to report all errors but not spurious errors that would go away if you fix earlier errors.

For example, if you forget a semicolon:

var a = 1  // Missing semicolon
var b = 2;
print a + b;

We should of course have the parser report the missing semicolon error, but also in a reasonable manner, look for other user errors and report them as well so the user can fix them in one-shot.

Synchronization Strategy

So we employ something called synchronization, where once we see an error, we try to restore the state of the parser so it can continue parsing looking for potentially other errors. The key insight is that certain tokens naturally mark the boundaries between statements or declarations and we use them as synchronization points: reliable places to resume parsing.

In app/parser.py, I added a synchronize() method:

def synchronize(self):
    self.advance()
    while not self.is_at_end():
        if (self.previous().token_type == TokenType.SEMICOLON):
            return
        if (
            self.peek().token_type in {
                TokenType.CLASS,
                TokenType.FUN,
                TokenType.VAR,
                TokenType.FOR,
                TokenType.IF,
                TokenType.WHILE,
                TokenType.PRINT,
                TokenType.RETURN
                }
            ):
            return
        self.advance()

This method implements panic mode recovery by:

  1. Advancing past the error: Skip the problematic token
  2. Looking for statement boundaries: Stop at semicolons (end of statements)
  3. Looking for statement beginnings: Stop at keywords that start new statements (var, if, while, etc.)
  4. Discarding tokens: Keep advancing until we find a synchronization point

Integration with Declaration Parsing

The synchronization is integrated into the main parsing loop in the declaration() method:

def declaration(self):
    try:
        if (self.match(TokenType.VAR)):
            return self.variable_declaration_statement()
        return self.statement()
    except ParseError as err:
        self.synchronize()
        return None

When a ParseError is caught:

Error State Tracking

The parser tracks whether any errors occurred during parsing using a had_error flag:

class Parser:
    def __init__(self, tokens: list[Token]):
        self.tokens = tokens
        self.current = 0
        self.had_error = False

This flag is set to True whenever a ParseError is created:

@staticmethod
def error(parser, token: Token, msg: str):
    parser.had_error = True
    parser_error(token, msg)
    return ParseError(msg)

The had_error flag serves several important purposes:

Compilation Pipeline Control: The main interpreter checks this flag to decide whether to proceed with execution:

# In app/main.py
if statements and not parser.had_error:
    interpreter = Interpreter()
    interpreter.interpret(statements)

Exit Code Management: Different exit codes indicate different types of failures:

Error vs. Recovery Distinction: Even though the parser continues after errors (thanks to synchronization), the had_error flag ensures we know the overall parsing wasn't successful. This prevents executing potentially malformed programs while still providing comprehensive error reporting.

Example

Consider this erroneous Lox program:

var a = ;        // Error: missing initializer
var b = 2;       // This should parse correctly
if (b > 0        // Error: missing closing paren
    print b;     // This should parse correctly
var c = 3;       // This should parse correctly

When we try to parse this, we get:

[line 1] Error at ';': Expect expression.
[line 4] Error at 'print': Expect ')' after if condition.

With synchronization:

  1. First error is reported at var a = ;
  2. Parser synchronizes at the semicolon
  3. var b = 2; parses successfully
  4. Second error is reported at missing )
  5. Parser synchronizes at print keyword
  6. Remaining statements parse successfully

Without synchronization, the parser might report cascading errors making it unclear what the real problems are.

Practical Examples

Let's see these control flow features in action. Here's a program that demonstrates if statements, logical operators, and loops:

// Factorial calculator with input validation
var n = 5;

if (n < 0) {
    print "Error: factorial not defined for negative numbers";
} else {
    var factorial = 1;
    var i = 1;
    
    while (i <= n) {
        factorial = factorial * i;
        i = i + 1;
    }
    
    print factorial ;
}

The logical operators shine in conditional expressions:

var user_name = nil;
var display_name = user_name or "Anonymous";
print "Hello, " + display_name;

var age = 25;
if (age >= 18 and age < 65) {
    print "You are in the working age range";
}

Conclusion

With the addition of control flow, our Lox interpreter has reached a significant milestone: Turing completeness. This means our language can theoretically compute anything that any other programming language can compute. Our implementation demonstrates several important concepts including Desugaring and Short-circuit evaluation. The interpreter can now run substantial programs that perform real computation. While it's still missing functions, classes, and other advanced features, it has crossed the threshold from "toy calculator" to "real programming language." On to the next one!

#compilers #lox-interpreter #programming-languages #software-engineering