Writing my own interpreter for Lox, Part 5 - Control Flow
This is the next part in writing my own interpreter for Lox in Python. Previously, we implemented statements and state management, transforming our interpreter from a calculator into a language with variables, assignments, and lexical scoping. In this post, we add control flow - the feature that makes our language Turing complete. Here is the corresponding chapter from the book, Crafting Interpreters.
Control flow comes in two main flavors:
- Conditional control flow (branching) - executing code selectively based on conditions
- Looping control flow - executing code repeatedly until some condition is met
With the addition of these features, our Lox interpreter can now express any computation that a Turing machine can perform, making it theoretically as powerful as any programming language.
Updated Grammar
The grammar now includes conditional and looping statements. Here's the complete grammar we're working with:
program → declaration* EOF ;
declaration → varDecl
| statement ;
varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| forStmt
| block ;
ifStmt → "if" "(" expression ")" statement
( "else" statement )? ;
whileStmt → "while" "(" expression ")" statement ;
forStmt → "for" "(" ( varDecl | exprStmt | ";" )
expression? ";"
expression? ")" statement ;
exprStmt → expression ";" ;
printStmt → "print" expression ";" ;
block → "{" declaration* "}" ;
expression → assignment ;
assignment → IDENTIFIER "=" assignment
| logic_or ;
logic_or → logic_and ( "or" logic_and )* ;
logic_and → equality ( "and" equality )* ;
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | primary ;
primary → "true" | "false" | "nil"
| NUMBER | STRING | IDENTIFIER
| "(" expression ")" ;
The key additions from our previous grammar are:
- If statements: Conditional execution with optional
else
clause - While loops: Basic looping construct
- For loops: C-style three-clause loops that get desugared to while loops
- Logical operators:
and
andor
with short-circuit evaluation
Conditional Execution: If Statements
If statements allow our programs to make decisions. The syntax follows the familiar pattern from C-family languages:
if (condition) {
// executed if condition is truthy
} else {
// executed if condition is falsy
}
Statement Syntax Trees
I added a new IfStmt
class to app/stmt.py
:
class IfStmt(Stmt):
def __init__(self, condition: Expr, then_branch: Stmt, else_branch: Stmt):
self.condition = condition
self.then_branch = then_branch
self.else_branch = else_branch
def accept(self, visitor: StmtVisitor):
return visitor.visit_if_stmt(self)
The if statement holds three components:
condition
: The expression to evaluate for truthinessthen_branch
: The statement to execute if the condition is truthyelse_branch
: The optional statement to execute if the condition is falsy (can beNone
)
Parsing If Statements
In app/parser.py
, I added parsing logic for if statements:
def statement(self):
if (self.match(TokenType.IF)):
return self.if_statement()
# ... other statement types
def if_statement(self):
self.consume(TokenType.LEFT_PAREN, "Expect '(' after 'if'.")
condition = self.expression()
self.consume(TokenType.RIGHT_PAREN, "Expect ')' after if condition.")
then_branch = self.statement()
else_branch = None
if (self.match(TokenType.ELSE)):
else_branch = self.statement()
return IfStmt(condition, then_branch, else_branch)
The parser enforces the C-style syntax with parentheses around the condition, even though they're somewhat redundant. This follows the original Lox design from Crafting Interpreters.
Executing If Statements
In app/interpreter.py
, if statement execution is straightforward:
def visit_if_stmt(self, stmt: IfStmt):
is_cond_true = self.is_truthy(self.evaluate(stmt.condition))
if (is_cond_true):
self.execute(stmt.then_branch)
elif stmt.else_branch is not None:
self.execute(stmt.else_branch)
return None
The interpreter evaluates the condition expression and uses Lox's truthiness rules:
nil
andfalse
are falsy- Everything else is truthy
Logical Operators
Logical operators and
and or
provide a way to combine boolean expressions. Unlike arithmetic operators, they implement short-circuit evaluation - they don't always evaluate both operands.
Short-Circuit Behavior
and
: If the left operand is falsy, return it without evaluating the right operandor
: If the left operand is truthy, return it without evaluating the right operand
This behavior is crucial for both performance and correctness. Consider:
var x = nil;
if (x != nil and x.length > 0) {
// This won't crash even though x is nil
}
Logical Expression Syntax Trees
I added a Logical
expression class to app/expr.py
:
class Logical(Expr):
def __init__(self, left: Expr, operator: Token, right: Expr):
self.left = left
self.operator = operator
self.right = right
def accept(self, visitor: ExprVisitor):
return visitor.visit_logical(self)
Parsing Logical Operators
Logical operators have their own precedence level in the expression hierarchy. In app/parser.py
:
def assignment(self):
expr = self.or_()
# ... assignment logic
def or_(self):
expr = self.and_()
while (self.match(TokenType.OR)):
op = self.previous()
right = self.and_()
expr = Logical(expr, op, right)
return expr
def and_(self):
expr = self.equality()
while (self.match(TokenType.AND)):
op = self.previous()
right = self.equality()
expr = Logical(expr, op, right)
return expr
The precedence hierarchy places or
at the lowest precedence (just above assignment) and and
at higher precedence, so a or b and c
parses as a or (b and c)
.
Executing Logical Operators
The interpreter implements short-circuit evaluation:
def visit_logical(self, expr: Logical):
left = self.evaluate(expr.left)
# short-circuit
if (expr.operator.token_type == TokenType.OR):
if (self.is_truthy(left)):
return left
else:
if (not self.is_truthy(left)):
return left
return self.evaluate(expr.right)
Note that logical operators return the actual values, not just true
or false
. This allows for patterns like:
var name = user_input or "default";
While Loops
While loops provide the fundamental looping construct. They repeatedly execute a statement as long as a condition remains truthy.
While Statement Syntax Tree
I added a WhileStmt
class to app/stmt.py
:
class WhileStmt(Stmt):
def __init__(self, condition: Expr, body: Stmt):
self.condition = condition
self.body = body
def accept(self, visitor: StmtVisitor):
return visitor.visit_while_stmt(self)
Parsing While Loops
The parser handles while loops similarly to if statements:
def statement(self):
if (self.match(TokenType.WHILE)):
return self.while_statement()
# ... other statement types
def while_statement(self):
self.consume(TokenType.LEFT_PAREN, "Expect '(' after 'while'.")
condition = self.expression()
self.consume(TokenType.RIGHT_PAREN, "Expect ')' after condition.")
body = self.statement()
return WhileStmt(condition, body)
Executing While Loops
The interpreter evaluates the condition before each iteration:
def visit_while_stmt(self, stmt: WhileStmt):
while (self.is_truthy(self.evaluate(stmt.condition))):
self.execute(stmt.body)
return None
This implementation means:
- If the condition is initially false, the loop body never executes
- The condition is re-evaluated after each iteration
- The loop continues until the condition becomes falsy
For Loops with Desugaring
For loops provide a convenient syntax for common looping patterns. Rather than implementing them as a new primitive, I used desugaring - transforming for loops into equivalent while loops during parsing.
C-Style For Loop Syntax
Lox supports C-style for loops with three clauses:
for (initializer; condition; increment) {
// body
}
Each clause is optional:
- Initializer: Variable declaration or expression (executed once before the loop)
- Condition: Expression evaluated before each iteration (defaults to
true
if omitted) - Increment: Expression executed after each iteration
Parsing and Desugaring
The for_statement()
method in app/parser.py
parses all three clauses and then transforms them into a while loop:
def for_statement(self):
self.consume(TokenType.LEFT_PAREN, "Expect '(' after 'for'.")
# initializer
if self.match(TokenType.SEMICOLON):
initializer = None
elif (self.match(TokenType.VAR)):
initializer = self.variable_declaration_statement()
else:
initializer = self.expression_statement()
# condition
condition = None
if (not self.check(TokenType.SEMICOLON)):
condition = self.expression()
self.consume(TokenType.SEMICOLON, "Expect ';' after loop condition.")
# increment
increment = None
if (not self.check(TokenType.RIGHT_PAREN)):
increment = self.expression()
self.consume(TokenType.RIGHT_PAREN, "Expect ')' after for clauses.")
# loop body
body = self.statement()
# Desugar for loop into while loop and statements
if (increment is not None):
body = BlockStmt([body, ExprStmt(increment)])
if (condition is None):
condition = Literal(True)
body = WhileStmt(condition, body)
if initializer is not None:
body = BlockStmt([initializer, body])
return body
Desugaring Process
The desugaring happens in reverse order:
- Handle increment: Wrap the body and increment in a block so the increment runs after each iteration
- Handle condition: If no condition is provided, default to
true
(infinite loop) - Create while loop: Use the condition and the (possibly modified) body
- Handle initializer: If present, wrap the initializer and while loop in a block
For example, this for loop:
for (var i = 0; i < 10; i = i + 1) {
print i;
}
Gets desugared to:
{
var i = 0;
while (i < 10) {
{
print i;
i = i + 1;
}
}
}
This approach means the interpreter doesn't need any special for loop handling - it already knows how to execute while loops and block statements.
Error Recovery with Synchronization
It's a fact that users make syntax errors, and a good parser should:
- Report the error clearly with location and context
- Continue parsing to find additional errors in the same compilation
- Avoid cascading errors where one mistake causes a flood of spurious error messages
Notice that points 2 and 3 compete against each other. The book already discussed an approach to error handling called panic mode recovery using a synchronization technique. I have not implemented it earlier but I will now.
The Problem: Cascading Errors
When the parser encounters a syntax error, there are two options: 1) Exit and report the error or 2)Report and continue parsing. We want to report all errors but not spurious errors that would go away if you fix earlier errors.
For example, if you forget a semicolon:
var a = 1 // Missing semicolon
var b = 2;
print a + b;
We should of course have the parser report the missing semicolon error, but also in a reasonable manner, look for other user errors and report them as well so the user can fix them in one-shot.
Synchronization Strategy
So we employ something called synchronization, where once we see an error, we try to restore the state of the parser so it can continue parsing looking for potentially other errors. The key insight is that certain tokens naturally mark the boundaries between statements or declarations and we use them as synchronization points: reliable places to resume parsing.
In app/parser.py
, I added a synchronize()
method:
def synchronize(self):
self.advance()
while not self.is_at_end():
if (self.previous().token_type == TokenType.SEMICOLON):
return
if (
self.peek().token_type in {
TokenType.CLASS,
TokenType.FUN,
TokenType.VAR,
TokenType.FOR,
TokenType.IF,
TokenType.WHILE,
TokenType.PRINT,
TokenType.RETURN
}
):
return
self.advance()
This method implements panic mode recovery by:
- Advancing past the error: Skip the problematic token
- Looking for statement boundaries: Stop at semicolons (end of statements)
- Looking for statement beginnings: Stop at keywords that start new statements (
var
,if
,while
, etc.) - Discarding tokens: Keep advancing until we find a synchronization point
Integration with Declaration Parsing
The synchronization is integrated into the main parsing loop in the declaration()
method:
def declaration(self):
try:
if (self.match(TokenType.VAR)):
return self.variable_declaration_statement()
return self.statement()
except ParseError as err:
self.synchronize()
return None
When a ParseError
is caught:
- The error is already reported (by the
ParseError.error()
method) synchronize()
is called to find a recovery pointNone
is returned to indicate a failed declaration- Parsing continues with the next declaration
Error State Tracking
The parser tracks whether any errors occurred during parsing using a had_error
flag:
class Parser:
def __init__(self, tokens: list[Token]):
self.tokens = tokens
self.current = 0
self.had_error = False
This flag is set to True
whenever a ParseError
is created:
@staticmethod
def error(parser, token: Token, msg: str):
parser.had_error = True
parser_error(token, msg)
return ParseError(msg)
The had_error
flag serves several important purposes:
Compilation Pipeline Control: The main interpreter checks this flag to decide whether to proceed with execution:
# In app/main.py
if statements and not parser.had_error:
interpreter = Interpreter()
interpreter.interpret(statements)
Exit Code Management: Different exit codes indicate different types of failures:
- Exit code 65: Syntax error during scanning or parsing
- Exit code 70: Runtime error during execution
- Exit code 0: Successful execution
Error vs. Recovery Distinction: Even though the parser continues after errors (thanks to synchronization), the had_error
flag ensures we know the overall parsing wasn't successful. This prevents executing potentially malformed programs while still providing comprehensive error reporting.
Example
Consider this erroneous Lox program:
var a = ; // Error: missing initializer
var b = 2; // This should parse correctly
if (b > 0 // Error: missing closing paren
print b; // This should parse correctly
var c = 3; // This should parse correctly
When we try to parse this, we get:
[line 1] Error at ';': Expect expression.
[line 4] Error at 'print': Expect ')' after if condition.
With synchronization:
- First error is reported at
var a = ;
- Parser synchronizes at the semicolon
var b = 2;
parses successfully- Second error is reported at missing
)
- Parser synchronizes at
print
keyword - Remaining statements parse successfully
Without synchronization, the parser might report cascading errors making it unclear what the real problems are.
Practical Examples
Let's see these control flow features in action. Here's a program that demonstrates if statements, logical operators, and loops:
// Factorial calculator with input validation
var n = 5;
if (n < 0) {
print "Error: factorial not defined for negative numbers";
} else {
var factorial = 1;
var i = 1;
while (i <= n) {
factorial = factorial * i;
i = i + 1;
}
print factorial ;
}
The logical operators shine in conditional expressions:
var user_name = nil;
var display_name = user_name or "Anonymous";
print "Hello, " + display_name;
var age = 25;
if (age >= 18 and age < 65) {
print "You are in the working age range";
}
Conclusion
With the addition of control flow, our Lox interpreter has reached a significant milestone: Turing completeness. This means our language can theoretically compute anything that any other programming language can compute. Our implementation demonstrates several important concepts including Desugaring and Short-circuit evaluation. The interpreter can now run substantial programs that perform real computation. While it's still missing functions, classes, and other advanced features, it has crossed the threshold from "toy calculator" to "real programming language." On to the next one!