Advent of Code 2020 - Day 18

One aspect of computer science that fascinates me is programming language theory. It’s so cool that text in a file can make a computer do something. I’ve studied a good deal about parsers, interpreters, and compilers, so when I read the problem for day 18 on 2020’s Advent of Code, I got excited. Spoiler: this post has a lot of Rust code!

The Problem - Part 1

The challenge is to evaluate expressions consisting of addition, multiplication, and parentheses. The twist being the precedences for adding and multiplying are the same: 5 + (6 + (3 * 1)) + 9 * 2 = 46. I immediately thought, “Oh, should I write a Pratt parser?”. Due to the simple left to right precedence, I decided on a straightforward recursive algorithm to find the products and sums while taking into account groupings by parentheses. First working iteration:

fn calculate(chars: &[char]) -> u64 { // the full list of characters
    let mut lhs: u64 = 0; // left hand side
    let mut pointer = 0;

    while pointer < chars.len() {
        let mut seeker = pointer + 1;
        let ch = chars[pointer];
        match ch.to_digit(10) {
            Some(num) => {
                // if the char is a number, set it to lhs
                lhs = num as u64;
                pointer += 1;
            }
            None => {
                // find left hand side
                if ch == '(' {
                    seeker = consume_groups(chars, seeker); // find closing ')'
                    lhs = calculate(&chars[pointer + 1..seeker - 1]);
                    pointer = seeker;
                    continue;
                }

                // find right hand side
                let next_char = chars[seeker];
                let rhs = match next_char.to_digit(10) {
                    Some(num) => num as u64,
                    None => match next_char {
                        '(' => {
                            seeker = consume_groups(chars, seeker);
                            calculate(&chars[pointer + 2..seeker])
                        }
                        _ => panic!("no rhs next char match"),
                    },
                };

                // char must be an operator
                // calculate with same precedence and set it as lhs
                match ch {
                    '+' => lhs += rhs,
                    '*' => lhs *= rhs,
                    _ => panic!("operator is wrong"),
                }
                pointer = seeker + 1;
            }
        }
    }
    lhs
}

As I was implementing that solution, something occurred to me based on my experience solving the previous seventeen AOC problems. “I bet Part 2 is going to be about changing the precedences somehow.” That would invalidate my entire algorithm since it only goes left to right, executing operators as it finds them. At that point, I would need a genuine Pratt parser that would cover both parts of the problem and let me change precedence with ease. Well, I won the bet.

Part 2

The second half of the problem asks for the same process, but this time addition has a higher precedence than multiplication. A Pratt parser makes setting precedences on infix and prefix operators trivial, although the parser itself can be hard to grok. First, we need a lexer and some tokens to play with.

enum Token {
    Int(char),
    Op(char),
    Eof,
}

struct Lexer {
    tokens: Vec<Token>,
}

impl Lexer {
    fn new(input: &str) -> Lexer {
        // turn chars into tokens
        let mut tokens: Vec<Token> = input
            .chars()
            .filter(|&ch| ch != ' ')
            .map(|ch| match ch {
                '0'..='9' => Token::Int(ch),
                _ => Token::Op(ch),
            })
            .collect();
        tokens.reverse(); // using the end of the vector is easier
        Lexer { tokens }
    }

    fn next(&mut self) -> Token {
        self.tokens.pop().unwrap_or(Token::Eof)
    }
    fn peek(&mut self) -> Token {
        self.tokens.last().copied().unwrap_or(Token::Eof)
    }
}

The following is a Pratt parser that works for both parts of the problem:

// new function creating a Lexer and evaluating with 0 precedence
fn calculate(input: &str) -> u64 {
    let mut lexer = Lexer::new(input);
    evaluate(&mut lexer, 0)
}

fn evaluate(lexer: &mut Lexer, precedence: u8) -> u64 {
    // find lhs
    let mut lhs = match lexer.next() {
        Token::Int(ch) => ch.to_digit(10).unwrap() as u64,
        Token::Op('(') => {
            let lhs = evaluate(lexer, 0);
            lexer.next();
            lhs
        }
        t => panic!("you broke it: {:?}", t),
    };

    loop {
        // find operator
        let op = match lexer.peek() {
            Token::Eof => break,
            Token::Op(op) => op,
            t => panic!("fix your token already: {:?}", t),
        };

        // the magic
        if let Some((lbp, rbp)) = infix_binding_power(op) {
            if lbp < precedence {
                break;
            }
            lexer.next();
            let rhs = evaluate(lexer, rbp); // find rhs
            match op { // calculate for reals
                '+' => lhs += rhs,
                '*' => lhs *= rhs,
                _ => panic!("what are you trying to do here? {}", op),
            }
            continue;
        }
        break;
    }
    lhs
}

The magic of infix binding power

You can think of precedence in the order of operations (PEMDAS) as the power to bind two operands. With 1 + 6 + 4, the operators have the same precedence but different binding power. The right operand is bound more tightly than the left.

1  +  6  +  4
  1 2   1 2   - binding powers

This says that the first addition operator will be the first to execute because 6 is bound tighter to the first operator than to the second (2 > 1). This binding is implemented in code with the lbp < precedence condition. In the above example, the precedence starts at zero, the LHS becomes 1, and RHS is evaluated with precedence 2. Since the left binding power of the following operator (1) is lower than the current precedence, RHS evaluation stops and 6 is returned.

Then add multiplication to the expression, which has a higher precedence and thus binding power:

1  +  2  *  3  +  4
  1 2   3 4   1 2

The 3 operand is drawn to the * more than any other operand to their operator. Then it’s multiplied by 2, the next highest. And now it’s reduced to the state of the previous expression. Once we map the operators to their binding powers, we have a precedence adhering parser/interpreter! This allows us to arbitrarily change precedence in one spot to find the answers for both parts of the problem.

fn infix_binding_power(op: char) -> Option<(u8, u8)> {
    match op {
        // Use this to satisfy Part 1
        '+' | '*' => Some((1, 2)), // same precedence

        // Use this to satisfy Part 2
        '*' => Some((1, 2)),
        '+' => Some((3, 4)), // higher precedence due to stronger binding power

        _ => None,
    }
}

Final Summation

The Pratt parser is an ingenious algorithm. In a few lines of code, any operator expression with precedence can be handled and new operators added with ease. That’s pretty powerful.