In the introduction to building a math parser I already mentioned the Reverse Polish notation, also called “postfix notation”.
Its main advantage is unambiguity: you can simply read expression from left to right and calculate its value at the same time. You have to neither set up operators priorities nor use parenthesis. Refer to the Reverse Polish notation description for further details.
There are many implementations of Reverse Polish notation parser available, but most of them seem to be too complicated.
Writing test code is a worthwhile practice and building a parser is a good example to prove this claim. We have seen previously that the Parser class consists of many methods and each method is a part of a chain in top-down analysis. It won’t be a good idea to start writing the parser from scratch, add all methods we think necessary and then run the code for the first time.
The example implementation of a math parser is going to be written in Ruby because of its simplicity. Remember, however, that parsers are usually written in low-level languages, like C or C++. They are often located in the backbone of much bigger programs and are called frequently, so their performance has a great influence on overall system performance. Note that C code is fast, but its main disadvantage is the lack of portability (C code has to be compiled for every software platform separately).
In general, parser is a program that determines whether its input is valid, referring to the given grammar. So, if we would like to parse math expression, we have to set a formal grammar first. The most convenient way to do this is to write the context-free grammar’s production rules using EBNF (Extended Backus-Naur Form) notation. Take a look at this simple example:
digit = "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" The line we wrote is called a production rule and it reads: a digit can be equal to 0 or 1 or 2… or to 9.
Everyone knows how to calculate value of a simple math expression, like "2 + 3 * 7". But not every programmer knows how to write a program that would accomplish the same task.
The first idea usually looks like this:
result = get_number() while operator = get_operator() operand = get_number() result = result operator operand end return result It means: “read the expression from left to right and perform appropriate calculations when operator occurred”.