The example implementation of a math parser is going to be written in Ruby because of its simplicity. Remember, however, that parsers are usually written in low-level languages, like C or C++. They are often located in the backbone of much bigger programs and are called frequently, so their performance has a great influence on overall system performance. Note that C code is fast, but its main disadvantage is the lack of portability (C code has to be compiled for every software platform separately).
Now we will go back to Ruby. As we have said in the previous part, it is a good idea to split parser into two parts: a lexer and a parser. We’ll start from creating the lexer as it is the basic part.
Lexer’s aim is to transform the input into a sequence of tokens. It will be convenient to create two classes:
Token. Lexer contains the
get_next_token() method, which will be called by the parser.
get_next_token() reads some characters from the input, recognizes them as a token (or it doesn’t) and returns an instance of the
Token class looks like following:
class Token Plus = 0 Minus = 1 Multiply = 2 Divide = 3 Number = 4 LParen = 5 RParen = 6 End = 7 attr_accessor :kind attr_accessor :value def initialize @kind = nil @value = nil end def unknown? @kind.nil? end end
At the beginning of the class, there are available token types defined. They are usually put inside an enum-like variable, but unfortunately Ruby doesn’t provide any enum type.
kind attribute holds the information about token type (
Divide etc.). One may be wondering what is
value attribute defined for. Well, if the token is a number,
value attribute holds the number’s value. Otherwise,
value will just hold the
nil value. Perhaps creating
NumberToken class extending
Token would be a more elegant and flexible solution, but we won’t do it now as we want to focus on the main problem.
Token class is ready, so we can move on to the
class Lexer def initialize(input) @input = input @return_previous_token = false end def get_next_token if @return_previous_token @return_previous_token = false return @previous_token end token = Token.new @input.lstrip! case @input when /\A\+/ then token.kind = Token::Plus when /\A-/ then token.kind = Token::Minus when /\A\*/ then token.kind = Token::Multiply when /\A\// then token.kind = Token::Divide when /\A\d+(\.\d+)?/ token.kind = Token::Number token.value = $&.to_f when /\A\(/ token.kind = Token::LParen when /\A\)/ token.kind = Token::RParen when '' token.kind = Token::End end raise 'Unknown token' if token.unknown? @input = $' @previous_token = token token end def revert @return_previous_token = true end end
At creation time, the
Lexer instance is being bound to the input.
get_next_token() method was already mentioned before. It reads characters from the input and tries to recognize them as one of the token types. If it fails, it raises a
case statement takes advantage of one of the Ruby’s nice features. In a when clause, object can be compared not only to a simple constant value; regular expressions and ranges can be used as well. Ale the object has to do is to respond to the
One more thing that requires an explanation is the
revert() method. It is likely to happen that when we are reading tokens by calling the
get_next_token() method, we will go too far. In such a situation, we would like to have the ability to “unget” the last token. Otherwise, we would have to store the token somewhere and refer to it when the time comes. It would be inconvenient from the
Lexer’s client point of view and that is why the
revert() method has come to be.
The time has come to show the
Parser class. It’s a big bunch of code. Don’t worry though, we are going to explain it method by method.
require 'lexer' class Parser def parse(input) @lexer = Lexer.new(input) expression_value = expression token = @lexer.get_next_token if token.kind == Token::End expression_value else raise 'End expected' end end protected def expression component1 = factor additive_operators = [Token::Plus, Token::Minus] token = @lexer.get_next_token while additive_operators.include?(token.kind) component2 = factor if token.kind == Token::Plus component1 += component2 else component1 -= component2 end token = @lexer.get_next_token end @lexer.revert component1 end def factor factor1 = number multiplicative_operators = [Token::Multiply, Token::Divide] token = @lexer.get_next_token while multiplicative_operators.include?(token.kind) factor2 = number if token.kind == Token::Multiply factor1 *= factor2 else factor1 /= factor2 end token = @lexer.get_next_token end @lexer.revert factor1 end def number token = @lexer.get_next_token if token.kind == Token::LParen value = expression expected_rparen = @lexer.get_next_token raise 'Unbalanced parenthesis' unless expected_rparen.kind == Token::RParen elsif token.kind == Token::Number value = token.value else raise 'Not a number' end value end end
To parse the expression, client should invoke the
parse() method. Its only parameter is the input string. It creates a
Lexer instance and tries to evaluate given math expression.
number() methods come almost directly from the grammar we set before. Parentheses handling has been moved from the
factor() to the
number() method, so the former is not overloaded with code and, as a result, it’s easier to read.
expression() method expects one or more components, separated with additive operators (
component can consist of factors, so every component is a result of the
factor() method invocation. Parser goes inwards first, splits the expression into basic parts and then gather the partial results together. That’s it, our parser is recursive. One can also call it a “top-down” parser.
factor() method is similar to the
expression() method. The
number() method is more interesting. It expects either a number or a whole expression enclosed in parentheses. If it encounters a number, the matter is simple. The show begins when there is a left parenthesis; a valid expression and a right parenthesis is then expected. Again, if
Parser detects a syntax error, it raises a
Example usage of the
Parser class looks like this:
require 'parser' parser = Parser.new loop do begin print '>> ' puts parser.parse(gets) rescue RuntimeError puts 'Error occurred: ' + $! end end
User can interact with the parser by using command line interface. The script displays either expression value or an error info. Of course, the
Parser class can be used in another environment, e.g. in a web application.
We can consider the parser finished, but it’s always a good idea to verify if the piece of software we are creating is correct. Let’s talk about testing for a while.