Parsing

Syntax, again

We have defined a syntax for our configuration files as follows:

    classes ::= rootdef classdefs
    rootdef ::= rootclass { params }
    classdefs ::= classdef
    classdefs ::= classdefs classdef
    classdef ::= ordclass { params }
    params ::= ratedef
    params ::= ceildef
    params ::= descrdef
    params ::= params params
    ratedef ::= rate = number
    ceildef ::= ceil = number
    descrdef ::= descr = string
    rootclass ::= class classid root
    ordclass ::= class classid parent classid

Remember that we defined the basic token types («rate», «=», «classid», etc.) previously, when we scanned the configuration.
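
For example, a configuration like the following (the same shape as the samples used later in this article) matches this grammar: one «root» class followed by one or more ordinary classes, each with a block of parameters.

class 1:5 root {
        rate = 10240
        ceil = 20480
}
 
class 1:50 parent 1:5 {
        rate = 1024
        ceil = 2048
}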

Parsing

A very nice thing about SPARK is that it lets us place the syntax definition in docstrings (you remember that we defined the regexes for scanning in docstrings, too).

#!/usr/bin/env python
 
import spark
import ast
 
class SimpleParser(spark.GenericParser):
    def __init__(self, start='classes'):
        spark.GenericParser.__init__(self, start)
 
    def p_classes(self, args):
        """ classes ::= rootdef classdefs """
        return ast.AST(type='classes', kids=args)
 
    def p_rootdef(self, args):
        """ rootdef ::= rootclass { params } """
        a = ast.AST('root')
        a.classid = args[0][1]
        a.params = args[2]
        return a
 
    def p_classdefs(self, args):
        """
            classdefs ::= classdef
            classdefs ::= classdefs classdef
        """
        a = ast.AST(type='classdefs', kids=args)
        return a
 
    def p_classdef(self, args):
        """ classdef ::= ordclass { params } """
        a = ast.AST('classdef')
        a.classid = args[0][1]
        a.parent = args[0][3]
        a.params = args[2]
        return a
 
    def p_params(self, args):
        """
            params ::= ratedef
            params ::= ceildef
            params ::= descrdef
            params ::= params params
        """
        # For `params ::= params params' we get two lists; flatten them so
        # that the nodes above always receive a flat list of token.Token
        # instances.  For the single-parameter rules, args is already a
        # one-element list and can be returned as-is.
        ret = []
        if len(args) > 1:
            for p in args:
                ret.extend([x for x in p])
            return ret
 
        return args
 
 
    def p_rate(self, args):
        """ ratedef ::= rate = number """
        args[0].value = args[2]
        return args[0]
 
    def p_ceil(self, args):
        """ ceildef ::= ceil = number """
        args[0].value = args[2]
        return args[0]
 
    def p_descr(self, args):
        """ descrdef ::= descr = string """
        args[0].value = args[2]
        return args[0]
 
    def p_rootclass(self, args):
        """ rootclass ::= class classid root """
        return args
 
    def p_ordclass(self, args):
        """ ordclass ::= class classid parent classid """
        return args

(Again, ast.AST is hardly different from the one provided by John Aycock, the author of SPARK.)
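
If you do not have that class at hand, a minimal stand-in compatible with the calls above might look like this (the class from Aycock's examples has a few more conveniences, but nothing here depends on them):

# ast.py -- a minimal stand-in for the node class used by SimpleParser.
class AST:
    def __init__(self, type, kids=None):
        self.type = type        # 'classes', 'root', 'classdefs', 'classdef'
        self.kids = kids or []  # child nodes or argument lists, as passed in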

A couple of notes here:

  1. Some methods return an ast.AST instance, while others return just the arguments passed to them. This is because we do not want an AST with every lexical element as a tree node; we want only «classes», «root», «classdefs» and «classdef» as nodes of our tree.
  2. The methods that return just their arguments in fact only pass those arguments up to the nodes above them (remember, SPARK builds the AST bottom-up).
  3. Since both «classdefs» and «classdef» may have «params», the p_params method may receive lists of lists (and lists inside lists) as an argument. To pass only token.Token instances to the nodes above, we need to «re-fold» (flatten) those lists.
  4. You can insert print args in every method to investigate what «args» is in each case; then those «args[0][1]» will make more sense (see the sketch right after this list).
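
For instance, a temporary print in p_rootdef (a debugging aid only; how the tokens are rendered depends on the __repr__ of the scanner's token class) could look like this:

    def p_rootdef(self, args):
        """ rootdef ::= rootclass { params } """
        # args[0] is the raw token list returned by p_rootclass:
        # [class, classid, root], so args[0][1] is the classid token.
        # args[1] and args[3] are the brace tokens, and args[2] is the
        # flat list of parameter tokens built by p_params.
        print args
        a = ast.AST('root')
        a.classid = args[0][1]
        a.params = args[2]
        return a

Similarly, in p_classdef args[0] is what p_ordclass returned, [class, classid, parent, classid], which is why the class's own id is args[0][1] and its parent is args[0][3].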

So, what do we have now? Take a look:

>>> parser = parser.SimpleParser()
>>> parsed = parser.parse(scanned)
>>> print "Got %s of type '%s'." % (parsed, parsed.type)
 
Got <ast.AST instance at 0xb7c5166c> of type 'classes'.
 
>>> print "It has kids:\n\t %s." % \
...        ',\n\t'.join ([ '%s of type %s' % (k, k.type) for k in parsed.kids ])
 
It has kids:
         <ast.AST instance at 0xb7c519cc> of type root,
        <ast.AST instance at 0xb7c5170c> of type classdefs.

Now I do believe that this is an AST (Abstract Syntax Tree) for our configuration file, written in our configuration language.
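
If you want to see the rest of the tree, a small recursive helper will do. This is just a quick sketch: it relies only on the type, kids and params attributes we set above, and falls back to printing the node itself for plain tokens.

def dump(node, indent=0):
    # Print the node's type (or the node itself, for scanner tokens),
    # then recurse into whatever children we attached to it.
    print "%s%s" % ('    ' * indent, getattr(node, 'type', node))
    for kid in getattr(node, 'kids', []):
        dump(kid, indent + 1)
    for param in getattr(node, 'params', []):
        dump(param, indent + 1)
 
dump(parsed)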

Making syntax errors

Let’s remove «root» from the third line and test again. We will get the message:

Syntax error at or near `{' token

Similarly, if we change «ceil» to «cello», we will get:

Syntax error at or near `cello' token

Similarly, if we change «1024» to «1024foo», we will get:

Syntax error at or near `foo' token

Similarly, if we change «1024» to «foo moo bar» (an error the scanner cannot recognize), we will get:

Syntax error at or near `foo' token

You can see that SPARK does not report the line numbers where errors occur. Read SPARK’s tutorial to learn how to do that; a rough sketch of the idea follows.
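
One possible approach (a sketch only, assuming the scanner stores a lineno attribute on every token, which the scanner from the previous article does not do yet) is to override GenericParser's error method:

class SimpleParser(spark.GenericParser):
    # ... grammar methods as above ...
 
    def error(self, token):
        # The stock GenericParser.error only names the offending token;
        # if the scanner recorded a line number on it, report that too.
        print "Syntax error at or near `%s' token (line %s)" % \
              (token, getattr(token, 'lineno', '?'))
        raise SystemExit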

Moreover, we may try to write something like this:

class 1:5 root {
        rate = 10240
        ceil = 20480
}
 
class 1:50 root {
        ceil = 2048
        rate = 1024
}

Surely, this will produce an «error at or near `root' token», since our syntax allows only one «root» definition. But what if we write code like this?

class 1:5 root {
        rate = 10240
        ceil = 20480
}
 
class 1:50 parent 1:7 {
        ceil = 2048
        rate = 1024
}

We specified «parent 1:7», which does not exist; this will probably lead to an error, at least from HTB’s point of view.

But the parser cannot recognize this error. That is a task for the next step: Semantic Analysis.
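
Just to illustrate the kind of check the parser cannot make, here is a rough sketch of verifying that every «parent» reference points at a defined class. It assumes the scanner's token class keeps the literal text (e.g. «1:7») in an attr attribute, as in Aycock's examples; the next step will do this properly.

def check_parents(tree):
    # Collect the classid of every defined class, then make sure every
    # 'parent' reference names one of them.
    defined = set()
    parents = []
 
    def walk(node):
        node_type = getattr(node, 'type', None)
        if node_type in ('root', 'classdef'):
            defined.add(node.classid.attr)
        if node_type == 'classdef':
            parents.append(node.parent.attr)
        for kid in getattr(node, 'kids', []):
            walk(kid)
 
    walk(tree)
    for parent in parents:
        if parent not in defined:
            print "Unknown parent class: %s" % parent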
