Syntax, again
We have defined a syntax for our configuration files as follows:
    classes   ::= rootdef classdefs
    rootdef   ::= rootclass { params }
    classdefs ::= classdef
    classdefs ::= classdefs classdef
    classdef  ::= ordclass { params }
    params    ::= ratedef
    params    ::= ceildef
    params    ::= descrdef
    params    ::= params params
    ratedef   ::= rate = number
    ceildef   ::= ceil = number
    descrdef  ::= descr = string
    rootclass ::= class classid root
    ordclass  ::= class classid parent classid
Remember that we defined the basic token types («rate», «=», «classid», etc.) previously, when we scanned the configuration.
Parsing
A very nice thing about SPARK is that the syntax definition is placed in docstrings (you remember that we defined the regexes for scanning in docstrings, too).
    #!/usr/bin/env python

    import spark
    import ast

    class SimpleParser(spark.GenericParser):
        def __init__(self, start='classes'):
            spark.GenericParser.__init__(self, start)

        def p_classes(self, args):
            """
            classes ::= rootdef classdefs
            """
            return ast.AST(type='classes', kids=args)

        def p_rootdef(self, args):
            """
            rootdef ::= rootclass { params }
            """
            a = ast.AST('root')
            a.classid = args[0][1]
            a.params = args[2]
            return a

        def p_classdefs(self, args):
            """
            classdefs ::= classdef
            classdefs ::= classdefs classdef
            """
            a = ast.AST(type='classdefs', kids=args)
            return a

        def p_classdef(self, args):
            """
            classdef ::= ordclass { params }
            """
            a = ast.AST('classdef')
            a.classid = args[0][1]
            a.parent = args[0][3]
            a.params = args[2]
            return a

        def p_params(self, args):
            """
            params ::= ratedef
            params ::= ceildef
            params ::= descrdef
            params ::= params params
            """
            ret = []
            if len(args) > 1:
                for p in args:
                    ret.extend([x for x in p])
                return ret
            return args

        def p_rate(self, args):
            """
            ratedef ::= rate = number
            """
            args[0].value = args[2]
            return args[0]

        def p_ceil(self, args):
            """
            ceildef ::= ceil = number
            """
            args[0].value = args[2]
            return args[0]

        def p_descr(self, args):
            """
            descrdef ::= descr = string
            """
            args[0].value = args[2]
            return args[0]

        def p_rootclass(self, args):
            """
            rootclass ::= class classid root
            """
            return args

        def p_ordclass(self, args):
            """
            ordclass ::= class classid parent classid
            """
            return args
(Again, ast.AST is hardly any different from the one provided by John Aycock, the author of SPARK.)
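To make the docstring idea less magical, here is a toy collector that gathers grammar rules from the docstrings of p_* methods by reflection. It is only a sketch of the general idea, not SPARK's actual code: the names DocstringGrammar, Demo and rules are made up for this illustration, and nothing here builds a parser.

```python
class DocstringGrammar(object):
    """Toy collector (NOT SPARK itself): gathers rules from p_* docstrings."""

    def rules(self):
        found = []
        for name in sorted(dir(self)):
            if not name.startswith('p_'):
                continue
            doc = getattr(self, name).__doc__ or ''
            for line in doc.splitlines():
                if '::=' in line:
                    # Split "lhs ::= sym sym ..." into its two halves.
                    lhs, rhs = line.split('::=')
                    found.append((lhs.strip(), rhs.split(), name))
        return found


class Demo(DocstringGrammar):
    """Hypothetical subset of the rules used in this article."""

    def p_classdefs(self, args):
        """
        classdefs ::= classdef
        classdefs ::= classdefs classdef
        """
        return args

    def p_rate(self, args):
        """ ratedef ::= rate = number """
        return args


for lhs, rhs, handler in Demo().rules():
    print("%-10s ::= %-20s handled by %s" % (lhs, ' '.join(rhs), handler))
```

Each rule ends up paired with the method whose docstring declared it, which is why one method such as p_classdefs can serve several alternatives of the same rule.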
A couple of notes on SimpleParser:
- Some methods return an ast.AST instance, while others just return the arguments passed to them. This is because we do not want an AST with every lexical element as a tree node. We want only «classes», «root», «classdefs» and «classdef» as nodes of our tree.
- The methods that just return their arguments in fact only pass those arguments up to the nodes above (remember, SPARK builds the AST bottom-up).
- Since both «classdefs» and «classdef» may have «params», the p_params method may receive lists of lists, and lists inside lists, as its argument. To pass only token.Token instances up to the nodes above, we need to «re-fold» those lists.
- You can insert «print args» into every method to investigate what «args» is in each case; then expressions like «args[0][1]» will make more sense.
So, what do we have now? Take a look:
    >>> parser = parser.SimpleParser()
    >>> parsed = parser.parse(scanned)
    >>> print "Got %s of type '%s'." % (parsed, parsed.type)
    Got <ast.AST instance at 0xb7c5166c> of type 'classes'.
    >>> print "It has kids:\n\t %s." % \
    ...     ',\n\t'.join ([ '%s of type %s' % (k, k.type) for k in parsed.kids ])
    It has kids:
             <ast.AST instance at 0xb7c519cc> of type root,
            <ast.AST instance at 0xb7c5170c> of type classdefs.
Now I do believe that this is an AST (Abstract Syntax Tree) for our configuration file (written in our configuration language).
Making syntax errors
Let’s remove «root» from the third line and test again. We will get this message:
    Syntax error at or near `{' token
Similarly, if we change «ceil» to «cello», we will get:
    Syntax error at or near `cello' token
Similarly, if we change «1024» to «1024foo», we will get:
    Syntax error at or near `foo' token
Similarly, if we change «1024» to «foo moo bar» (an error that the scanner cannot recognize), we will get:
    Syntax error at or near `foo' token
You can see that SPARK does not report the line numbers where errors occur. Read SPARK’s tutorial to learn how to do that.
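The general idea is simple: each token has to remember the line it was scanned on, so that the error message can mention it. The sketch below shows only that idea and does not use SPARK at all. Token, tag_lines, syntax_error and ConfigSyntaxError are hypothetical names, and the whitespace «scanner» is a crude stand-in for the real one. In a real SPARK parser the message would be produced in the parser's error handler; I believe it receives the offending token, but check this against your SPARK version.

```python
class Token(object):
    """Hypothetical token that remembers the line it was scanned on."""

    def __init__(self, type, value, lineno):
        self.type = type
        self.value = value
        self.lineno = lineno

    def __str__(self):
        return self.value


class ConfigSyntaxError(Exception):
    """Hypothetical exception carrying a message with a line number."""


def tag_lines(text):
    """Crude whitespace 'scanner': one token per word, tagged with its line."""
    tokens = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for word in line.split():
            tokens.append(Token('word', word, lineno))
    return tokens


def syntax_error(token):
    """Build the error an error handler could raise for this token."""
    return ConfigSyntaxError(
        "Syntax error at or near `%s' token, line %d" % (token, token.lineno))


source = "class 1:5 root {\n    rate = 10240\n    cello = 20480\n}\n"
tokens = tag_lines(source)
bad = [t for t in tokens if t.value == 'cello'][0]
print(syntax_error(bad))
# prints: Syntax error at or near `cello' token, line 3
```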
What is more, we may try to write something like this:
    class 1:5 root {
        rate = 10240
        ceil = 20480
    }
    class 1:50 root {
        ceil = 2048
        rate = 1024
    }
Surely, this will be an «error at or near `root' token»: our syntax allows only one «root» definition. But what if we write something like this?
    class 1:5 root {
        rate = 10240
        ceil = 20480
    }
    class 1:50 parent 1:7 {
        ceil = 2048
        rate = 1024
    }
We specified «parent 1:7», which does not exist; this will probably lead to an error from HTB’s point of view.
But the parser cannot recognize this error. This is a task for the next step: Semantic Analysis.
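To show why this check cannot live in the grammar, here is a rough sketch of what such a check could look like once parsing is done. It is an assumption about the next step, not the code this series actually uses: undefined_parents is a hypothetical helper, and plain (classid, parent) tuples stand in for the «classdef» nodes of the AST.

```python
def undefined_parents(root_id, classdefs):
    """Return the (classid, parent) pairs whose parent was never defined.

    root_id   -- class id of the single root class
    classdefs -- list of (classid, parent) pairs, one per ordinary class
    """
    # Every defined class id is a valid parent, regardless of order.
    known = set([root_id])
    known.update(classid for classid, parent in classdefs)
    return [(classid, parent) for classid, parent in classdefs
            if parent not in known]


# The second configuration above: 1:50 refers to the missing class 1:7.
print(undefined_parents('1:5', [('1:50', '1:7')]))
# prints: [('1:50', '1:7')]

# A consistent configuration yields no complaints.
print(undefined_parents('1:5', [('1:50', '1:5'), ('1:51', '1:50')]))
# prints: []
```

The check needs to know every class id in the file at once, which is exactly the kind of knowledge a context-free grammar rule does not have.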