Extract part of FreeRadius’ log — Python

I have just written about how to extract part of FreeRadius’ log with a little awk script. Then I decided that it would be easier and quicker to do with Python.

Here is a Python script that does the same thing (and is written in the same way):

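#!/usr/bin/python
#
# Usage: cutlog.py PATTERN LOGFILE
# Prints every blank-line-terminated record that has a line matching PATTERN.
#

import sys, re

pattern = sys.argv[1]
log_file = open(sys.argv[2])

cp = re.compile(pattern)

total = 0        # records seen
selected = 0     # records that matched
good = False     # does the current record contain a match?
lines = 0        # lines read

record = []      # lines of the current record

while True:
    line = log_file.readline()
    if not line:
        break

    lines += 1

    record.append(line)

    if cp.search(line):
        good = True

    # A blank line ends the current record.
    if line == '\n':
        if good:
            sys.stdout.write(''.join(record))
            selected += 1

        good = False
        record = []
        total += 1

log_file.close()

sys.stderr.write("%i records (%i lines) processed\n" % (total, lines))
sys.stderr.write("%i records matched\n" % selected)
sys.stderr.write("Pattern was: '%s'\n\n" % pattern)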

Nothing special, you see.

What did I find interesting about this script? It runs noticeably faster than awk: about 27 seconds versus 34 seconds on the same file, so awk takes roughly 27% longer. And I don’t know how to optimize my awk script :-)

Take a look, this is awk:

time awk -f cutlog.awk pattern='Station-Id = \"XXXYYZ[0-2]\"' detail-YYYYMMDD > detail-YYYYMMDD.part
276358 records (6874776 lines) processed
49574 records matched
Pattern was: 'Station-Id = "XXXYYZ[0-2]"'
 
33.90user 0.29system 0:34.19elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+306minor)pagefaults 0swaps

This is Python:

time python cutlog.py 'Station-Id = "XXXYYZ[0-2]"' detail-YYYYMMDD > detail-YYYYMMDD.part
276358 records (6874776 lines) processed
49574 records matched
Pattern was: 'Station-Id = "XXXYYZ[0-2]"'
 
26.60user 0.24system 0:26.85elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+732minor)pagefaults 0swaps
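
To put the two runs side by side, here is a quick calculation using the elapsed times that `time` reported above. It is only a small sketch, and the variable names are my own:

```python
# Elapsed wall-clock times taken from the two `time` runs above, in seconds.
awk_elapsed = 34.19
python_elapsed = 26.85

# How much longer awk took, relative to the Python run.
awk_slower_pct = (awk_elapsed / python_elapsed - 1) * 100

# How much time Python saved, relative to the awk run.
python_saved_pct = (1 - python_elapsed / awk_elapsed) * 100

print("awk took %.0f%% longer than Python" % awk_slower_pct)
print("Python needed %.0f%% less time than awk" % python_saved_pct)
```

Depending on which run is taken as the baseline, the difference is about 27% or about 21%. This is a single run of each, so treat the figures as rough.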

Update:
Well, I tried to «optimize»:

    # instead of:
    if cp.search(line):
        good = True
    # write:
    if not good and cp.search(line):
        good = True
        continue
# and similarly for awk.

There is no significant difference. Nor does it make a significant difference to change the order of the checks (first «if line is empty» and then «if line matches», or the other way round), to use print instead of printf, and so on.
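
For reference, the variant with the blank-line check first and the regexp skipped after the first hit can be sketched as a function. The function name `select_records` and the sample records are made up for illustration and are not real log data. One assumption to note: here the blank separator line itself is never tested against the pattern.

```python
import re


def select_records(lines, pattern):
    """Return (matched_records, total_records) for blank-line-terminated records."""
    cp = re.compile(pattern)
    matched = []
    total = 0
    good = False
    record = []

    for line in lines:
        record.append(line)

        if line == '\n':
            # Blank line: the record is complete, keep it if something matched.
            if good:
                matched.append(''.join(record))
            good = False
            record = []
            total += 1
        elif not good and cp.search(line):
            # Run the regexp only until the first hit inside a record.
            good = True

    return matched, total


# Made-up sample records, each terminated by a blank line.
sample = [
    'record one\n',
    '\tCalling-Station-Id = "XXXYYZ1"\n',
    '\n',
    'record two\n',
    '\tCalling-Station-Id = "XXXYYZ7"\n',
    '\n',
    'record three\n',
    '\tCalling-Station-Id = "XXXYYZ0"\n',
    '\tsome other attribute\n',
    '\n',
]

matched, total = select_records(sample, 'Station-Id = "XXXYYZ[0-2]"')
print("%i of %i records matched" % (len(matched), total))
```

Keeping the logic in a function makes it easy to check on a few hand-written records that a reordering still selects the same records before timing it on a real file.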
