I have just written about extracting part of a FreeRADIUS log with a little awk script. Back then I assumed it would be easier and quicker than doing the same with Python.
Here is a Python script which does the same (and is written in the same way):
#!/usr/bin/python
#
#
import sys, re

pattern = sys.argv[1]
file = open(sys.argv[2])
cp = re.compile(pattern)

total = 0
selected = 0
good = False
lines = 0
set = []

while True:
    line = file.readline()
    if not line:
        break
    lines += 1
    set.append(line)
    if cp.search(line):
        good = True
    if line == '\n':
        if good:
            print ''.join(set),
            selected += 1
            good = False
        set = []
        total += 1

sys.stderr.write("%i records (%i lines) processed\n" % (total, lines))
sys.stderr.write("%i records matched\n" % selected)
sys.stderr.write("Pattern was: '%s'\n\n" % pattern)
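For reference, here is my own Python 3 sketch of the same blank-line record grouping, with the scanning loop pulled into a function (the function name `filter_records` is mine, not from the original script):

```python
#!/usr/bin/env python3
# A Python 3 sketch of the same idea: walk a FreeRADIUS "detail" file,
# collect blank-line-separated records, and emit the records whose text
# matches a regex. Hypothetical helper name, not the original script's.
import re
import sys

def filter_records(lines, pattern):
    """Yield each record (as one string) that contains a regex match."""
    cp = re.compile(pattern)
    record, good = [], False
    for line in lines:
        record.append(line)
        if cp.search(line):
            good = True
        if line == '\n':          # a blank line terminates a record
            if good:
                yield ''.join(record)
            record, good = [], False

if __name__ == '__main__' and len(sys.argv) > 2:
    with open(sys.argv[2]) as f:
        for rec in filter_records(f, sys.argv[1]):
            sys.stdout.write(rec)
```

Like the original, this assumes every record ends with a blank line; a record cut off at end-of-file is silently dropped.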
Nothing special, as you can see.
What did I find interesting about this script? It runs nearly 30% faster than awk. And I don't know how to optimize my awk script :-)
Take a look, this is awk:
time awk -f cutlog.awk pattern='Station-Id = \"XXXYYZ[0-2]\"' detail-YYYYMMDD > detail-YYYYMMDD.part
276358 records (6874776 lines) processed
49574 records matched
Pattern was: 'Station-Id = "XXXYYZ[0-2]"'
33.90user 0.29system 0:34.19elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+306minor)pagefaults 0swaps
This is Python:
time python cutlog.py 'Station-Id = "XXXYYZ[0-2]"' detail-YYYYMMDD > detail-YYYYMMDD.part
276358 records (6874776 lines) processed
49574 records matched
Pattern was: 'Station-Id = "XXXYYZ[0-2]"'
26.60user 0.24system 0:26.85elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+732minor)pagefaults 0swaps
update:
Well, I have tried to «optimize»:
# instead of:
if cp.search(line):
    good = True
# write:
if not good and cp.search(line):
    good = True
    continue
# and make the analogous change in awk.
There is no significant difference. Nor does it make a significant difference to change the order of the checks (first «is the line empty», then «does the line match», or the other way round), to use print instead of printf, and so on.
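To see for myself that the short-circuit makes little difference, one can time both variants on synthetic records; here is a small harness of my own (the function `count_matches` and the test data are mine, not from the post):

```python
# Compare the plain per-line check against the short-circuited
# "if not good and cp.search(line)" variant on synthetic records.
import re
import time

def count_matches(lines, cp, short_circuit=False):
    """Count blank-line-separated records containing a regex match."""
    selected, good = 0, False
    for line in lines:
        if short_circuit:
            # Skip the regex once the current record already matched.
            if not good and cp.search(line):
                good = True
                continue
        else:
            if cp.search(line):
                good = True
        if line == '\n':          # blank line ends a record
            if good:
                selected += 1
            good = False
    return selected

lines = ("x = 1\ny = 2\n\n" * 50000 + "y = 9\n\n").splitlines(keepends=True)
cp = re.compile("y = 2")
for sc in (False, True):
    t0 = time.perf_counter()
    n = count_matches(lines, cp, short_circuit=sc)
    print("short_circuit=%s: %d records in %.3fs" % (sc, n, time.perf_counter() - t0))
```

Both variants select the same records; only the number of regex calls differs, and on data where most lines of a matching record still have to be scanned, the saving is small.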