
Sum First column of a File – Python Script

I regularly summarize error logs, and one of the tasks is to add up the occurrences of every error, warning, notice, and so on in the hourly error log that is sent by email. Instead of adding them one by one, I’ve written a small script that sums them automatically.

UPDATE: I’ve made a JavaScript version, available here: www.lysender.com/extra/tools/sumfirstcol. It has similar features and is aimed at smaller files, since it is just a JavaScript implementation.

Sample file

I’m not sure whether there is a bash/awk equivalent for this, but what I’m trying to achieve is to sum the first column of a file, since that column is numeric. See below for a sample error log email.

  5 PHP Warning:  mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /data/local/... on line 285
  2 PHP Warning:  mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /data/local/... on line 278
  2 PHP Warning:  mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /data/local/... on line 268
  2 PHP Notice:  Undefined index:  payment_method in /data/local/... on line 176
  1 PHP Warning:  array_key_exists() [<a href='function.array-key-exists'>function.array-key-exists</a>]: The second argument should be either an array or an object in /data/www/html/sites/... on line 358
  1 PHP Warning:  Invalid argument supplied for foreach() in /data/www/html/sites/... on line 415
  1 PHP Warning:  Invalid argument supplied for foreach() in /data/www/html/sites/... on line 395
  1 PHP Warning:  Invalid argument supplied for foreach() in /data/www/html/sites/... on line 372

As you can see, a short list like this is easy to sum by hand. However, the error logs are usually hundreds of lines or more, so I’ve created a simple script to sum the first numeric column. The result is the grand total of all error occurrences. Below is the Python script. (Not that I don’t know how to write it in PHP; I just wanted to practice more Python.)

#!/usr/bin/python
# Sum the first (numeric) column of a file, e.g. an hourly error log summary.

# Use print() as a function so the script runs under Python 2 and Python 3.
from __future__ import print_function

import os
import sys


def parse_file(filename):
    try:
        # The with-block closes the file even if parsing fails.
        with open(filename) as f:
            parse_now(f)
    except IOError as e:
        print('Unable to open file %s' % filename)
        print(e)


def parse_now(f):
    total = 0
    lines = 0
    for line in f:
        lines += 1
        # The count is the first space-separated field of the line.
        n = line.strip().split(' ', 1)[0]

        # Lines that do not start with a number are counted but not summed.
        if n.isdigit():
            total += int(n)

    print('Total lines: %d' % lines)
    print('First col sum: %d' % total)


if __name__ == '__main__':
    if len(sys.argv) == 2:
        input_file = sys.argv[1]

        if os.path.isfile(input_file):
            parse_file(input_file)
        else:
            print('Input file does not exist')
    else:
        print('Usage: sum-first-col <file>')
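
The summing logic can also be sketched as a function that returns its results instead of printing them, which makes it easier to reuse and to test. This is only a minimal sketch written for Python 3: the name `sum_first_column` is hypothetical, and splitting on any run of whitespace (rather than on a single space, as the script does) is an assumption added here so that tab-separated counts are picked up too.

```python
def sum_first_column(lines):
    """Return (line_count, total) for an iterable of text lines.

    Lines whose first whitespace-separated field is not a plain
    non-negative integer are counted but add nothing to the total.
    """
    line_count = 0
    total = 0
    for line in lines:
        line_count += 1
        # split(None, 1) splits on any run of whitespace and ignores
        # leading whitespace; an empty line yields an empty list.
        fields = line.split(None, 1)
        if fields and fields[0].isdecimal():
            total += int(fields[0])
    return line_count, total


# Hypothetical sample lines, shortened from the format shown earlier.
sample = [
    "  5 PHP Warning:  first message\n",
    "2\tPHP Notice:  second message\n",
    "\n",
    "PHP Warning:  line without a count\n",
]
print(sum_first_column(sample))  # prints (4, 7)
```

Because the function accepts any iterable of lines, it works the same way with an open file object or with `sys.stdin`.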

Save this script as sum-first-col, for example, and put it somewhere on your PATH. Be sure to set the execute bit (chmod +x sum-first-col) so that you can run it as a script directly from your terminal. You can run it directly or pass it to the python executable.

# Run it directly
sum-first-col sample-error-log.txt
# Or pass it to the python executable
python sum-first-col sample-error-log.txt

And below is a sample output.

lysender@darkstar:~$ sum-first-col sample-error-log.txt
Total lines: 8
First col sum: 15
lysender@darkstar:~$ 
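
The grand total could also be broken down by severity (warnings, notices, and so on). The following is only a rough sketch for Python 3 under a stated assumption: every counted line looks like `<count> PHP <Level>: <message>`, as in the sample file. The names `LINE_RE` and `sum_by_level` are hypothetical, and lines that do not match the pattern are simply skipped.

```python
import re
from collections import Counter

# Matches "<count> PHP <Level>:" at the start of a line,
# e.g. "  5 PHP Warning:  ...". The level is everything up to the first colon.
LINE_RE = re.compile(r'^\s*(\d+)\s+PHP\s+([^:]+):')


def sum_by_level(lines):
    """Return a Counter mapping each level (e.g. 'Warning') to its summed count."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            totals[match.group(2).strip()] += int(match.group(1))
    return totals


# Hypothetical sample lines in the same shape as the sample file.
sample = [
    "  5 PHP Warning:  mysql_fetch_array(): supplied argument is not valid\n",
    "  2 PHP Notice:  Undefined index:  payment_method\n",
    "  1 PHP Warning:  Invalid argument supplied for foreach()\n",
]
for level, count in sorted(sum_by_level(sample).items()):
    print('%s: %d' % (level, count))
```

Summing the values of the returned counter gives the same grand total as the first-column sum, provided every counted line matches the assumed format.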

Note: I’m trying to create a JavaScript version and publish it on the web, so that it becomes a copy-and-paste job instead of running things in the terminal. Let’s see.

Update: The JavaScript version is now available at www.lysender.com/extra/tools/sumfirstcol.

Enjoy.
