1. How to get alerts by reading log files
Many people write shell scripts that read and parse server log files, such as Oracle's alert.log or Apache's error_log, and email them when certain strings are found in the log. The scripts are usually run as cron jobs. That can incur a lot of disk I/O, especially when the file is already large, and depending on how often the cron job runs, it may not send notifications as quickly as you would like.
These shortcomings are easily overcome. Run tail -f on the log file and pipe the result to an action command. tail -f reads the tail of the file once a second and sleeps in between. That's close to "immediate" response and causes far less I/O and CPU usage. Instead of a cron job, all you need to do is type a command like this (replace mailx with mail as needed):
# Serious Oracle database error
nohup tail -f alert_ORCL.log | nohup perl -nle 'system("mailx -s \"ORCL error\" youremail < /dev/null") if /^ORA-00600/' &
But it may make more sense to put the command sequence in a script and worry about nohup at the time you launch it. So create a file myscript containing

# Apache sent status 200 to the user but no data followed (i.e. "-" at line end)
tail -f access_log | perl -nle 'system("mailx -s ChkApache you\@somecompany.com < /dev/null") if / 200 -$/'
and start it with nohup myscript &. (This is better than putting nohup tail... | nohup perl... inside myscript and typing myscript &.)
You can also have it email you the matching line from the log instead of a blank message:
tail -f access_log | perl -nle 'system("echo \"$_\" | mailx -s ChkApache you\@somecompany.com him\@somecompany.com") if / 200 -$/'
A more sophisticated example that skips some errors you don't care about:
#!/bin/ksh
# Usage: $0 OracleSID
export SID=$1
tail -f alert_$SID.log | perl -nle '$SID=$ENV{SID}; system("echo \"$_\" | mailx -s \"$SID Database Error\" you\@somecompany.com") if (/^ORA-/ and !/^ORA-07445/)'
Instead of a Perl one-liner, your myscript can call a full-fledged script whattodo.pl with fairly complex programming logic in it:

tail -f /var/adm/messages | perl -nl whattodo.pl
If you want to avoid creating nohup.out and getting the "Sending output to nohup.out" message, run the command in csh, redirect stdout to /dev/null, or run it with echo 'nohup myscript &' | csh -s. If you're more comfortable with awk, replace perl with nawk or gawk (but not old awk, whose system command is broken):
tail -f alert_ORCL.log | nawk '/^ORA-00600/ {system("mailx -s \"ORCL error\" youremail < /dev/null")}'
A cron job is preserved across reboots, but your tail... | perl... command is not, so you may have to start it from one of your rc scripts.
The disadvantage of this method is that it processes one line at a time: if 100 lines match the pattern, you get 100 emails. The traditional scan-the-whole-logfile approach easily groups the lines into one alert; here you have to do a little programming to achieve the same goal, probably by writing matches to a temporary file and waiting a few seconds before mailing it, or by buffering in a variable in your code and delaying the send.
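One way to do that grouping along the temporary-file lines just mentioned is sketched below. The spool path, the finite sample input (standing in for tail -f), and cat standing in for mailx are all assumptions for illustration:

```shell
#!/bin/sh
# Step 1: the matcher appends each hit to a spool file instead of mailing it.
# With a live log this step would be:
#   tail -f alert_ORCL.log | grep --line-buffered '^ORA-' >> "$SPOOL"
SPOOL=/tmp/ora.spool.$$
printf 'ORA-00600 internal error\nnormal line\nORA-01555 snapshot too old\n' |
grep '^ORA-' >> "$SPOOL"

# Step 2: a timer (cron, or a "while sleep 60" loop) mails the whole batch
# in one message instead of one email per line.
if [ -s "$SPOOL" ]; then
    cat "$SPOOL"      # in real use: mailx -s "DB errors" youremail < "$SPOOL"
    : > "$SPOOL"      # truncate so the next window starts empty
fi
rm -f "$SPOOL"
```

A burst of 100 matching lines then costs one email per timer interval rather than 100.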
By the way, on Solaris you can check whether the job is indeed ignoring SIGHUP with the command

/usr/proc/bin/psig pid | head -2

where pid is the process ID of the submitted background job.
Microsoft's Windows Services for UNIX: heavy-duty, comprehensive emulation of the UNIX environment, including a /proc filesystem and a truss utility; runs entirely in the Windows POSIX subsystem; does not install on XP Home Edition.
Red Hat's Cygwin: very popular.
MKS Software's MKS Toolkit.
Hamilton Labs' Hamilton C Shell.
2. How to read a large file
If you have a several-hundred-MB log file, how do you quickly find the portion you're interested in? vi may refuse to open the file, complaining it is too large, or take too long and use too much memory. Here's what I usually do.
Let's say you want Mon Jul 15's data from July's access_log; here's the command:

sed -n '/^Mon Jul 15/p; /^Tue Jul 16/q' access_log > /tmp/qq
Without the quit command, sed would scan the file to the end, or until you press ^C.
Note the undocumented semicolon that allows you to put two sed commands on one line. (By the way, on Solaris creating a file under /tmp may be faster because tmpfs should be memory-based, unless you're short on RAM. But remember to delete big files under /tmp when you're done because they reduce available swap space.)
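To see the print-then-quit pattern at work on a toy file (the file name and dates here are made up):

```shell
# Build a four-line toy log spanning three days.
printf 'Sun Jul 14 a\nMon Jul 15 b\nMon Jul 15 c\nTue Jul 16 d\n' > toy.log

# Print only the Mon Jul 15 lines, then quit at the first Tue Jul 16 line,
# so sed never reads the rest of the file.
sed -n '/^Mon Jul 15/p; /^Tue Jul 16/q' toy.log

rm -f toy.log
```

On a real multi-hundred-MB log, the q command is what saves the time: sed stops the moment the day of interest is over.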
If you know the data you want is not too far from the end:

tail -1000 access_log | head -1
If the line you see is still too late, increase 1000 to 5000, 10000, and so on, though the count should not reach too close to the head of the file. You should always have an idea of when the file starts and ends. If you know it only contains records for July, then of course it runs from July 1 to July 31; but even then you have to take very uneven data distribution into account.
If you don't know the start and end, run head -1 access_log and tail -1 access_log first.
Unless you have to, do not start with head, as in head -1000 access_log | tail. That would print 1000 lines into the pipe when the reader only cares about the last 10. Count the total lines (wc -l access_log) only if you have to, and generally do not split the file into chunks (split -b 10000000 access_log).
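A toy illustration of probing from the end (the 500-line file and the count of 100 are arbitrary; line numbers stand in for timestamps):

```shell
# 500 numbered lines stand in for a timestamped log.
seq 1 500 > toy.log

# The first line of the last 100 tells you how far back that tail reaches;
# if its "timestamp" is still too late, raise 100 and probe again.
tail -100 toy.log | head -1

rm -f toy.log
```

Each probe reads only the tail of the file, so repeating it a few times with larger counts is still far cheaper than scanning from the top.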
Needless to say, pattern matching should use the most restrictive regular expression when dealing with large files. If you search Apache's access_log for all entries from the client 172.168.123.45, and you know the IP is at the beginning of the line, use /^172\.168\.123\.45/ instead of /172.168.123.45/.
egrep may be faster than grep, which is in turn faster than fgrep. So either of these two commands is OK:

egrep '^172\.168\.123\.45' access_log
sed -n '/^172\.168\.123\.45/p' access_log
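A quick check on toy data that the anchored pattern also avoids false hits (the second sample line contains the IP, but not at line start, so it should be rejected):

```shell
# Two fabricated log lines: only the first has the IP in the client field.
printf '172.168.123.45 - - "GET /"\n10.0.0.1 - - referer=172.168.123.45\n' > toy.log

# The ^ anchor both speeds up the scan (each non-matching line is abandoned
# after one comparison at column 1) and rejects lines where the IP appears
# elsewhere, e.g. in a referer field.
egrep '^172\.168\.123\.45' toy.log

rm -f toy.log
```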