Script to process Apache log files to fight spammers / DDoS attackers

By webmaster | 23rd December 2013 | 2 min read

One of the challenges of running dynamic websites is that you have to keep fighting malicious users who regularly sap your server capacity by crawling your site aggressively. To do this, you have to monitor and analyze the traffic patterns on the server regularly. This matters most when the server has a load spike and you want to find the IP addresses, user agents and specific URLs responsible for it. It is all the more relevant for Drupal sites, where a rogue bot can take down the site when proper DDoS protection mechanisms are not in place.

The latest copy of the script can be downloaded from

https://github.com/zyxware/misc-utils/tree/master/ls-httpd

You can copy the script to /usr/local/bin or to any other folder that is in your $PATH on the server. Remember to configure the script with the path to your Apache access log by updating the default value of the variable log_file. Note that the script was written for the specific log file format used on our servers, so you might need to tweak the awk parameters if your Apache log file uses a different format.
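As a rough illustration of where such an awk parameter sits, here is a minimal sketch of a "top IPs" count. It is not the actual ls-httpd code: the function name top_ips and its arguments are hypothetical, and it assumes the client IP is the first whitespace-separated field of each log line.

```shell
#!/bin/sh
# Hypothetical helper, not taken from ls-httpd.
# Usage: top_ips LOG_FILE [ENTRIES] [ROWS]
# Counts how often each client IP appears in the last ENTRIES lines
# and prints the ROWS most frequent ones as "count ip".
top_ips() {
    log_file=$1
    entries=${2:-1000}
    rows=${3:-10}
    # '{ print $1 }' is the part to change if the IP is not the
    # first field in your log format.
    tail -n "$entries" "$log_file" |
        awk '{ print $1 }' |
        sort | uniq -c | sort -rn |
        head -n "$rows"
}
```

Calling top_ips with the path to your access log and 1000 would list the ten busiest IPs in the last 1000 entries. Pointing awk at a different field number is the kind of tweak a different log format would need.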

Alternatively, you can use the following as your Apache log file format in apache.conf:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

and then ensure that your log format is set to combined in your virtual host configuration:

CustomLog ${APACHE_LOG_DIR}/access.log combined
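To see which awk field holds which value under the combined format above, the following sketch pulls the IP, URL and user agent out of a single log line. The sample line is made up for illustration, and the field numbers only hold as long as the remote user (%u) contains no spaces and the user agent contains no escaped double quotes.

```shell
#!/bin/sh
# Made-up sample line in the combined format shown above.
line='203.0.113.9 - - [23/Dec/2013:17:21:05 +0530] "GET /node/1 HTTP/1.1" 200 5120 "-" "ExampleBot/1.0 (+http://example.com/bot)"'

# With the default whitespace splitting, field 1 is the client IP (%h)
# and field 7 is the requested URL (the middle part of "%r").
ip=$(printf '%s\n' "$line" | awk '{ print $1 }')
url=$(printf '%s\n' "$line" | awk '{ print $7 }')

# User agents usually contain spaces, so split on double quotes instead:
# field 2 is the request line, field 4 the referer, field 6 the user agent.
agent=$(printf '%s\n' "$line" | awk -F'"' '{ print $6 }')

printf 'ip=%s\nurl=%s\nagent=%s\n' "$ip" "$url" "$agent"
```

If your log format puts these values elsewhere, these field numbers are what would change.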

The following are some example usage patterns:

ls-httpd url 1000

will find top URLs in the last 1000 access log entries

ls-httpd ip 1000

will find top IPs in the last 1000 access log entries

ls-httpd agent 1000

will find top user agents in the last 1000 access log entries

ls-httpd url 17:

will find top URLs from 17:00:00 to 17:59:59

ls-httpd url 17:2

will find top URLs from 17:20:00 to 17:29:59

ls-httpd url 17:21

will find top URLs from 17:21:00 to 17:21:59

ls-httpd url 17

will find top URLs in the last 17 access log entries, since a number without a colon is read as an entry count rather than a time :-)
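The difference between 17: and 17 in the examples above can be sketched as a simple check on the second argument. This is only a guess at the behaviour, not the script's actual code: the function name select_entries is hypothetical, and it assumes the timestamp is the fourth whitespace-separated field in the form [23/Dec/2013:17:21:05. Note that this sketch matches the given time on every day present in the log file.

```shell
#!/bin/sh
# Hypothetical helper, not taken from ls-httpd.
# Usage: select_entries LOG_FILE SELECTOR
# SELECTOR with a colon (17:, 17:2, 17:21) is a prefix of HH:MM:SS;
# SELECTOR without a colon (17, 1000) is a number of trailing entries.
select_entries() {
    log_file=$1
    selector=$2
    case $selector in
        *:*)
            # $4 looks like "[23/Dec/2013:17:21:05", so the time of day
            # starts at character 14. Keep lines whose time begins with
            # the selector.
            awk -v prefix="$selector" \
                'index(substr($4, 14), prefix) == 1' "$log_file"
            ;;
        *)
            # No colon: take the last N entries of the log.
            tail -n "$selector" "$log_file"
            ;;
    esac
}
```

The selected lines can then be piped into a counting step such as awk, sort and uniq -c to get the top URLs, IPs or user agents.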

Hope you find this useful.