Linux Filter log file entries based on date range

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/7706095/

Time: 2020-08-05 06:34:28  Source: igfitidea

Filter log file entries based on date range

linux apache ubuntu awk

Asked by sqren

My server is having unusually high CPU usage, and I can see that Apache is using way too much memory. I have a feeling I'm being DoS'd by a single IP. Maybe you can help me find it?

I've used the following line, to find the 10 most "active" IPs:

cat access.log | awk '{print $1}' |sort  |uniq -c |sort -n |tail
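
As a side note, the same count can be produced in a single awk pass, without cat and without sorting the whole log first. This is only a minimal sketch, assuming the client IP is the first whitespace-separated field (the default Apache layout); the helper name top_ips and the sample lines are made up for illustration.

```shell
# top_ips N [FILE...]: print the N most frequent values of field 1, busiest first.
# (top_ips is a hypothetical helper name, not part of any tool.)
top_ips() (
    n=$1; shift
    # Count each IP in an awk array, then sort by count (descending) and IP.
    awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$@" |
        sort -k1,1nr -k2,2 | head -n "$n"
)

# Made-up sample data; on a real server this would be: top_ips 10 access.log
top_ips 2 <<'EOF'
10.0.0.1 - - [10/Oct/2011:08:00:01 +0000] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [10/Oct/2011:08:00:02 +0000] "GET / HTTP/1.1" 200 512
10.0.0.1 - - [10/Oct/2011:08:00:03 +0000] "GET /a HTTP/1.1" 200 128
EOF
```

Unlike the sort -n | tail version, this lists the busiest IP first.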

The top 5 IPs have about 200 times as many requests to the server as the "average" user. However, I can't find out whether these 5 are just very frequent visitors or whether they are attacking the server.

Is there a way to restrict the above search to a time interval, e.g. the last two hours or between 10 and 12 today?

Cheers!

UPDATED 23 OCT 2011 - The commands I needed:

Get entries within last X hours [Here two hours]

awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date) print Date FS $4}' access.log

Get most active IPs within the last X hours [Here two hours]

awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date) print $1}' access.log | sort  |uniq -c |sort -n | tail
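
The two-hour command above could also be wrapped in a small function so that the cutoff becomes a parameter. A rough sketch, assuming GNU date (for -d) and a log whose timestamp is in field 4; the name ips_after and the sample lines are hypothetical. The comparison is still a plain string comparison, exactly as in the one-liners.

```shell
# ips_after CUTOFF [FILE...]: count requests per IP whose field 4 sorts after CUTOFF.
# CUTOFF must use the log's own format, e.g. "[10/Oct/2011:08:55:23".
# (ips_after is a hypothetical helper name.)
ips_after() (
    cutoff=$1; shift
    awk -v cutoff="$cutoff" '$4 > cutoff { print $1 }' "$@" |
        sort | uniq -c | sort -n | tail
)

# Typical call (needs GNU date; %b depends on the locale, so an English
# or C locale is assumed here):
#   ips_after "$(date -d 'now-2 hours' '+[%d/%b/%Y:%H:%M:%S')" access.log

# Made-up sample data with a fixed cutoff, so the sketch runs as-is.
ips_after '[10/Oct/2011:08:55:23' <<'EOF'
10.0.0.1 - - [10/Oct/2011:07:00:00 +0000] "GET / HTTP/1.1" 200 1
10.0.0.2 - - [10/Oct/2011:09:00:00 +0000] "GET / HTTP/1.1" 200 1
10.0.0.2 - - [10/Oct/2011:09:30:00 +0000] "GET / HTTP/1.1" 200 1
EOF
```

Quoting the date format and passing the cutoff with -v keeps the brackets and slashes away from the shell.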

Get entries within relative timespan

awk -vDate=`date -d'now-4 hours' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date && $4 < Date2) print Date FS Date2 FS $4}' access.log

Get entries within absolute timespan

awk -vDate=`date -d '13:20' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'13:30' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date && $4 < Date2) print $0}' access.log

Get most active IPs within absolute timespan

awk -vDate=`date -d '13:20' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'13:30' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date && $4 < Date2) print $1}' access.log | sort  |uniq -c |sort -n | tail

Accepted answer by matchew

Yes, there are multiple ways to do this. Here is how I would go about it. For starters, there is no need to pipe the output of cat; just open the log file with awk.

awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date {print Date, $0}' access_log

Assuming your log looks like mine (the format is configurable), the date is stored in field 4 and is bracketed. What I am doing above is finding everything within the last 2 hours. Note the -d'now-2 hours', or translated literally "now minus 2 hours", which for me looks something like this: [10/Oct/2011:08:55:23

So what I am doing is storing the formatted value of two hours ago and comparing it against field four. The conditional expression should be straightforward. I am then printing the Date, followed by the Output Field Separator (OFS, a space in this case), followed by the whole line $0. You could use your previous expression and just print $1 (the IP addresses):

awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date {print $1}' access_log | sort  |uniq -c |sort -n | tail

If you wanted to use a range, specify two date variables and construct your expression appropriately.

So if you wanted to find something between 2 and 4 hours ago, your expression might look something like this:

awk -vDate=`date -d'now-4 hours' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date && $4 < Date2 {print Date, Date2, $4}' access_log

Here is a question I answered regarding dates in bash that you might find helpful: Print date for the monday of the current week (in bash)

Answer by F. Hauri

As this is a common perl task,

and because this is not exactly the same as "extract last 10 minutes from logfile", where it's about a span of time up to the end of the logfile,

and because I've needed them, I (quickly) wrote this:

#!/usr/bin/perl -ws
# This script parses logfiles for a specific period of time

sub usage {
    printf "Usage: %s -s=<start time> [-e=<end time>] <logfile>\n", $0;
    die $_[0] if $_[0];
    exit 0;
}

use Date::Parse;

usage "No start time submitted" unless $s;
my $startim=str2time($s) or die;

# Default the end time to now; -e overrides it
my $endtim=time();
$endtim=str2time($e) if $e;

usage "Logfile not submitted" unless $ARGV[0];
open my $in, "<" . $ARGV[0] or usage "Can't open '$ARGV[0]' for reading";
$_=<$in>;
exit unless $_; # empty file
# Determining regular expression, depending on log format
my $logre=qr{^(\S{3}\s+\d{1,2}\s+(\d{2}:){2}\d+)};
$logre=qr{^[^\[]*\[(\d+/\S+/(\d+:){3}\d+\s\+\d+)\]} unless /$logre/;
seek $in, 0, 0; # rewind so the first line is filtered too

while (<$in>) {
    /$logre/ && do {
        my $ltim=str2time($1); # timestamp captured by the regex
        print if $endtim >= $ltim && $ltim >= $startim;
    };
};

This could be used like:

./timelapsinlog.pl -s=09:18 -e=09:24 /path/to/logfile

for printing logs between 09h18 and 09h24.

./timelapsinlog.pl -s='2017/01/23 09:18:12' /path/to/logfile

for printing from January 23rd, 9h18'12" up to now.

In order to reduce perl code, I've used the -s switch to permit auto-assignment of variables from the command line: -s=09:18 will populate a variable $s which will contain 09:18. Take care not to miss the equal sign = and to use no spaces!

Nota: This holds two different kinds of regex for two different log standards. If you require different date/time format parsing, either post your own regex or post a sample of a formatted date from your logfile.

^(\S{3}\s+\d{1,2}\s+(\d{2}:){2}\d+)         # ^Jan  1 01:23:45
^[^\[]*\[(\d+/\S+/(\d+:){3}\d+\s\+\d+)\]    # ^... [01/Jan/2017:01:23:45 +0000]

Answer by Szántó Zoltán

If someone encounters awk: invalid -v option, here's a script to get the most active IPs in a predefined time range:

cat <FILE_NAME> | awk '$4 >= "[04/Jul/2017:07:00:00" && $4 < "[04/Jul/2017:08:00:00"' | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -20
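
One caveat applies to the awk commands in this thread: they compare timestamps as plain strings, and strings of the form [dd/Mon/YYYY:HH:MM:SS only sort chronologically while every entry falls in the same month. Across a month boundary, "[31/Oct/..." sorts after "[01/Nov/...". A possible workaround, shown here only as a sketch, is to rebuild each timestamp as a sortable YYYYMMDDHHMMSS key before comparing. It assumes the default Apache timestamp in field 4 with English month names, and it ignores the timezone offset; the name in_range and the sample lines are made up.

```shell
# in_range START END [FILE...]: print lines with START <= timestamp < END.
# START and END are sortable keys of the form YYYYMMDDHHMMSS (a convention
# of this sketch); in_range is a hypothetical helper name.
in_range() (
    start=$1; end=$2; shift 2
    awk -v start="$start" -v end="$end" '
        BEGIN {
            # Map English month abbreviations to two-digit numbers.
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            for (i = 1; i <= n; i++) month[names[i]] = sprintf("%02d", i)
        }
        {
            # $4 looks like "[04/Jul/2017:07:00:00"; the layout is fixed-width.
            key = substr($4, 9, 4) month[substr($4, 5, 3)] substr($4, 2, 2) \
                  substr($4, 14, 2) substr($4, 17, 2) substr($4, 20, 2)
            if (key >= start && key < end) print
        }
    ' "$@"
)

# Typical call for the last two hours (needs GNU date for -d):
#   in_range "$(date -d 'now-2 hours' +%Y%m%d%H%M%S)" "$(date +%Y%m%d%H%M%S)" access.log

# Made-up sample data spanning a month boundary.
in_range 20171031230000 20171101010000 <<'EOF'
10.0.0.1 - - [31/Oct/2017:22:59:59 +0000] "GET / HTTP/1.1" 200 1
10.0.0.2 - - [31/Oct/2017:23:30:00 +0000] "GET / HTTP/1.1" 200 1
10.0.0.3 - - [01/Nov/2017:00:30:00 +0000] "GET / HTTP/1.1" 200 1
10.0.0.4 - - [01/Nov/2017:01:00:00 +0000] "GET / HTTP/1.1" 200 1
EOF
```

The output can be piped into awk '{print $1}' | sort | uniq -c | sort -n | tail as before to rank the IPs in that window.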