Original URL: http://stackoverflow.com/questions/28275880/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to filter data between 2 dates with awk in a bash script
Question by Gheorghe Frunza
Hi, I have the following log file structure:
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
####<20-Jan-2015 07:16:43 o'clock UTC> <Notice> <Stdout> <example2.com>
####<21-Jan-2015 07:16:48 o'clock UTC> <Notice> <Stdout> <example3.com>
How can I filter this file by a date interval? For example: show all entries between the 19th and 20th of January 2015.
I tried to use awk, but I have trouble converting 19-Jan-2015 to 2015-01-19 so that I can compare the dates.
Answer by Wintermute
For an oddball date format like that, I'd outsource the date parsing to the date utility.
#!/usr/bin/awk -f
# Formats the timestamp as a number, so that higher numbers represent
# a later timestamp. This will not handle the time zone because date
# can't handle the o'clock notation. I hope all your timestamps use the
# same time zone, otherwise you'll have to hack support for it in here.
function datefmt(d,    command, result) {
# make d safe inside a single-quoted shell string (' becomes '\'')
gsub(/'/, "'\\''", d)
# then run the date command and read its output
command = "date -d '" d "' +%Y%m%d%H%M%S"
result = ""
command | getline result
close(command)
# that's our result.
return result;
}
BEGIN {
# Field separator, so the part of the timestamp we'll parse is in $2 and $3
FS = "[< >]+"
# start, end set here.
start = datefmt("19-Jan-2015 00:00:00")
end = datefmt("20-Jan-2015 23:59:59")
}
{
# convert the timestamp into an easily comparable format
stamp = datefmt($2 " " $3)
# then print only lines in which the time stamp is in the range.
if(stamp >= start && stamp <= end) {
print
}
}
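The script above hard-codes the range in its BEGIN block. As a minimal sketch, assuming GNU date (for -d) and a hypothetical wrapper name filter_log, the bounds can instead be passed in from the shell with awk -v, keeping the same per-line call to date:

```shell
#!/usr/bin/env bash
# filter_log FROM TO FILE (hypothetical helper name)
# Prints the lines of FILE whose timestamp lies in [FROM, TO].
# Assumes GNU date (-d) and that every line uses the same time zone.
filter_log() {
  local start end
  start=$(date -d "$1" +%Y%m%d%H%M%S) || return 1
  end=$(date -d "$2" +%Y%m%d%H%M%S) || return 1
  awk -F '[< >]+' -v start="$start" -v end="$end" '
    {
      # $2 is the date (19-Jan-2015) and $3 the time (07:16:47)
      cmd = "date -d \"" $2 " " $3 "\" +%Y%m%d%H%M%S 2>/dev/null"
      stamp = ""
      cmd | getline stamp
      close(cmd)
      # compare numerically; an unparsable line leaves stamp empty
      if (stamp != "" && stamp + 0 >= start + 0 && stamp + 0 <= end + 0) print
    }' "$3"
}
```

For example, filter_log '19-Jan-2015 00:00:00' '20-Jan-2015 23:59:59' logfile prints the first two sample lines. It still starts one date process per line, so on large logs a comparison done entirely inside awk will be faster.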
Answer by LogicIO
If the file is named example.txt, the script below should work:
for i in `awk -F'<' '{print $2}' example.txt | awk '{print $1"_"$2}'`; do date=`echo $i | sed 's/_/ /g'`; dunix=`date -d "$date" +%s`; if [[ $dunix -ge 1421605800 && $dunix -le 1421778599 ]]; then grep "$date" example.txt; fi; done
The script converts each timestamp into a Unix timestamp, compares it against the two bounds, and prints the lines from the file that meet the condition.
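The two hard-coded bounds, 1421605800 and 1421778599, are 2015-01-19 00:00:00 and 2015-01-20 23:59:59 in UTC+05:30, so on a machine in another time zone the one-liner selects a shifted window. It also runs grep once per matching timestamp, so two lines sharing a timestamp are each printed twice. A minimal sketch of the same epoch comparison, assuming GNU date and reading the logged times as UTC (as the lines state), computes the bounds with date -u and prints each matching line once; filter_by_epoch is a hypothetical name:

```shell
#!/usr/bin/env bash
# filter_by_epoch FILE FROM TO (hypothetical helper name)
# Same idea as the one-liner: turn each timestamp into seconds since the
# epoch and compare numerically. Assumes GNU date; times are read as UTC.
filter_by_epoch() {
  local file=$1 start end line day clock stamp
  start=$(date -u -d "$2" +%s) || return 1
  end=$(date -u -d "$3" +%s) || return 1
  while IFS= read -r line; do
    # "####<19-Jan-2015 07:16:47 o'clock UTC> ..." -> day=19-Jan-2015 clock=07:16:47
    read -r day clock _ <<< "${line#*<}"
    stamp=$(date -u -d "$day $clock" +%s 2>/dev/null) || continue
    if (( stamp >= start && stamp <= end )); then
      printf '%s\n' "$line"
    fi
  done < "$file"
}
```

Usage: filter_by_epoch example.txt '19-Jan-2015 00:00:00' '20-Jan-2015 23:59:59'. Like the original, it still calls date once per line.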
Answer by glenn jackman
Using string comparisons will be faster than creating date objects:
awk -F '<' '
{split($2, d, /[- ]/)}
d[3]=="2015" && d[2]=="Jan" && 19<=d[1] && d[1]<=20
' file
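The string comparison above works because both days fall in the same month and year. A minimal sketch that keeps the no-external-commands approach but handles any range, assuming the DD-Mon-YYYY format shown in the question, turns each date into a YYYYMMDD key inside awk (month name to number via its position in a 36-character string) and compares keys; filter_by_day is a hypothetical name and the bounds are given as inclusive YYYYMMDD strings:

```shell
#!/usr/bin/env bash
# filter_by_day FROM TO FILE (hypothetical helper name)
# FROM and TO are inclusive YYYYMMDD strings, e.g. 20150119 20150120.
# Plain POSIX awk: no date(1) calls and no gawk extensions.
filter_by_day() {
  awk -F '<' -v from="$1" -v to="$2" '
    {
      # $2 is "19-Jan-2015 07:16:47 ...", so d[1]=19, d[2]=Jan, d[3]=2015
      split($2, d, /[- ]/)
      # month names are 3 chars long: positions 1, 4, ..., 34 map to 1..12
      m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", d[2]) + 2) / 3
      if (m < 1 || m != int(m)) next   # not a recognised month name
      key = sprintf("%04d%02d%02d", d[3], m, d[1])
      # fixed-width digit strings compare correctly as strings
      if (key >= from && key <= to) print
    }' "$3"
}
```

Because the key is built with fixed widths, a range that crosses a month or year boundary (say 20141230 to 20150102) compares correctly.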
Answer by glenn jackman
Another way, using mktime, all in awk:
awk '
BEGIN{
From=mktime("2015 01 19 00 00 00")
To=mktime("2015 01 20 00 00 00")
}
{Time=0}
match($0,/<([^ ]+) ([^ ]+)/,a){
split(a[1],b,"-")
split(a[2],c,":")
b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
}
Time<To&&Time>From
' file
Output
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
How it works
BEGIN{
From=mktime("2015 01 19 00 00 00")
To=mktime("2015 01 20 00 00 00")
}
Before processing any lines, set the dates From and To; the data we want lies between the two. mktime requires the format YYYY MM DD HH MM SS.
{Time=0}
Reset Time for every line, so that a line that does not match is not printed using the previous line's Time.
match($0,/<([^ ]+) ([^ ]+)/,a)
Matches the first two words after the < and stores them in a. If this succeeds, the next block is executed.
split(a[1],b,"-")
split(a[2],c,":")
Splits the date and time into individual numbers and the month name.
b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
Converts the month to a number, using the fact that every month name is three characters long: its position in the string is 1, 4, 7, ..., 34, so adding 2 and dividing by 3 gives 1 to 12.
Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
Makes a timestamp from the collected values.
Time<To&&Time>From
If the time is greater than From and less than To, it is inside the desired range, and since awk's default action is to print, the line is printed.
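mktime() and the three-argument form of match() are gawk extensions, so this answer needs GNU awk. Note also that with To set to 2015 01 20 00 00 00 and a strict comparison, only lines from 19 January are printed; covering all of the 20th would need To=mktime("2015 01 21 00 00 00"). The month conversion itself needs nothing gawk-specific. This small check, with month_number as a hypothetical name, runs the same index() expression over all twelve abbreviations in plain awk:

```shell
#!/usr/bin/env bash
# month_number MON (hypothetical helper name): prints 1..12 for Jan..Dec
# using the same index() arithmetic as the mktime answer (plain POSIX awk).
month_number() {
  awk -v mon="$1" 'BEGIN {
    pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)
    # valid abbreviations start at positions 1, 4, 7, ..., 34
    print (pos + 2) / 3
  }'
}

for mon in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec; do
  printf '%s %s\n' "$mon" "$(month_number "$mon")"
done
```

An unrecognised name gives index() = 0 and therefore a fractional result, so a caller that cannot trust its input should check that the value is a whole number.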
Resources
https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html