Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/28275880/

How to filter data between 2 dates with awk in a bash script

Tags: linux, bash, date, awk, gnu

Asked by Gheorghe Frunza

Hi, I have the following log file structure:

####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
####<20-Jan-2015 07:16:43 o'clock UTC> <Notice> <Stdout> <example2.com>
####<21-Jan-2015 07:16:48 o'clock UTC> <Notice> <Stdout> <example3.com>

How can I filter this file by a date interval? For example: show all data between the 19th and 20th of January 2015.

I tried to use awk, but I have problems converting 19-Jan-2015 to 2015-01-19 so that I can compare the dates.

Answered by Wintermute

For an oddball date format like that, I'd outsource the date parsing to the date utility.

#!/usr/bin/awk -f

# Formats the timestamp as a number, so that higher numbers represent
# a later timestamp. This will not handle the time zone because date
# can't handle the o'clock notation. I hope all your timestamps use the
# same time zone, otherwise you'll have to hack support for it in here.
function datefmt(d) {
  # make d compatible with singly-quoted shell strings
  gsub(/'/, "'\\''", d)

  # then run the date command and get its output
  command = "date -d '" d "' +%Y%m%d%H%M%S"
  command | getline result
  close(command)

  # that's our result.
  return result;
}

BEGIN {
  # Field separator, so the part of the timestamp we'll parse is in $2 and $3
  FS = "[< >]+"

  # start, end set here.
  start = datefmt("19-Jan-2015 00:00:00")
  end   = datefmt("20-Jan-2015 23:59:59")
}

{
  # convert the timestamp into an easily comparable format
  stamp = datefmt($2 " " $3)

  # then print only lines in which the time stamp is in the range.
  if(stamp >= start && stamp <= end) {
    print
  }
}
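As a usage sketch, the same idea can be wrapped in a shell function so that the range is passed in rather than edited into the script. This assumes GNU date (for -d); the function name filter_by_date and the temporary sample file are hypothetical, and the timestamps are trusted not to contain shell metacharacters.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU date. filter_by_date is a hypothetical helper name.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
####<20-Jan-2015 07:16:43 o'clock UTC> <Notice> <Stdout> <example2.com>
####<21-Jan-2015 07:16:48 o'clock UTC> <Notice> <Stdout> <example3.com>
EOF

# usage: filter_by_date START END FILE
filter_by_date() {
  awk -v from="$1" -v to="$2" '
    # Ask date to turn a timestamp into a sortable YYYYmmddHHMMSS number.
    # Assumption: d holds no double quotes, backticks or dollar signs.
    function datefmt(d,    cmd, out) {
      cmd = "date -d \"" d "\" +%Y%m%d%H%M%S"
      cmd | getline out
      close(cmd)
      return out
    }
    BEGIN { FS = "[< >]+"; start = datefmt(from); end = datefmt(to) }
    { stamp = datefmt($2 " " $3) }
    stamp >= start && stamp <= end
  ' "$3"
}

filter_by_date "19-Jan-2015 00:00:00" "20-Jan-2015 23:59:59" "$log"
```

Note that this still starts one date process per input line, so it is likely to be slow on large logs.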

Answered by LogicIO

If the name of the file is example.txt, the script below should work:

 for i in `awk -F'<' {'print $2'} example.txt| awk {'print $1"_"$2'}`; do date=`echo $i | sed 's/_/ /g'`;  dunix=`date -d "$date" +%s`; if [[ (($dunix -ge 1421605800)) && (($dunix -le 1421778599)) ]]; then  grep "$date" example.txt;fi;  done

The script just converts each timestamp in the file to a Unix timestamp, compares it with the two bounds, and prints the lines from the file that meet the condition.
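The two hard-coded bounds can be computed with date instead of being typed in by hand. The sketch below assumes GNU date and, as an assumption of this example, pins the interpretation to UTC with -u. The literal values in the one-liner above are 19800 seconds (5.5 hours) smaller than the UTC values, so they appear to have been computed in a UTC+05:30 time zone.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU date. -u makes date read and print times as UTC.
start=$(date -u -d '19-Jan-2015 00:00:00' +%s)
end=$(date -u -d '20-Jan-2015 23:59:59' +%s)

echo "$start $end"   # prints: 1421625600 1421798399
```

Dropping -u makes date use the local time zone, which matters if the bounds and the log timestamps are meant to be in the same zone.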

Answered by glenn jackman

Using string comparisons will be faster than creating date objects:

awk -F '<' '
    {split($2, d, /[- ]/)}
    d[3]=="2015" && d[2]=="Jan" && 19<=d[1] && d[1]<=20
' file
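The comparison above is tied to a single month. One possible generalization, sketched here, builds a sortable YYYYMMDD key from the same split fields so that ranges crossing month or year boundaries also work. The function name filter_by_key and the sample file are hypothetical; the month lookup reuses the three-letter index trick from the mktime answer further down.

```shell
#!/usr/bin/env bash
# Sketch only: filter_by_key is a hypothetical helper, not part of the answer.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
####<20-Jan-2015 07:16:43 o'clock UTC> <Notice> <Stdout> <example2.com>
####<21-Jan-2015 07:16:48 o'clock UTC> <Notice> <Stdout> <example3.com>
EOF

# usage: filter_by_key FROM_YYYYMMDD TO_YYYYMMDD FILE
filter_by_key() {
  awk -F '<' -v from="$1" -v to="$2" '
    {
      # d[1] = day, d[2] = month name, d[3] = year
      split($2, d, /[- ]/)
      m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", d[2]) + 2) / 3
      # zero-padded key, so plain string comparison orders dates correctly
      key = sprintf("%04d%02d%02d", d[3], m, d[1])
    }
    from <= key && key <= to
  ' "$3"
}

filter_by_key 20150119 20150120 "$log"
```

Because the key is a fixed-width string of digits, no date objects or external processes are needed, which keeps the speed advantage of the string-comparison approach.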

Answered by glenn jackman

Another way, using mktime, all in awk:

awk '
# needs gawk: mktime() and the three-argument match() are gawk extensions
BEGIN{
        From=mktime("2015 01 19 00 00 00")
        To=mktime("2015 01 20 00 00 00")
}
{Time=0}
match($0,/<([^ ]+) ([^ ]+)/,a){
        split(a[1],b,"-")
        split(a[2],c,":")
        b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
        Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
}
Time<To&&Time>From
' file

Output

####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>

How it works

BEGIN{
        From=mktime("2015 01 19 00 00 00")
        To=mktime("2015 01 20 00 00 00")
}

Before processing the lines, set the dates From and To; the data we want lies between the two.
mktime requires this format to work: YYYY MM DD HH MM SS.

{Time=0}

Resets Time so that later lines that don't match are not printed.

match($0,/<([^ ]+) ([^ ]+)/,a)

Matches the first two words after the < and stores them in a. Executes the next block if this is successful.

split(a[1],b,"-")
split(a[2],c,":")

Splits the date and time into individual numbers and the month name.

b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3

Converts the month to a number, using the fact that every month abbreviation is three characters long: add 2 to its position in the string, then divide by 3.

Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])

Makes a timestamp from the collected values.

Time<To&&Time>From

If the time is more than From and less than To, it is inside the desired range, and the default action for awk is to print.
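The month arithmetic can be checked on its own. The sketch below wraps it in a hypothetical month_number helper (not part of the answer) and uses only index, so it should work in any POSIX awk.

```shell
#!/usr/bin/env bash
# Sketch only: month_number is a hypothetical helper name.
# Each abbreviation starts at position 1, 4, 7, ... in the string,
# so (position + 2) / 3 gives 1, 2, 3, ...
month_number() {
  awk -v name="$1" 'BEGIN {
    print (index("JanFebMarAprMayJunJulAugSepOctNovDec", name) + 2) / 3
  }'
}

for m in Jan Feb Dec; do
  printf '%s -> %s\n' "$m" "$(month_number "$m")"
done
# prints:
# Jan -> 1
# Feb -> 2
# Dec -> 12
```

A name that is not in the string gives index 0, so the result is a fraction rather than a valid month; real input may need a check for that case.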



Resources

https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html
