Do a grep of a group of files about a range of dates in bash

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/17557377/

Time: 2020-09-18 05:52:57  Source: igfitidea

Tags: linux, bash, shell, date

Asked by Code Geas Coder

I have some rotating log files: there are 5 files, and together they store the logs for the whole day. When the first file is full, logging continues in the second; when the second is full, it continues in the third, and so on. When the last file is full, the content of the first file is deleted and logging starts again in the first file. One file looks like this, for example:

$ cat log1
2013-06-09 08:00  Error1  08x000001  user2
2013-06-09 08:00  Error1  08x000001  user3
2013-06-09 08:01  Error2  08x000002 user4
2013-06-09 08:02  Error3  08x000003  user5     
              .
              . 
              .
2013-06-09 12:22  Error9  08x900009  user5
2013-06-09 12:22  Error8  08x011011  user1

The problem is that I need to read the logs and grep for a range of time.

For example, I need the logs from 2013-06-09 between 08:00 and 11:00.

That is, the lines with date 2013-06-09 and time 08:00, 08:01, 08:02, 08:03, ..., 11:00.

With grep I can select the date, but I do not know how to extract the lines for a range of hours.

Answered by gniourf_gniourf

For your specific problem, with round hours:

grep '^2013-06-09 \(08\|09\|10\|11:00\)'

should do.
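
To check a pattern of this kind, it helps to run it against a few boundary lines. The sketch below is an illustration rather than part of the original answer: the sample lines are made up, and it uses grep -E (extended syntax) with the minutes spelled out so that 11:01 is excluded.

```shell
# Build a small sample log (hypothetical data modelled on the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
2013-06-09 07:59  Error0  08x000000  user1
2013-06-09 08:00  Error1  08x000001  user2
2013-06-09 10:59  Error4  08x000004  user3
2013-06-09 11:00  Error5  08x000005  user4
2013-06-09 11:01  Error6  08x000006  user5
2013-06-10 09:00  Error7  08x000007  user6
EOF

# Anchor on the date, then accept hours 08, 09, 10 (any minute) or exactly 11:00.
matched=$(grep -E '^2013-06-09 ((08|09|10):[0-5][0-9]|11:00)' "$sample")
printf '%s\n' "$matched"
rm -f "$sample"
```

Only the 08:00, 10:59 and 11:00 lines of 2013-06-09 should be printed; the 07:59 and 11:01 lines and the line from the next day fall outside the pattern.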

Answered by KeepCalmAndCarryOn

You need to use egrep. You can then pipe the result into grep to select the date, or even do it all in one egrep:

$ egrep "0[8-9]:" log
2013-06-09 08:00  Error1  user2
2013-06-09 08:00  Error1  user3
2013-06-09 08:01  Error2  user2
2013-06-09 08:02  Error3  user5
2013-06-09 09:03  Error3  user5

and

$ egrep "(0[8-9]|1[0-1]):" log
2013-06-09 08:00  Error1  user2
2013-06-09 08:00  Error1  user3
2013-06-09 08:01  Error2  user2
2013-06-09 08:02  Error3  user5
2013-06-09 09:03  Error3  user5
2013-06-09 10:02  Error3  user5
2013-06-09 10:02  Error3  user5
2013-06-09 11:02  Error3  user5
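
Note that a pattern such as 1[0-1]: also keeps 11:02, which lies past the requested 11:00. A sketch of the piped form that stops at exactly 11:00 follows. The temporary directory, the file contents, and the -h option (supported by GNU and BSD grep, used here to drop the filename prefix) are assumptions made for illustration.

```shell
# Two rotated files with hypothetical contents.
dir=$(mktemp -d)
printf '%s\n' '2013-06-08 09:15  Error1  user1' \
              '2013-06-09 08:30  Error2  user2' > "$dir/log1"
printf '%s\n' '2013-06-09 10:45  Error3  user3' \
              '2013-06-09 11:02  Error4  user4' > "$dir/log2"

# Step 1 keeps the date; step 2 keeps hours 08-10 plus exactly 11:00.
# -h suppresses the "filename:" prefix grep adds when given several files.
picked=$(grep -h '^2013-06-09 ' "$dir"/log* |
         grep -E '^[^ ]+ ((0[89]|10):|11:00)')
printf '%s\n' "$picked"
rm -rf "$dir"
```

Anchoring the second pattern right after the date field keeps it from matching digits that happen to appear later in the line.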

Answered by David W.

Let's look at your log file:

2013-06-09 08:00  Error1  user2
2013-06-09 08:00  Error1  user3
2013-06-09 08:01  Error2  user2
2013-06-09 08:02  Error3  user5
2013-06-09 09:03  Error3  user5
2013-06-09 10:02  Error3  user5
2013-06-09 10:02  Error3  user5
2013-06-09 11:02  Error3  user5

What if we remove the formatting from the time stamp?

201306090800  Error1  user2
201306090800  Error1  user3
201306090801  Error2  user2
201306090802  Error3  user5
201306090903  Error3  user5
201306091002  Error3  user5
201306091002  Error3  user5
201306091102  Error3  user5

Now, it will be a lot easier getting a range of dates and time! Let's see what we can work up.

Let's try a test:

sed -E 's/([[:digit:]]{4})-([[:digit:]]{2})-([[:digit:]]{2}) ([[:digit:]]{2}):([[:digit:]]{2})/\1\2\3\4\5/' "$logfile"

sed is a stream editor, and I'm using its substitute command (that's the s). The command has the form:

 sed 's/old/new/' $logfile

This takes each line of $logfile, replaces the first instance of old with new, and prints the changed line.
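
As a small illustration of the "first instance" behaviour, here is a sketch on a made-up input string (the text itself is hypothetical; only the plain s command is used):

```shell
# Only the first "old" on the line is replaced; the second one is left alone.
out=$(printf '%s\n' 'old news, old habits' | sed 's/old/new/')
printf '%s\n' "$out"
```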

Here, old is not a plain string of letters but a regular expression. Regular expressions let me describe what I'm looking for. It's a very powerful concept.

[[:digit:]] represents any digit on my line, and {4} means there must be four of them. That matches the year of the date. The parentheses are capture groups. Basically, I'm capturing each part of the date as a separate entity.

Here is a more detailed explanation:

([[:digit:]]{4})   Matches the four digit year
-                  Matches the dash after the year
([[:digit:]]{2})   Matches the two digit month
-                  Matches the dash after the month
([[:digit:]]{2})   Matches the two digit day of month
                   Matches the space between the date and time
([[:digit:]]{2})   Matches the two digit hour
:                  Matches the colon separator between the hours and minutes
([[:digit:]]{2})   Matches the minutes

Remember the parentheses? Each pair captures one part of the date and time string, and I can refer to those parts in the replacement to rebuild the entire string:

   \1   Year
   \2   Month
   \3   Day of month
   \4   Hour
   \5   Minute

Take a look at my sed command and see if you can see each of these parts.
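
To see the capture groups at work on a single line, here is a minimal sketch. It assumes the replacement is the five backreferences \1 through \5 written back to back, and it uses one line from the question's sample log as input.

```shell
line='2013-06-09 08:01  Error2  08x000002 user4'

# Groups 1-5 hold year, month, day, hour and minute; writing them back
# without separators yields a compact 12-digit timestamp.
compact=$(printf '%s\n' "$line" |
    sed -E 's/([[:digit:]]{4})-([[:digit:]]{2})-([[:digit:]]{2}) ([[:digit:]]{2}):([[:digit:]]{2})/\1\2\3\4\5/')
printf '%s\n' "$compact"
```

The rest of the line is untouched, because the substitution only rewrites the part the regular expression matched.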

Can I use awk? Now that I've reformatted my line to remove the formatting of the time, I can use awk to split each line into its fields and compare the first one, the timestamp, against the range:

 sed -E 's/([[:digit:]]{4})-([[:digit:]]{2})-([[:digit:]]{2}) ([[:digit:]]{2}):([[:digit:]]{2})/\1\2\3\4\5/' "$logfile" \
 | awk '{
     if ( ( $1 >= 201306090800 ) && ( $1 <= 201306091100 ) ) {
         print $0
     }
 }'

Okay, a bit rough. The date and time are hard coded in the awk program, and the output will print out the date with all of the formatting stripped out. But, it will work.

It'll take a bit more work to smooth it out. For example, you could have the user input the date and time range, and reformat the date and time back into a recognizable shape. However, it will do what you want.
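
One possible way to smooth it out is sketched below. The function name compact_range and its argument order are hypothetical, not part of the original answer: it takes the range as two compact timestamps, filters on them, and then restores the original date format.

```shell
# Hypothetical helper: compact_range START END [FILE...]
# START and END are compact timestamps such as 201306090800.
compact_range() (
    start=$1 end=$2
    shift 2
    cat -- "$@" |
      # Strip the punctuation from the leading timestamp.
      sed -E 's/^([0-9]{4})-([0-9]{2})-([0-9]{2}) ([0-9]{2}):([0-9]{2})/\1\2\3\4\5/' |
      # Keep lines whose first field lies in the range (numeric comparison).
      awk -v start="$start" -v end="$end" \
          '($1 + 0 >= start + 0) && ($1 + 0 <= end + 0)' |
      # Put the dashes, space and colon back.
      sed -E 's/^([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})/\1-\2-\3 \4:\5/'
)
```

It could then be called as compact_range 201306090800 201306091100 log1, with the range no longer hard coded in the awk program.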

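If you need to search multiple log files, you can use cat, which in this case is not useless:

 cat log* | sed -E 's/([[:digit:]]{4})-([[:digit:]]{2})-([[:digit:]]{2}) ([[:digit:]]{2}):([[:digit:]]{2})/\1\2\3\4\5/' | awk '{
     if ( ( $1 >= 201306090800 ) && ( $1 <= 201306091100 ) ) {
         print $0
     }
 }'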

The main idea is to massage the data into the shape you want. This would be easier if you used a higher-level scripting language like Perl or Python. In fact, this is the very type of task that caused Larry Wall to invent Perl.

Answered by Kent

If your date format is yyyy-mm-dd HH:MM, it is relatively easy, if I understood you correctly.

You could:

awk '$1" "$2>="2013-06-09 08:00" && $1" "$2 <= "2013-06-09 11:00"' *.log

The *.log pattern will match all 5 of your log files. It could be a different pattern, e.g. log.*, depending on your filenames.
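
Plain string comparison is enough here because the timestamp is zero-padded and ordered from the largest unit (year) to the smallest (minute), so alphabetical order equals chronological order. A minimal sketch that takes the bounds as arguments follows; the function name between is hypothetical.

```shell
# Hypothetical helper: between 'START' 'END' FILE...
# START and END use the same "yyyy-mm-dd HH:MM" layout as the log lines.
between() (
    start=$1 end=$2
    shift 2
    awk -v start="$start" -v end="$end" '{
        # Rebuild the timestamp from the first two fields and compare as strings.
        stamp = $1 " " $2
        if (stamp >= start && stamp <= end) print
    }' "$@"
)
```

For the range in the question it would be called as between '2013-06-09 08:00' '2013-06-09 11:00' *.log. Both bounds are inclusive.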

Answered by Hai Vu

What you need is a log viewer. There are many around, but one I used a while ago is multitail.
