Do a grep of a group of files about a range of dates in bash
Original question: http://stackoverflow.com/questions/17557377/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Code Geas Coder
I have some rotating log files: there are 5 files, and together they store the logs for the whole day. When the first file is full, the logs are saved in the second; when the second is full, they are saved in the third; and when the last file is full, the content of the first file is deleted and the logs are saved in the first file again. One file looks like this, for example:
$ cat log1
2013-06-09 08:00 Error1 08x000001 user2
2013-06-09 08:00 Error1 08x000001 user3
2013-06-09 08:01 Error2 08x000002 user4
2013-06-09 08:02 Error3 08x000003 user5
.
.
.
2013-06-09 12:22 Error9 08x900009 user5
2013-06-09 12:22 Error8 08x011011 user1
The problem is that I need to read the logs and grep for a range of time.
For example, I need the logs from 2013-06-09 between 08:00 and 11:00, i.e. the lines with date 2013-06-09 and times 08:00, 08:01, 08:02, 08:03, ..., 11:00.
With grep I can match the date, but I do not know how to extract the lines within a range of hours.
Answered by gniourf_gniourf
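For your specific problem, with round hours:
grep '^2013-06-09 \(08:\|09:\|10:\|11:00\)'
should do. Each hour alternative includes the colon: a pattern like 08* would mean "0 followed by any number of 8s", which matches every hour that starts with 0.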
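To check a pattern like this before running it on the real logs, it can help to try it on a few lines that sit right at the edges of the range. The following is only a sketch: the sample lines and the temporary directory are made up for illustration, and it uses grep -E (extended regular expressions) so the alternation needs no backslashes.

```shell
# Hypothetical sample data, modelled on the log excerpt in the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/log1" <<'EOF'
2013-06-09 07:59 Error0 08x000000 user1
2013-06-09 08:00 Error1 08x000001 user2
2013-06-09 10:59 Error4 08x000004 user6
2013-06-09 11:00 Error5 08x000005 user7
2013-06-09 11:01 Error6 08x000006 user8
2013-06-09 12:22 Error9 08x900009 user5
EOF

# Hours 08 to 10 in full, plus exactly 11:00.
# -h suppresses the file name prefix when several files are searched.
matched=$(grep -hE '^2013-06-09 ((08|09|10):[0-5][0-9]|11:00) ' "$tmpdir"/log*)
printf '%s\n' "$matched"

rm -rf "$tmpdir"
```

Only the 08:00, 10:59 and 11:00 lines should come back; 07:59, 11:01 and 12:22 fall outside the range.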
Answered by KeepCalmAndCarryOn
You need to use egrep. You can then pipe that back into grep to get the date, or even do it all as one egrep:
$ egrep "0[8-9]:" log
2013-06-09 08:00 Error1 user2
2013-06-09 08:00 Error1 user3
2013-06-09 08:01 Error2 user2
2013-06-09 08:02 Error3 user5
2013-06-09 09:03 Error3 user5
and
$ egrep "(0[8-9]|1[0-1]):" a
2013-06-09 08:00 Error1 user2
2013-06-09 08:00 Error1 user3
2013-06-09 08:01 Error2 user2
2013-06-09 08:02 Error3 user5
2013-06-09 09:03 Error3 user5
2013-06-09 10:02 Error3 user5
2013-06-09 10:02 Error3 user5
2013-06-09 11:02 Error3 user5
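The "pipe it back into grep" and "one egrep" variants mentioned above could look roughly like this. It is a sketch on made-up sample lines, written with grep -E, which is the standard spelling of egrep. Note that, like the second command above, the hour class 1[0-1] also lets through 11:01 to 11:59, so it is slightly wider than "up to 11:00".

```shell
# Hypothetical sample data, including a line from another day.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
2013-06-08 09:15 Error7 user9
2013-06-09 08:00 Error1 user2
2013-06-09 09:03 Error3 user5
2013-06-09 11:02 Error3 user5
2013-06-09 12:22 Error9 user5
EOF

# Two steps: filter on the hour, then on the date.
two_step=$(grep -E ' (0[8-9]|1[0-1]):' "$tmpfile" | grep '^2013-06-09 ')

# One step: date and hour in a single anchored pattern.
one_step=$(grep -E '^2013-06-09 (0[8-9]|1[0-1]):' "$tmpfile")

printf '%s\n' "$one_step"
rm -f "$tmpfile"
```

Both forms select the same three lines here; the single anchored pattern just avoids the second process.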
Answered by David W.
Let's look at your log file:
2013-06-09 08:00 Error1 user2
2013-06-09 08:00 Error1 user3
2013-06-09 08:01 Error2 user2
2013-06-09 08:02 Error3 user5
2013-06-09 09:03 Error3 user5
2013-06-09 10:02 Error3 user5
2013-06-09 10:02 Error3 user5
2013-06-09 11:02 Error3 user5
What if we remove the formatting from the time stamp?
201306090800 Error1 user2
201306090800 Error1 user3
201306090801 Error2 user2
201306090802 Error3 user5
201306090903 Error3 user5
201306091002 Error3 user5
201306091002 Error3 user5
201306091102 Error3 user5
Now it will be a lot easier to get a range of dates and times! Let's see what we can work up.
Let's try a test:
sed -E 's/([[:digit:]]{4})-([[:digit:]]{2})-([[:digit:]]{2}) ([[:digit:]]{2}):([[:digit:]]{2})/\1\2\3\4\5/' $logfile
sed is a stream editor, and I'm using the substitute command (that's the s). The command is in the form of:
sed 's/old/new/' $logfile
This takes each line of $logfile, replaces the first instance of old with new, and prints the changed line.
The old is not a string of letters, but a regular expression. Regular expressions allow me to describe what I'm looking for. It's a very powerful concept.
The [[:digit:]] represents any digit on my line, and the {4} means there must be four of them. That matches the year. The parentheses are capture groups. Basically, I'm capturing each part of the date as a separate entity.
Here is a more detailed explanation:
([[:digit:]]{4})   Matches the four-digit year
-                  Matches the dash after the year
([[:digit:]]{2})   Matches the two-digit month
-                  Matches the dash after the month
([[:digit:]]{2})   Matches the two-digit day of month
(space)            Matches the space between the date and time
([[:digit:]]{2})   Matches the two-digit hour
:                  Matches the colon separator between the hours and minutes
([[:digit:]]{2})   Matches the two-digit minutes
Remember the parentheses? Each capture group can be referred to by number in the replacement, so I can use the individual parts of the date and time to replace the entire string:
\1   Year
\2   Month
\3   Day of month
\4   Hour
\5   Minute
Take a look at my sed command and see if you can spot each of these parts.
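To see the capture groups at work on a single line, something like the following can be run. It is a minimal sketch: the variable names (line, stamp_re, compact, reordered) are only for illustration, and the second substitution is not part of the solution, it just shows that each group can be placed independently.

```shell
line='2013-06-09 08:01 Error2 user2'
stamp_re='([[:digit:]]{4})-([[:digit:]]{2})-([[:digit:]]{2}) ([[:digit:]]{2}):([[:digit:]]{2})'

# Concatenate the five captured groups to get a plain 12-digit time stamp.
compact=$(printf '%s\n' "$line" | sed -E "s/$stamp_re/\1\2\3\4\5/")

# Reorder the groups instead, to show that each one is addressable on its own.
reordered=$(printf '%s\n' "$line" | sed -E "s/$stamp_re/\4:\5 \3.\2.\1/")

printf '%s\n%s\n' "$compact" "$reordered"
```

The first result starts with 201306090801 and the second with 08:01 09.06.2013; the rest of the line is untouched because the pattern only covers the time stamp.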
Now I can use awk. Since I've reformatted my line to remove the formatting of the time, I can use awk to break the line into its three fields and compare the first one:
sed -E 's/([[:digit:]]{4})-([[:digit:]]{2})-([[:digit:]]{2}) ([[:digit:]]{2}):([[:digit:]]{2})/\1\2\3\4\5/' $logfile \
 | awk '{
    if ( ( $1 >= 201306090800 ) && ( $1 <= 201306091100 ) ) {
        print $0
    }
}'
Okay, a bit rough. The date and time are hard coded in the awk program, and the output prints the date with all of the formatting stripped out. But it will work.
It'll take a bit more work to smooth it out: for example, letting the user input the date and time range, and reformatting the date and time back into a recognizable shape. However, it will do what you want.
If you need multiple log files, you can use cat, which in this case is not useless:
cat log* | sed -E 's/([[:digit:]]{4})-([[:digit:]]{2})-([[:digit:]]{2}) ([[:digit:]]{2}):([[:digit:]]{2})/\1\2\3\4\5/' | awk '{
    if ( ( $1 >= 201306090800 ) && ( $1 <= 201306091100 ) ) {
        print $0
    }
}'
The main idea is to massage the data the way you want. This would be easier if you specified a higher-level scripting language like Perl or Python. In fact, this is the very type of task that caused Larry Wall to invent Perl.
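One possible way to smooth out the rough edges mentioned above (hard-coded range, stripped time stamps in the output) is to do the "remove the formatting" step inside awk on a copy of the time stamp, and print the original line. This is only a sketch under some assumptions: grep_range is a made-up helper name, the range is given in the log's own "YYYY-MM-DD HH:MM" format, and the sample files are invented for the demonstration.

```shell
# grep_range START END FILE...   (hypothetical helper, not from the answer)
grep_range() {
    local start=$1 end=$2
    shift 2
    awk -v start="$start" -v end="$end" '
        # Strip everything but digits: "2013-06-09 08:00" -> 201306090800
        function compact(s) { gsub(/[^0-9]/, "", s); return s + 0 }
        BEGIN { lo = compact(start); hi = compact(end) }
        {
            stamp = compact($1 " " $2)
            # Print the untouched input line, so the formatting is kept.
            if (stamp >= lo && stamp <= hi) print $0
        }' "$@"
}

# Made-up sample files standing in for the rotating logs.
tmpdir=$(mktemp -d)
printf '%s\n' '2013-06-09 07:59 Error0 user1' '2013-06-09 08:00 Error1 user2' > "$tmpdir/log1"
printf '%s\n' '2013-06-09 11:00 Error5 user7' '2013-06-09 11:02 Error3 user5' > "$tmpdir/log2"

in_range=$(grep_range '2013-06-09 08:00' '2013-06-09 11:00' "$tmpdir"/log*)
printf '%s\n' "$in_range"
rm -rf "$tmpdir"
```

awk reads several files in sequence on its own, so no cat is needed in this variant. The lines come out in file order, not sorted by time, which matters if the rotation has wrapped around.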
Answered by Kent
If your date format is yyyy-mm-dd HH:MM, it is relatively easy, if I understood you correctly.
You could:
awk '$1" "$2 >= "2013-06-09 08:00" && $1" "$2 <= "2013-06-09 11:00"' *.log
The *.log will match all 5 of your log files. It could be a different pattern, e.g. log.*, depending on your filenames.
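The comparison works on plain strings because the time stamp is zero-padded and ordered from year down to minute, so alphabetical order and chronological order coincide. A small sketch of the same idea with the bounds passed in rather than hard coded; the variable names (from, to, stamp) and the sample lines are assumptions made for the example:

```shell
# Hypothetical sample data spanning three days.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
2013-06-08 23:59 Error7 user9
2013-06-09 08:00 Error1 user2
2013-06-09 10:02 Error3 user5
2013-06-09 11:00 Error5 user7
2013-06-09 11:02 Error3 user5
2013-06-10 08:30 Error2 user4
EOF

from='2013-06-09 08:00'
to='2013-06-09 11:00'
selected=$(awk -v from="$from" -v to="$to" '
    { stamp = $1 " " $2 }          # "YYYY-MM-DD HH:MM", compared as a string
    stamp >= from && stamp <= to   # no action block: matching lines are printed
' "$tmpfile")

printf '%s\n' "$selected"
rm -f "$tmpfile"
```

Both bounds are inclusive, so the 08:00 and 11:00 lines are kept while 11:02 and the lines from the neighbouring days are dropped. This would stop working for time stamps that are not zero-padded (for example 8:00 instead of 08:00).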