Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/28419152/

Date: 2020-09-18 12:21:08  Source: igfitidea

How to grep lines starting with a digit or white space

Tags: regex, bash, grep

Asked by punekr12

I need to count messages per hour in my log file. Every line in the log file starts with a timestamp, so I am using the following for loop and grep command to do this:

for i in `seq 0 23`
do egrep "$i:[0-9][0-9]:[0-9][0-9] <some_pattern>" filename | wc -l
done

This gives me the number of messages per hour for hours 0 to 23.

However, this does not work for a single-digit hour such as 5:23:32, because it is preceded by a white space. The grep would then have to be:

egrep " $i:[0-9][0-9]:[0-9][0-9] <some_pattern>" filename | wc -l

Otherwise it will incorrectly match lines starting with, say, 15:23:32.

So how can I tell grep that the digit may be preceded only by a space or by the start of the line?

Accepted answer by repzero

Using egrep

for i in `seq 0 23`; do egrep -c "^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] <some_pattern>" 'filename'; done

The pattern ^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] tells egrep to match from the start of the line. Whether the line begins with whitespace or begins directly with your pattern, grep will match it. Because of the ^ anchor, the hour can no longer match in the middle of a longer number, so $i=5 does not match 15.

For example, using your command with a pattern to find 5:23:32 (where $i=5), we get:

5:23:23
   15:23:23

Using the command above, we get:

 5:23:23

grep comes with a -c option to count

You can also use grep's -c option instead of piping to wc -l. Example:

for i in `seq 0 23`; do egrep -c "^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] <pattern>" 'filename'; done
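
As a rough check of the anchored pattern, the sketch below builds a small hypothetical log (the file contents and the ERROR message pattern are assumptions, not taken from the question) and compares an unanchored count with the anchored one for hour 5. It uses grep -E, which is the equivalent of egrep.

```shell
# Hypothetical sample log; the real file name and message pattern will differ.
log=$(mktemp)
printf '%s\n' \
    ' 5:23:32 ERROR disk full' \
    '15:23:32 ERROR disk full' \
    ' 5:40:01 INFO ok' > "$log"

# Count lines for one hour: anchored at line start, optional leading whitespace.
# $1 is the hour, $2 is the log file.
count_hour() {
    grep -Ec "^[[:space:]]*$1:[0-9][0-9]:[0-9][0-9] ERROR" "$2"
}

# Without the anchor, hour 5 also matches inside 15:23:32.
unanchored=$(grep -Ec "5:[0-9][0-9]:[0-9][0-9] ERROR" "$log")
anchored=$(count_hour 5 "$log")

echo "unanchored=$unanchored anchored=$anchored"
rm -f "$log"
```

On this sample the unanchored pattern counts both the 5:23:32 and the 15:23:32 line, while the anchored one counts only the first.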

Answer by Adam Katz

I think I can get rid of your for loop. This will work if the time (rather than a date) begins each line:

$ awk -F : '/some_pattern/ { print $1 }' file |sort |uniq -c

This searches for your desired pattern (kind of like grep), then prints the first field (as delimited by a colon), which is the hour. The output is then sorted, and uniq -c counts the occurrences of each hour and prints them on standard output.

However, let's say your logs look like /var/log/syslog, which has lines that look like this:

Feb  9 01:23:45 mycomputer service[PID]: details...

In this case, you have to tell AWK where to look:

$ awk '/some_pattern/ { gsub(/:.*/, "", $3); print $3 }' file |sort |uniq -c

This searches for your desired pattern (kind of like grep), then removes everything from the first colon onward in the third field (the time) and prints what remains (the hour). The rest is as described above.

A sample output (of either of the above variants):

 12 07
 34 08
 30 09
 51 10
536 11
346 12
123 13

This notes that there were twelve matches to my query at 7 am and that I didn't really start using this system until 11 am.
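
To make the syslog variant easier to try, here is a minimal sketch that runs it on a hypothetical four-line log (the sample lines are assumptions modeled on the syslog format shown above). A final awk step is added only to normalize the leading padding that uniq -c prints, which varies between systems.

```shell
# Hypothetical syslog-style sample; the time is the third whitespace-separated field.
log=$(mktemp)
printf '%s\n' \
    'Feb  9 07:01:10 mycomputer service[1]: some_pattern a' \
    'Feb  9 07:15:42 mycomputer service[1]: some_pattern b' \
    'Feb  9 07:59:59 mycomputer service[1]: unrelated' \
    'Feb  9 11:23:45 mycomputer service[1]: some_pattern c' > "$log"

# Keep only the hour of matching lines, then count each distinct hour.
hourly=$(awk '/some_pattern/ { gsub(/:.*/, "", $3); print $3 }' "$log" \
    | sort | uniq -c | awk '{ print $1, $2 }')

echo "$hourly"
rm -f "$log"
```

Each output line is a count followed by the hour, in the same layout as the sample output above.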

Answer by glenn jackman

To match a timestamp where hours 0 to 9 are space-padded or zero-padded:

With basic regular expressions

grep '^\([ 01][0-9]\|2[0-3]\):[0-5][0-9]:[0-5][0-9]' file

or extended regular expressions

grep -E '^([ 01][0-9]|2[0-3])(:[0-5][0-9]){2}' file

ref: https://www.gnu.org/software/gnulib/manual/html_node/Regular-expression-syntaxes.html
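
A quick way to see which lines the extended expression accepts is to run it over a few hand-written timestamps. The sample lines below are hypothetical and chosen to cover the padded, unpadded, and out-of-range cases.

```shell
# Hypothetical sample timestamps, one per line.
log=$(mktemp)
printf '%s\n' \
    ' 5:23:32 space-padded' \
    '05:23:32 zero-padded' \
    '15:23:32 two-digit hour' \
    '23:59:59 last second of the day' \
    '24:00:00 hour out of range' \
    '5:23:32 no padding' > "$log"

# Count the lines whose leading timestamp matches the extended regex.
matched=$(grep -Ec '^([ 01][0-9]|2[0-3])(:[0-5][0-9]){2}' "$log")

echo "matched=$matched"
rm -f "$log"
```

The first four lines match. The hour 24 is rejected, and so is the unpadded 5:23:32, since the expression requires a space or a digit in front of a single-digit hour.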

Answer by David Hoelzer

grep "^[ 0-9][0-9]...

I think this is what you're looking for unless I've misunderstood your question. Add the whitespace to the first set as an option and anchor it to the beginning of the line.
