Using grep to get the next WORD after a match in each line

Notice: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/10971765/

Date: 2020-08-06 06:46:00  Source: igfitidea

linux grep

Asked by aditya.gupta

I want to get the "GET" queries from my server logs.

For example, this is the server log:

1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -   
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -

When I try a simple grep or awk command like this:

Adi:~ adi$ awk '/GET/, /HTTP/' serverlogs.txt

it prints:

1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -

I just want to display: hello and ss

Is there any way this could be done?

Accepted answer by Tim Pote

Assuming you have GNU grep, you can use a Perl-style regex with a positive lookbehind:

grep -oP '(?<=GET\s/)\w+' file
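As a quick check, the lookbehind command above can be run against the request lines from the question. This is a minimal sketch that assumes GNU grep built with PCRE support; the temporary file and the variable name `paths` are illustrative and not part of the original answer.

```shell
# Recreate a few lines of the sample log in a temporary file (illustrative only).
log=$(mktemp)
cat > "$log" <<'EOF'
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -
EOF

# -o prints only the matched text; -P enables Perl-compatible regexes.
# (?<=GET\s/) requires "GET", one whitespace character and "/" to sit
# immediately before the match without becoming part of the output.
paths=$(grep -oP '(?<=GET\s/)\w+' "$log")
printf '%s\n' "$paths"

rm -f "$log"
```

On these lines the command should print hello and ss, one per line. Because a lookbehind must have a fixed width, the pattern accepts exactly one whitespace character after GET.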

If you don't have GNU grep, then I'd advise just using sed:

sed -n '/^.*GET[[:space:]]\{1,\}\/\([-_[:alnum:]]\{1,\}\).*$/s//\1/p' file

If you happen to have GNU sed, that can be greatly simplified:

sed -n '/^.*GET\s\+\/\(\w\+\).*$/s//\1/p' file
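The `/regex/s//.../p` form used above relies on a sed rule: an empty regular expression in `s//.../` reuses the most recently used regular expression. The same result can be written as a single substitution. The sketch below assumes the sample log lines from the question; the temporary file and the variable name `paths` are illustrative.

```shell
# Recreate a few lines of the sample log in a temporary file (illustrative only).
log=$(mktemp)
cat > "$log" <<'EOF'
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -
EOF

# -n suppresses automatic printing; the trailing p prints only the lines on
# which the substitution succeeded. \( \) captures the word after "GET /",
# and \1 replaces the whole line with that capture.
paths=$(sed -n 's/^.*GET[[:space:]]\{1,\}\/\([-_[:alnum:]]\{1,\}\).*$/\1/p' "$log")
printf '%s\n' "$paths"

rm -f "$log"
```

This version uses only basic regular expressions and POSIX character classes, so it should not depend on GNU extensions such as `\s` or `\w`.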

The bottom line here is that you certainly don't need pipes to accomplish this. grep or sed alone will suffice.

Answer by John Carter

In this case, since the log file has a known structure, one option is to use cut to pull out the 7th column (cut treats tabs as the field delimiter by default).

grep GET log.txt | cut -f 7 
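The sample log in the question appears to be separated by single spaces rather than tabs. Under that assumption, cut needs an explicit delimiter. The sketch below is a variation on the command above, not the answer's own command; the second `cut` that drops the leading slash, the temporary file, and the variable name `paths` are additions for illustration.

```shell
# Recreate a few lines of the sample log in a temporary file (illustrative only).
log=$(mktemp)
cat > "$log" <<'EOF'
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -
EOF

# -d ' ' splits each line on single spaces; field 7 is the request path,
# for example "/hello". cut -c 2- then drops the first character (the slash).
paths=$(grep GET "$log" | cut -d ' ' -f 7 | cut -c 2-)
printf '%s\n' "$paths"

rm -f "$log"
```

Note that cut counts every single space as a delimiter, so this only works while the fields before the path never contain extra spaces.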

Answer by Todd A. Jacobs

It's often easier to use a pipeline rather than a single complex regular expression. This works on the data you provided:

fgrep GET /tmp/foo | 
    egrep -o 'GET (.*) HTTP' |
    sed -r 's/^GET \/(.+) HTTP/\1/'

This pipeline returns the following results:

hello
ss

There are certainly other ways to get the job done, but this patently works on the provided corpus.

Answer by Charles Chow

Use a pipe if you use grep:

grep -o '/he[^ ]*' log.txt | grep -o '[^/].*'
grep -o '/ss' log.txt | grep -o '[^/].*'

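[^/] is a bracket expression that matches any single character other than "/", so the second grep starts its match at the first non-slash character and drops the leading slash from the first grep's output.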

Answer by P....

gawk '{match($7,/\/(\w+)/,a);} length(a[1]){print a[1]}' log.txt
hello
ss

If you have gawk, the command above uses the match function to select the desired value with a regex and store it in the array a.

Answer by ajp619

I was trying to do this and came across this link: https://www.unix.com/shell-programming-and-scripting/153101-print-next-word-after-found-pattern.html

Summary: use grep to find matching lines, then use awk to find the pattern and print the next field:

grep pattern logfile | \
  awk '{for(i=1; i<=NF; i++) if($i~/pattern/) print $(i+1)}'

If you want to count the unique occurrences:

grep pattern logfile | \
  awk '{for(i=1; i<=NF; i++) if($i~/pattern/) print $(i+1)}' | \
  sort | \
  uniq -c
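
The grep step is optional, since awk can filter the fields itself. The sketch below assumes the pattern is the literal GET and uses the sample log lines from the question; stripping the leading slash with `sub()`, the temporary file, and the variable name `paths` are additions for illustration.

```shell
# Recreate a few lines of the sample log in a temporary file (illustrative only).
log=$(mktemp)
cat > "$log" <<'EOF'
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -
EOF

# Scan every field except the last; when a field contains "GET", take the
# following field, remove a leading "/" and print it.
paths=$(awk '{
    for (i = 1; i < NF; i++)
        if ($i ~ /GET/) {
            word = $(i + 1)
            sub(/^\//, "", word)
            print word
        }
}' "$log")
printf '%s\n' "$paths"

rm -f "$log"
```

Piping the output through `sort | uniq -c`, as in the command above, would give the per-word counts.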