Linux: Using regular expressions in shell script

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/1636352/

Time: 2020-08-03 17:50:00  Source: igfitidea

Using regular expressions in shell script

Tags: regex, linux, shell

Asked by Amarghosh

What is the correct way to parse a string using regular expressions in a Linux shell script? I wrote the following script to print my SO rep on the console using curl and sed (not solely because I'm rep-crazy: I'm trying to learn some shell scripting and regex before switching to Linux).

json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | sed 's/.*"reputation":"\([0-9,]\{1,\}\)".*/\1/' | sed s/,//

But somehow I feel that sed is not the proper tool to use here. I heard that grep is all about regex and explored it a bit. But apparently it prints the whole line whenever a match is found, whereas I am trying to extract a number from a single line of text. Here is a downsized version of the string that I'm working on (returned by curl).

{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"\u003cspan title=\"1 silver badge\"\u003e\u003cspan class=\"badge2\"\u003e●\u003c/span\u003e\u003cspan class=\"badgecount\"\u003e1\u003c/span\u003e\u003c/span\u003e"}

I guess my questions are:

  • What is the correct way to parse a string using regular expressions in a Linux shell script?
  • Is sed the right thing to use here?
  • Could this be done using grep?
  • Is there any other command that's easier/more appropriate?

Accepted answer by paxdiablo

The grep command will select the desired line(s) from many, but it will not directly manipulate the line. For that, you use sed in a pipeline:

someCommand | grep 'Amarghosh' | sed -e 's/foo/bar/g'

Alternatively, awk (or perl, if available) can be used. In my opinion, it's a far more powerful text-processing tool than sed.

someCommand | awk '/Amarghosh/ { do something }'
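
To make the "do something" placeholder concrete, here is a minimal sketch that uses only standard awk built-ins (match, substr, gsub). The sample string is a shortened copy of the JSON from the question, and extract_rep is a hypothetical helper name used only for illustration; treat it as one possible approach rather than the definitive one.

```shell
#!/bin/sh
# Assumed sample input: a shortened copy of the JSON from the question (no network access).
sample='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

# extract_rep is a hypothetical helper name; it reads lines on stdin.
extract_rep() {
    awk '/Amarghosh/ {
        # match() sets RSTART and RLENGTH when the pattern is found
        if (match($0, /"reputation":"[0-9,]+"/)) {
            value = substr($0, RSTART, RLENGTH)   # the key together with its quoted value
            gsub(/[^0-9]/, "", value)             # keep digits only
            print value
        }
    }'
}

rep=$(printf '%s\n' "$sample" | extract_rep)
echo "$rep"
```

With the sample line above this should print the reputation with the comma stripped.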

For simple text manipulations, just stick with the grep/sed combo. When you need more complicated processing, move on up to awk or perl.

My first thought is to just use:

echo '{"displayName":"Amarghosh","reputation":"2,737","badgeHtml"' \
    | sed -e 's/.*tion":"//' -e 's/".*//' -e 's/,//g'

which keeps the number of sed processes to one (you can give multiple commands with -e).

Answer by Brian Agnew

sed is appropriate, but you'll spawn a new process for every sed you use (which may be too heavyweight in more complex scenarios). grep is not really appropriate: it's a search tool that uses regexps to find lines of interest.

Perl is one appropriate solution here, being a scripting language with powerful regexp features. It'll do most everything you need without spawning separate processes (unlike normal Unix shell scripting) and has a huge library of additional functions.

Answer by pavium

sed is a perfectly valid command for your task, but it may not be the only one.

grep may be useful too, but as you say, it prints the whole line. It's most useful for filtering the lines of a multi-line file and discarding the lines you don't want.

Efficient shell scripts can use a combination of commands (not just the two you mentioned), exploiting the talents of each.
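
As a rough illustration of combining small tools, the sketch below chains grep -o (select the matching text), cut (pick a field) and tr (delete characters). The sample string is an assumed, shortened copy of the JSON from the question, and grep -o is an extension that is common (GNU and BSD grep have it) but not guaranteed on every system.

```shell
#!/bin/sh
# Assumed sample input: a shortened copy of the JSON from the question (no network access).
sample='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

# Step 1: grep -o keeps only the matching key/value text.
# Step 2: cut splits that text on double quotes; the value is the 4th field.
# Step 3: tr -d deletes the thousands separator.
rep=$(printf '%s\n' "$sample" \
    | grep -o '"reputation":"[0-9,]*"' \
    | cut -d'"' -f4 \
    | tr -d ',')
echo "$rep"
```

Each stage does one small job, which is usually easier to debug than a single large regular expression.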

Answer by pavium

You may be interested in using Perl for such tasks. As a demonstration, here is a Perl script which prints the number you want:

#!/usr/local/bin/perl
use warnings;
use strict;
use LWP::Simple;
use JSON;

my $url = "http://stackoverflow.com/users/flair/165297.json";
my $flair = get ($url);
my $parsed = from_json ($flair);
print "$parsed->{reputation}\n";

This script requires you to install the JSON module, which you can do with just the command cpan JSON.

Answer by viam0Zah

For working with JSON in a shell script, use jsawk, which is like awk, but for JSON.

json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | jsawk 'return this.reputation' # 2,747

Answer by qba

You can do it with grep. There is an -o switch in grep which extracts only the matching string, not the whole line.

$ echo $json | grep -o '"reputation":"[0-9,]\+"' | grep -o '[0-9,]\+'
2,747

Answer by ghostdog74

1) What is the correct way to parse a string using regular expressions in a linux shell script?

Tools that include regular expression capabilities include sed, grep, awk, Perl and Python, to mention a few. Even newer versions of Bash have regex capabilities. All you need to do is look up the docs on how to use them.
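
As a small sketch of the Bash regex capability mentioned above: the [[ string =~ regex ]] test fills the BASH_REMATCH array with the capture groups. This requires bash (it will not work in plain sh), and the sample string is an assumed, shortened copy of the JSON from the question.

```shell
#!/usr/bin/env bash
# Assumed sample input: a shortened copy of the JSON from the question (no network access).
sample='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

# Keeping the pattern in a variable avoids quoting surprises inside [[ ... ]].
re='"reputation":"([0-9,]+)"'

rep=''
if [[ $sample =~ $re ]]; then
    # BASH_REMATCH[1] holds the first capture group; ${var//,/} strips the commas.
    rep=${BASH_REMATCH[1]//,/}
fi
echo "$rep"
```

No external process is started here, which is the main attraction compared with piping through sed or grep.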

2) Is sed the right thing to use here?

It can be, but it is not necessary.

3) Could this be done using grep?

Yes, it can. You just construct a regex similar to the one you would use with sed or the others. Note that grep only searches; if you want to modify any files, it will not do that for you.

4) Is there any other command that's easier/more appropriate?

Of course. Regex can be powerful, but it's not necessarily the best tool to use every time. It also depends on what you mean by "easier/appropriate". Another method that keeps regex fuss to a minimum is the fields/delimiter approach: you look for patterns that can be split. For example, in your case (I have downloaded the 165297.json file instead of using curl, but it's the same):

awk 'BEGIN{
 FS="reputation" # split on the word "reputation"
}
{
    m=split($2,a,"\",\"")    # field 2 holds the value you want plus the rest;
                             # split it on "," and save the pieces to array "a"
    gsub(/[:\",]/,"",a[1])   # now, get rid of the redundant characters
    print a[1]
}' 165297.json

output:

$ ./shell.sh
2747

Answer by mouviciel

My proposition:

$ echo $json | sed 's/,//g;s/^.*reputation...\([0-9]*\).*$/\1/'

I put two commands in the sed argument:

  • s/,//g is used to remove all commas, in particular the ones that are present in the reputation value.

  • s/^.*reputation...\([0-9]*\).*$/\1/ locates the reputation value in the line and replaces the whole line with that value.

In this particular case, I find that sed provides the most compact command without loss of readability.

Other tools for manipulating strings (not only regex) include:

  • grep, awk, perl, mentioned in most of the other answers
  • tr, for replacing characters
  • cut, paste, for handling multicolumn inputs
  • bash itself, with its rich ${...} syntax for accessing variables
  • tail, head, for keeping the last or first lines of a file
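
To illustrate the shell's own variable syntax from the list above, here is a minimal sketch that uses the prefix/suffix-stripping forms of parameter expansion plus tr. The sample string is an assumed, shortened copy of the JSON from the question, and the variable names rest and value are made up for this example.

```shell
#!/bin/sh
# Assumed sample input: a shortened copy of the JSON from the question (no network access).
sample='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

rest=${sample#*'"reputation":"'}   # strip the shortest prefix ending with the key
value=${rest%%'"'*}                # strip the longest suffix starting at a double quote
rep=$(printf '%s' "$value" | tr -d ',')   # delete the thousands separator

echo "$rep"
```

This relies on the key appearing only once in the input; for anything less predictable, a real JSON parser is the safer choice.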

Answer by Paused until further notice.

Blindly:

echo $json | awk -F\" '{print $8}'

Similar (the field separator can be a regex):

awk -F'{"|":"|","|"}' '{print $5}'

Smarter (look for the key and print its value):

awk -F'{"|":"|","|"}' '{for(i=2; i<=NF; i+=2) if ($i == "reputation") print $(i+1)}'

Answer by Sinan Ünür

You can use a proper library (as others noted):

E:\Home> perl -MLWP::Simple -MJSON -e "print from_json(get 'http://stackoverflow.com/users/flair/165297.json')->{reputation}"

or

$ perl -MLWP::Simple -MJSON -e 'print from_json(get "http://stackoverflow.com/users/flair/165297.json")->{reputation}, "\n"'

depending on OS/shell combination.
