Using regular expressions in a shell script
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/1636352/
Asked by Amarghosh
What is the correct way to parse a string using regular expressions in a Linux shell script? I wrote the following script to print my SO rep on the console using curl and sed (not solely because I'm rep-crazy; I'm trying to learn some shell scripting and regex before switching to Linux).
json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | sed 's/.*"reputation":"\([0-9,]\{1,\}\)".*/\1/' | sed s/,//
But somehow I feel that sed is not the proper tool to use here. I heard that grep is all about regex and explored it a bit. But apparently it prints the whole line whenever a match is found, whereas I am trying to extract a number from a single line of text. Here is a downsized version of the string that I'm working on (returned by curl).
{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"\u003cspan title=\"1 silver badge\"\u003e\u003cspan class=\"badge2\"\u003e●\u003c/span\u003e\u003cspan class=\"badgecount\"\u003e1\u003c/span\u003e\u003c/span\u003e"}
I guess my questions are:
- What is the correct way to parse a string using regular expressions in a Linux shell script?
- Is sed the right thing to use here?
- Could this be done using grep?
- Is there any other command that's easier/more appropriate?
Accepted answer by paxdiablo
The grep command will select the desired line(s) from many, but it will not directly manipulate the line. For that, you use sed in a pipeline:
someCommand | grep 'Amarghosh' | sed -e 's/foo/bar/g'
Alternatively, awk (or perl if available) can be used. In my opinion, it's a far more powerful text processing tool than sed.
someCommand | awk '/Amarghosh/ { do something }'
For simple text manipulations, just stick with the grep/sed combo. When you need more complicated processing, move on up to awk or perl.
My first thought is to just use:
echo '{"displayName":"Amarghosh","reputation":"2,737","badgeHtml"' |
sed -e 's/.*tion":"//' -e 's/".*//' -e 's/,//g'
which keeps the number of sed processes to one (you can give multiple commands with -e).
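The grep-then-sed pipeline described above can also be folded into a single sed process, since sed can select lines by address as well as edit them. The following is only a sketch under assumptions: the two-record input and the function names flair_lines and rep_for_amarghosh are made up for illustration and do not come from the original answer.

```shell
# Hypothetical sample input: two flair records, one per line.
flair_lines() {
  printf '%s\n' \
    '{"displayName":"Someone","reputation":"1,234"}' \
    '{"displayName":"Amarghosh","reputation":"2,737"}'
}

# One sed process does both jobs:
#   /Amarghosh/!d  deletes every line that does not mention Amarghosh (the grep step)
#   the first s    keeps only the captured reputation value
#   the second s   strips the thousands separators
rep_for_amarghosh() {
  flair_lines | sed -e '/Amarghosh/!d' \
                    -e 's/.*"reputation":"\([0-9,]*\)".*/\1/' \
                    -e 's/,//g'
}

rep_for_amarghosh
```

Whether this reads better than a separate grep is a matter of taste; it mainly saves one process.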
Answer by Brian Agnew
sed is appropriate, but you'll spawn a new process for every sed you use (which may be too heavyweight in more complex scenarios). grep is not really appropriate. It's a search tool that uses regexps to find lines of interest.
Perl is one appropriate solution here, being a scripting language with powerful regexp features. It'll do almost everything you need without spawning separate processes (unlike normal Unix shell scripting) and has a huge library of additional functions.
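If the concern is the cost of spawning processes, it may be worth noting that the shell itself can do this particular extraction with parameter expansion alone. This is a rough sketch, not a general JSON parser: the function name extract_rep is hypothetical, it assumes the key appears exactly as "reputation":" in the input, and it uses plain globals because POSIX sh has no local variables.

```shell
# Extract the reputation value using only shell parameter expansion,
# so no sed/grep/awk process is started.
extract_rep() {
  json=$1
  rest=${json#*\"reputation\":\"}   # drop everything up to and including "reputation":"
  rep=${rest%%\"*}                  # drop everything from the next double quote onward
  # Remove commas one at a time (POSIX sh has no ${var//,/}).
  while :; do
    case $rep in
      *,*) rep=${rep%%,*}${rep#*,} ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$rep"
}

extract_rep '{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"x"}'
```

If the key is missing, the first expansion leaves the string untouched and the result is meaningless, so a real script would want to check for that.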
Answer by pavium
sed is a perfectly valid command for your task, but it may not be the only one.
grep may be useful too, but as you say it prints the whole line. It's most useful for filtering the lines of a multi-line file and discarding the lines you don't want.
Efficient shell scripts can use a combination of commands (not just the two you mentioned), exploiting the strengths of each.
Answer by pavium
You may be interested in using Perl for such tasks. As a demonstration, here is a Perl script which prints the number you want:
#!/usr/local/bin/perl
use warnings;
use strict;
use LWP::Simple;
use JSON;
my $url = "http://stackoverflow.com/users/flair/165297.json";
my $flair = get ($url);
my $parsed = from_json ($flair);
print "$parsed->{reputation}\n";
This script requires you to install the JSON module, which you can do with just the command cpan JSON.
Answer by viam0Zah
Answer by qba
You can do it with grep. grep has an -o switch which extracts only the matching string, not the whole line.
$ echo $json | grep -o '"reputation":"[0-9,]\+"' | grep -o '[0-9,]\+'
2,747
Answer by ghostdog74
1) What is the correct way to parse a string using regular expressions in a Linux shell script?
Tools that include regular expression capabilities include sed, grep, awk, Perl, and Python, to mention a few. Even newer versions of Bash have regex capabilities. All you need to do is look up the docs on how to use them.
2) Is sed the right thing to use here?
It can be, but it's not necessary.
3) Could this be done using grep?
Yes, it can. You just construct a similar regex as you would with sed or the others. Note that grep only searches; if you want to modify any files, it will not do that for you.
4) Is there any other command that's easier/more appropriate?
Of course. Regex can be powerful, but it's not necessarily the best tool to use every time. It also depends on what you mean by "easier/appropriate". Another method, with minimal fuss over regex, is the fields/delimiter approach: you look for patterns the input can be split on. For example, in your case (I have downloaded the 165297.json file instead of using curl, but it's the same):
awk 'BEGIN{
FS="reputation" # split on the word "reputation"
}
{
m=split($2,a,"\",\"") # field 2 contains the value you want plus the rest;
# split it on "," and save the pieces to array "a"
gsub(/[:\",]/,"",a[1]) # now, get rid of the redundant characters
print a[1]
}' 165297.json
output:
$ ./shell.sh
2747
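The remark above that newer versions of Bash have regex capabilities refers to the =~ operator inside [[ ]], which fills the BASH_REMATCH array with the capture groups. A minimal sketch, assuming Bash 3.0 or later (it will not work in plain sh); the function name extract_rep_bash is made up for illustration:

```shell
# Bash-only: match with [[ =~ ]] and read the first capture group.
extract_rep_bash() {
  local json=$1
  # Keeping the pattern in a variable avoids quoting surprises inside [[ ]].
  local re='"reputation":"([0-9,]+)"'
  if [[ $json =~ $re ]]; then
    # ${...//,/} removes every comma from the captured value.
    printf '%s\n' "${BASH_REMATCH[1]//,/}"
  else
    return 1   # no reputation field found
  fi
}

extract_rep_bash '{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"x"}'
```

The non-zero return on a failed match lets a caller write something like extract_rep_bash "$json" || echo "not found".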
Answer by mouviciel
My proposal:
$ echo $json | sed 's/,//g;s/^.*reputation...\([0-9]*\).*$/\1/'
I put two commands in the sed argument:
- s/,//g removes all commas, in particular the ones present in the reputation value.
- s/^.*reputation...\([0-9]*\).*$/\1/ locates the reputation value in the line and replaces the whole line with that value.
In this particular case, I find that sed provides the most compact command without loss of readability.
Other tools for manipulating strings (not only regex) include:
- grep, awk, perl, mentioned in most of the other answers
- tr for replacing characters
- cut, paste for handling multicolumn inputs
- bash itself with its rich $(...) syntax for accessing variables
- tail, head for keeping the last or first lines of a file
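As an illustration of two of the tools listed above, cut and tr can pull out the value without any regex at all. This is a sketch that leans on a fragile assumption: reputation must be the second key in the record, so that its value is the 8th field when the line is split on double quotes. The function name rep_with_cut is hypothetical.

```shell
# Split on double quotes, take the 8th field, then delete the commas.
# Assumes the layout {"displayName":"...","reputation":"...",...}.
rep_with_cut() {
  printf '%s\n' "$1" | cut -d'"' -f8 | tr -d ','
}

rep_with_cut '{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"x"}'
```

If the order of the keys ever changes, the field number changes with it, which is why the key-based approaches in the other answers are more robust.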
Answer by Paused until further notice.
Blindly:
echo $json | awk -F\" '{print $8}'
Similar (the field separator can be a regex):
awk -F'{"|":"|","|"}' '{print $5}'
Smarter (look for the key and print its value):
awk -F'{"|":"|","|"}' '{for(i=2; i<=NF; i+=2) if ($i == "reputation") print $(i+1)}'
Answer by Sinan ünür
You can use a proper library (as others noted):
E:\Home> perl -MLWP::Simple -MJSON -e "print from_json(get 'http://stackoverflow.com/users/flair/165297.json')->{reputation}"
or
$ perl -MLWP::Simple -MJSON -e 'print from_json(get "http://stackoverflow.com/users/flair/165297.json")->{reputation}, "\n"'
depending on OS/shell combination.