Linux: Using ? with sed

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/4348166/

Time: 2020-08-05 00:12:37  Source: igfitidea

Using ? with sed

Tags: linux, bash, sed

Asked by User1

I just want to get the number of a file that may or may not be gzip'd. However, it appears that a regular expression in sed does not support a ?. Here's what I tried:

echo 'file_1.gz'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'

and nothing was returned. Then I added a ? to the string being analyzed:

echo 'file_1.gz?'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'

and got:

1

So, it looks like the ? used in most regexes is not supported in sed, right? Well then, I would just like sed to give a 1 for file_1 and file_1.gz. What's the best way to do that in a bash script if execution time is critical?

Accepted answer by Laurence Gonsalves

The equivalent to x? is \(x\|\).
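
As a small illustration of that equivalence, the sketch below strips an optional ".gz" suffix using only basic regex syntax. The function name strip_gz is made up for this example, and it assumes GNU sed, since \| inside a basic regular expression is a GNU extension.

```shell
# Hypothetical helper (not from the original answer): remove an optional ".gz" suffix.
# \(\.gz\|\) means "either .gz or nothing", which plays the role of (\.gz)? here.
# Assumption: GNU sed, because \| in a basic regex is a GNU extension.
strip_gz() {
  printf '%s\n' "$1" | sed 's/\(\.gz\|\)$//'
}

strip_gz 'file_1.gz'   # prints file_1
strip_gz 'file_1'      # prints file_1
```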

However, many versions of sed support an option to enable "extended regular expressions", which include ?. In GNU sed the flag is -r. Note that this also changes unescaped parens to do grouping, e.g.:

echo 'file_1.gz'|sed -n -r 's/.*_(.*)(\.gz)?/\1/p'

Actually, there's another bug in your regex, which is that the greedy .* in the parens is going to swallow up the ".gz" if there is one. sed doesn't have a non-greedy equivalent to * as far as I know, but you can use | to work around this. | in sed (and many other regex implementations) will use the leftmost match that works, so you can do something like this:

echo 'file_1.gz'|sed -r 's/(.*_(.*)\.gz)|(.*_(.*))/\2\4/'

This tries to match with .gz, and only tries without it if that doesn't work. Only one of group 2 or 4 will actually exist (since they are on opposite sides of the same |) so we just concatenate them to get the value we want.

Answer by Andrew Sledge

echo 'file_1.gz'|sed -n 's/.*_\(.*\)\?\(\.gz\)/\1/p'

Works. You have to put the return in the right spot, and you have to escape it.

Answer by SiegeX

You should use awk, which is superior to sed when it comes to field grabbing/parsing:

$ awk -F'[._]' '{print $2}' <<<"file_1"
1
$ awk -F'[._]' '{print $2}' <<<"file_1.gz"
1

Alternatively you can just use Bash's parameter expansion like so:

var=file_1.gz
temp=${var#*_}
file=${temp%.*}
echo "$file"

Note: this works when var=file_1 as well.
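
The same two expansions can be wrapped in a function, sketched below. The name file_number is invented for this example. Because parameter expansion is done by the shell itself, no external process is started, which may matter when execution time is critical.

```shell
# Hypothetical helper (not from the original answer): print the part of a name
# between the first "_" and the last ".", using only shell parameter expansion.
file_number() {
  local tmp="${1#*_}"         # drop everything up to and including the first "_"
  printf '%s\n' "${tmp%.*}"   # drop the last "." and what follows it, if present
}

file_number 'file_1.gz'     # prints 1
file_number 'file_1'        # prints 1
file_number 'file_12.txt'   # prints 12
```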

Answer by Wesley Rice

A function that should return a number that follows the '_' in a filename, regardless of file extension:

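realname () {
  local n=${1##*/}     # strip any leading directory from the first argument
  local rn=${n%.*}     # strip the last extension, if there is one
  # feed the name to sed on stdin and delete everything up to the last '_'
  printf '%s\n' "${rn:-$n}" | sed 's/^.*_//'
}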

Answer by Paused until further notice.

Part of the solution lies in escaping the question mark or using the -r option.

sed 's/.*_\([^.]*\)\(\.\?[^.]\+\)\?$/\1/'

or

sed -r 's/.*_([^.]*)(\.?[^.]+)?$/\1/'

will work for:

file_1.gz
file_12.txt
file_123

resulting in:

1
12
123

Answer by User1

I just realized that I could do something very easy:

echo 'file_1.gz'|sed -n 's/.*_\([0-9]*\).*/\1/p'

Notice the [0-9]* instead of a .*. @Laurence Gonsalves's answer made me realize the greediness of my previous post.
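
To make the greediness visible, here is a minimal side-by-side sketch. The function names greedy_capture and digit_capture are made up for illustration. Both use plain basic regex syntax, so no -r or -E is needed.

```shell
# Hypothetical helpers (not from the original post) comparing the two captures.
greedy_capture() {
  # \(.*\) grabs everything after the last "_", including any extension
  printf '%s\n' "$1" | sed -n 's/.*_\(.*\)/\1/p'
}
digit_capture() {
  # \([0-9]*\) stops at the first non-digit; the trailing .* eats the rest
  printf '%s\n' "$1" | sed -n 's/.*_\([0-9]*\).*/\1/p'
}

greedy_capture 'file_1.gz'   # prints 1.gz
digit_capture 'file_1.gz'    # prints 1
digit_capture 'file_1'       # prints 1
```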

Answer by amichair

If you're looking for an answer to the specific example given in the question, or why it uses the ? incorrectly (regardless of syntax), see the answer by Laurence Gonsalves.

If you're looking instead for the answer to the general question of why ? doesn't exhibit its special meaning in sed as you might expect:

By default, sed uses the "POSIX basic regular expressions syntax", so the question mark must be escaped as \? (in GNU sed) to apply its special meaning, otherwise it matches a literal question mark. As an alternative, you can use the -r or --regexp-extended option to use the "extended regular expression syntax", which reverses the meaning of escaped and non-escaped special characters, including ?.
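
The three behaviours described above can be compared with a short sketch. The function names and the color/colour sample text are invented for this example, and the \? form assumes GNU sed.

```shell
# Hypothetical helpers (not from the original answer): each tries to replace
# "color" or "colour" with X, differing only in how the ? is written.
literal_qmark() {    # basic regex: a bare ? matches a literal question mark
  printf '%s\n' "$1" | sed 's/colou?r/X/g'
}
escaped_qmark() {    # basic regex with \? as the quantifier (assumes GNU sed)
  printf '%s\n' "$1" | sed 's/colou\?r/X/g'
}
extended_qmark() {   # extended regex via -E: a bare ? is the quantifier
  printf '%s\n' "$1" | sed -E 's/colou?r/X/g'
}

literal_qmark 'color colour colou?r'    # prints color colour X
escaped_qmark 'color colour colou?r'    # prints X X colou?r
extended_qmark 'color colour colou?r'   # prints X X colou?r
```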

In the words of the GNU sed documentation (view by running 'info sed' on Linux):

The only difference between basic and extended regular expressions is in the behavior of a few characters: '?', '+', parentheses, and braces ('{}'). While basic regular expressions require these to be escaped if you want them to behave as special characters, when using extended regular expressions you must escape them if you want them to match a literal character.

and the option is explained:

-r, --regexp-extended

Use extended regular expressions rather than basic regular expressions. Extended regexps are those that `egrep' accepts; they can be clearer because they usually have less backslashes, but are a GNU extension and hence scripts that use them are not portable.

Update

Newer versions of GNU sed now say this:

-E, -r, --regexp-extended

Use extended regular expressions rather than basic regular expressions. Extended regexps are those that 'egrep' accepts; they can be clearer because they usually have fewer backslashes. Historically this was a GNU extension, but the '-E' extension has since been added to the POSIX standard (http://austingroupbugs.net/view.php?id=528), so use '-E' for portability. GNU sed has accepted '-E' as an undocumented option for years, and *BSD seds have accepted '-E' for years as well, but scripts that use '-E' might not port to other older systems.

So, if you need to preserve compatibility with ancient GNU sed, stick with -r. But if you prefer better cross-platform portability on more modern systems (e.g. Linux+Mac support), go with -E (but note that there are still some quirks and differences between GNU sed and BSD sed, so you'll have to make sure your scripts are portable in any case).