bash: Use awk to extract a substring

Disclaimer: this page is a Chinese-English bilingual translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16040567/

Time: 2020-09-09 23:36:22  Source: igfitidea

bash, awk

Question by Richard

Given a hostname in the format aaa0.bbb.ccc, I want to extract the first substring before the ., that is, aaa0 in this case. I use the following awk script to do so:

echo aaa0.bbb.ccc | awk '{if (match($0, /\./)) {print substr($0, 0, RSTART - 1)}}'

While the script produces aaa0 on one machine (A), on machine B it produces only aaa, without the 0 at the end. Both machines run Ubuntu/Linaro, but A runs a newer awk (gawk, version 3.1.8) while B runs an older awk (mawk, version 1.2).

I am asking in general: how do I write a compatible awk script that performs the same function on both machines?

Answer by Chris Seymour

You just want to set the field separator to . using the -F option and print the first field:

$ echo aaa0.bbb.ccc | awk -F'.' '{print $1}'
aaa0

Same thing but using cut:

$ echo aaa0.bbb.ccc | cut -d'.' -f1
aaa0

Or with sed:

$ echo aaa0.bbb.ccc | sed 's/[.].*//'
aaa0

Even grep:

$ echo aaa0.bbb.ccc | grep -o '^[^.]*'
aaa0
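
For comparison, the four one-liners above can be wrapped as shell functions and run on the same input. This is a sketch rather than part of the original answer: the function names are hypothetical, and it assumes awk, cut, sed and grep are on PATH (grep -o is a GNU/BSD extension, not POSIX). In POSIX awk a single-character FS other than a space is used literally, which is why -F. splits on dots instead of treating . as a regex wildcard.

```shell
#!/bin/sh
# Hypothetical helper names; each prints the part of $1 before the first dot.

# A one-character FS (other than space) is taken literally by awk,
# so -F. splits on "." rather than on "any character".
first_label_awk()  { printf '%s\n' "$1" | awk -F. '{ print $1 }'; }

# cut: use "." as the delimiter and keep field 1.
first_label_cut()  { printf '%s\n' "$1" | cut -d. -f1; }

# sed: delete from the first dot to the end of the line ([.] is a literal dot).
first_label_sed()  { printf '%s\n' "$1" | sed 's/[.].*//'; }

# grep -o prints only the matched part: the leading run of non-dot characters.
first_label_grep() { printf '%s\n' "$1" | grep -o '^[^.]*'; }

for f in first_label_awk first_label_cut first_label_sed first_label_grep; do
    printf '%s: %s\n' "$f" "$("$f" aaa0.bbb.ccc)"
done
```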

Answer by Kent

I am asking in general: how do I write a compatible awk script that performs the same function on both machines?

Solving the problem in your question is easy (see the other answers).

If you want to write an awk script that is portable to all awk implementations and versions (gawk/nawk/mawk...), it is really hard, even with gawk's --posix option.

For example:

  • some awks work on strings in terms of characters, others in terms of bytes
  • some support the \x escape, some do not
  • FS is interpreted differently
  • there are restrictions on abbreviating keywords/reserved words
  • some operators are restricted, e.g. **
  • even within the same awk implementation (gawk, for example), versions 4.0 and 3.x differ
  • the implementations of certain functions also differ (your problem is one example, see below)
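
Two of the differences listed above have workarounds that POSIX awk already specifies: ^ for exponentiation instead of **, and octal escapes such as \056 instead of \x escapes. A minimal sketch, assuming only a POSIX-conforming awk; the function names are hypothetical.

```shell
#!/bin/sh
# POSIX awk defines ^ for exponentiation; ** is an extension.
pow_2_10() { awk 'BEGIN { print 2 ^ 10 }'; }

# POSIX awk defines octal escapes \ddd in strings; \x hex escapes are an
# extension. \056 is the octal code of "." in ASCII.
dotted()   { awk 'BEGIN { print "aaa0" "\056" "bbb" }'; }

pow_2_10   # 1024
dotted     # aaa0.bbb
```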

Well, all the points above are general. Back to your problem: it involves only fundamental awk features. A line like awk '{print $x}' will work in all awks.

There are two reasons why your awk line behaves differently on gawk and mawk:

  • You used the substr() function wrongly; this is the main cause. You have substr($0, 0, RSTART - 1): the 0 should be 1, no matter which awk you use. awk arrays, string indices, etc. are 1-based.

  • gawk and mawk implement substr() differently.
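
Applying the first point, a version of the question's script that should behave the same in gawk and mawk starts substr() at position 1. match() sets RSTART to the position of the first dot (5 for aaa0.bbb.ccc), so RSTART - 1 is exactly the length of the label. The index() variant is an alternative not taken from the answer: it searches for a literal string, so the dot needs no regex escaping. The function names are hypothetical.

```shell
#!/bin/sh
# Corrected version of the question's one-liner: awk string positions are
# 1-based, so the prefix starts at 1. Like the original, it prints nothing
# for lines that contain no dot.
before_first_dot() {
    printf '%s\n' "$1" | awk '{ if (match($0, /\./)) print substr($0, 1, RSTART - 1) }'
}

# Alternative using index(), which searches for a literal string
# and returns 0 when it is not found.
before_first_dot_index() {
    printf '%s\n' "$1" | awk '{ i = index($0, "."); if (i) print substr($0, 1, i - 1) }'
}

before_first_dot aaa0.bbb.ccc         # aaa0
before_first_dot_index aaa0.bbb.ccc   # aaa0
```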

Answer by perreal

Or just use cut:

echo aaa0.bbb.ccc | cut -d'.' -f1

Answer by anishsane

You don't need awk for this...

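echo aaa0.bbb.ccc | cut -d. -f1
cut -d. -f1 <<< aaa0.bbb.ccc

echo aaa0.bbb.ccc | { IFS=. read a _ ; echo $a ; }
{ IFS=. read a _ ; echo $a ; } <<< aaa0.bbb.ccc

x=aaa0.bbb.ccc; echo ${x/.*/}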

Heavier options:

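sed:
echo aaa0.bbb.ccc | sed 's/\..*//'
sed 's/\..*//' <<< aaa0.bbb.ccc
awk:
echo aaa0.bbb.ccc | awk -F. '{print $1}'
awk -F. '{print $1}' <<< aaa0.bbb.ccc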
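
The read trick relies on IFS splitting: with IFS=., the first variable gets the text before the first dot, and the last variable receives everything that remains, dots included. A sketch of the same idea in POSIX sh, assuming a here-document in place of bash's <<< so it also runs under sh; the function name is hypothetical.

```shell
#!/bin/sh
# Split $1 at the first dot using read; IFS=. applies to this read only.
# The last variable (rest) keeps the remaining fields and their dots.
split_first_label() {
    IFS=. read -r first rest <<EOF
$1
EOF
    printf '%s|%s\n' "$first" "$rest"
}

split_first_label aaa0.bbb.ccc   # aaa0|bbb.ccc
```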

Answer by choroba

You do not need any external command at all; just use parameter expansion in bash:

hostname=aaa0.bbb.ccc
echo ${hostname%%.*}
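
${hostname%%.*} removes the longest suffix matching the glob .*, which leaves everything before the first dot. The related POSIX operators trim from the other end or take the shortest match instead; a sketch using a hypothetical host variable:

```shell
#!/bin/sh
host=aaa0.bbb.ccc

# %%  longest suffix matching ".*"  -> everything before the first dot
printf '%s\n' "${host%%.*}"   # aaa0
# %   shortest suffix matching ".*" -> drop only the last label
printf '%s\n' "${host%.*}"    # aaa0.bbb
# #   shortest prefix matching "*." -> drop only the first label
printf '%s\n' "${host#*.}"    # bbb.ccc
# ##  longest prefix matching "*."  -> keep only the last label
printf '%s\n' "${host##*.}"   # ccc
```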