Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/816820/

Time: 2020-09-09 18:08:54  Source: igfitidea

Use space as a delimiter with cut command

bash, unix, cut

Question by Jaelebi

I want to use space as a delimiter with the cut command.

What syntax can I use for this?

Answer by RichieHindle

cut -d ' ' -f 2

Where 2 is the field number of the space-delimited field you want.
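A minimal sketch of this in action, using made-up sample input. Besides a single number, -f also accepts a comma-separated list or a range of field numbers; the selected fields are joined with the same space delimiter on output.

```shell
#!/usr/bin/env bash
# Made-up sample input: one line, fields separated by single spaces.
line='alpha beta gamma delta'

# Field 2 only.
printf '%s\n' "$line" | cut -d ' ' -f 2      # beta

# Fields 1 and 3.
printf '%s\n' "$line" | cut -d ' ' -f 1,3    # alpha gamma

# Field 2 through the end of the line.
printf '%s\n' "$line" | cut -d ' ' -f 2-     # beta gamma delta
```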

Answer by BeniBela

Usually, if you use space as a delimiter, you want to treat multiple spaces as one, because you are parsing the output of a command that aligns its columns with spaces. (And the Google search for that led me here.)

In this case a single cut command is not sufficient, and you need to use:

tr -s ' ' | cut -d ' ' -f 2

Or

awk '{print $2}'
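To see why this matters, the sketch below (sample input made up for illustration) compares plain cut, tr -s followed by cut, and awk on space-aligned text. One difference worth knowing: tr -s squeezes a run of leading spaces down to one space rather than removing it, so cut then sees an empty first field, whereas awk's default field splitting ignores leading blanks.

```shell
#!/usr/bin/env bash
# Columns aligned with runs of spaces, as in typical command output.
aligned='a   b   c'
# The same columns, but indented with leading spaces.
indented='   a   b   c'

# Plain cut: every single space is a delimiter, so field 2 is empty.
printf '%s\n' "$aligned" | cut -d ' ' -f 2                 # (empty line)

# Squeeze runs of spaces first, then cut.
printf '%s\n' "$aligned" | tr -s ' ' | cut -d ' ' -f 2     # b

# awk splits on runs of blanks by default.
printf '%s\n' "$aligned" | awk '{print $2}'                # b

# With leading spaces, tr -s keeps one leading space, so the fields shift.
printf '%s\n' "$indented" | tr -s ' ' | cut -d ' ' -f 2    # a
printf '%s\n' "$indented" | awk '{print $2}'               # b
```

If the input may be indented, awk is the simpler choice; the tr -s approach would need the leading space stripped first.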

Answer by mklement0

To complement the existing, helpful answers (tip of the hat to QZ Support for encouraging me to post a separate answer):

Two distinct mechanisms come into play here:

  • (a) whether cut itself requires the delimiter (space, in this case) passed to the -d option to be a separate argument, or whether it's acceptable to append it directly to -d.

  • (b) how the shell generally parses arguments before passing them to the command being invoked.

(a) is answered by a quote from the POSIX guidelines for utilities (emphasis mine):

If the SYNOPSIS of a standard utility shows an option with a mandatory option-argument [...] a conforming application shall use separate arguments for that option and its option-argument. However, a conforming implementation shall also permit applications to specify the option and option-argument in the same argument string without intervening characters.

In other words: in this case, because -d's option-argument is mandatory, you can choose whether to specify the delimiter as:

  • (s) EITHER: a separate argument
  • (d) OR: a value directly attached to -d.
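Point (a) can be observed with the shell's own getopts builtin, which parses options according to the same POSIX utility syntax guidelines. The sketch below uses a hypothetical function, get_delim, that accepts an option -d with a mandatory argument; whether the delimiter arrives as a separate argument (s) or attached to -d (d), getopts hands back the same single space. This illustrates the guideline, not cut's internals.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse a -d option (mandatory option-argument, hence
# the ':' in 'd:') and print the delimiter it received, wrapped in [...].
get_delim() {
  OPTIND=1                      # reset so the function can be called repeatedly
  while getopts 'd:' opt; do
    case $opt in
      d) printf '[%s]\n' "$OPTARG" ;;
      *) return 1 ;;
    esac
  done
}

get_delim -d ' '    # (s) separate argument -> [ ]
get_delim -d' '     # (d) attached to -d    -> [ ]
```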

Once you've chosen (s) or (d), it is the shell's string-literal parsing - (b) - that matters:

  • With approach (s), all of the following forms are EQUIVALENT:

    • -d ' '
    • -d " "
    • -d \<space> # <space> used to represent an actual space for technical reasons
  • With approach (d), all of the following forms are EQUIVALENT:

    • -d' '
    • -d" "
    • "-d "
    • '-d '
    • -d\<space>
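To check these equivalence claims, the sketch below uses a hypothetical show_args function that prints every argument it receives in brackets, i.e. the argument strings a command such as cut would see after the shell has finished parsing. A trailing -f2 is added to each line so that the backslash-escaped space is followed by visible text and cannot be mistaken for trailing whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each argument wrapped in [...] on one line,
# showing the arguments a command receives after quoting and quote removal.
show_args() {
  printf '[%s]' "$@"
  printf '\n'
}

# Approach (s): -d and the space are two arguments -> [-d][ ][-f2]
show_args -d ' ' -f2
show_args -d " " -f2
show_args -d \  -f2

# Approach (d): one argument "-d " -> [-d ][-f2]
show_args -d' ' -f2
show_args -d" " -f2
show_args "-d " -f2
show_args '-d ' -f2
show_args -d\  -f2
```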

The equivalence is explained by the shell's string-literal processing:

All solutions above result in the exact same string (in each group) by the time cut sees them:

  • (s): cut sees -d as its own argument, followed by a separate argument that contains a space char - without quotes or \ prefix!

  • (d): cut sees -d plus a space char - without quotes or \ prefix! - as part of the same argument.

The reason the forms in the respective groups are ultimately identical is twofold, based on how the shell parses string literals:

  • The shell allows literals to be specified as is through a mechanism called quoting, which can take several forms:
    • single-quoted strings: the contents inside '...' are taken literally and form a single argument
    • double-quoted strings: the contents inside "..." also form a single argument, but are subject to interpolation (expansion of variable references such as $var, command substitutions ($(...) or `...`), or arithmetic expansions ($(( ... ))))
    • \-quoting of individual characters: a \ preceding a single character causes that character to be interpreted as a literal.
  • Quoting is complemented by quote removal, which means that once the shell has parsed a command line, it removes the quote characters from the arguments (enclosing '...' or "..." or \ instances); thus, the command being invoked never sees the quote characters.
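A small sketch of these quoting forms, using printf '[%s]\n' to show the exact string each argument becomes after quote removal; the variable name var is arbitrary and only for illustration.

```shell
#!/usr/bin/env bash
# Illustrative variable; the name is arbitrary.
var='hello'

printf '[%s]\n' '$var'          # single quotes: taken literally       -> [$var]
printf '[%s]\n' "$var"          # double quotes: variable expanded     -> [hello]
printf '[%s]\n' "$(echo hi)"    # double quotes: command substitution  -> [hi]
printf '[%s]\n' "$((2 + 3))"    # double quotes: arithmetic expansion  -> [5]
printf '[%s]\n' \$var           # backslash: the $ is a literal        -> [$var]
printf '[%s]\n' 'a b' "c d"     # each quoted string is one argument   -> [a b] then [c d]
```

In every case printf receives the unquoted result; none of the quote characters reach it.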

Answer by Chas. Owens

You can also say:

cut -d\  -f 2

Note that there are two spaces after the backslash.

Answer by fedorqui 'SO stop harming'

I just discovered that you can also use "-d ":

cut "-d "

Test

$ cat a
hello how are you
I am fine
$ cut "-d " -f2 a
how
am

Answer by Anssi

You can't do this easily with cut if the data has, for example, multiple consecutive spaces. I have found it useful to normalize the input for easier processing. One trick is to use sed for normalization, as below.

echo -e "foor\t \t bar" | sed 's:\s\+:\t:g' | cut -f2  #bar
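The \s and \+ in that sed pattern and the \t in its replacement are GNU sed extensions and may not behave the same in other seds (for example BSD/macOS sed). As a sketch that avoids them, the same normalization can be done with tr alone: map spaces and tabs to tabs, then squeeze each run of tabs into one. normalize_blanks is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn every run of spaces/tabs into a single tab.
# tr maps ' ' -> tab and tab -> tab; -s then squeezes repeated tabs.
normalize_blanks() {
  tr -s ' \t' '\t\t'
}

# Same sample input as above; tab is cut's default delimiter.
printf 'foor\t \t bar\n' | normalize_blanks | cut -f2    # bar
```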

Answer by Harry Mangalam

scut is a cut-like utility I wrote (smarter than cut, but slower) that can use any Perl regex as a breaking token. Breaking on whitespace is the default, but you can also break on multi-character regexes, alternative regexes, etc.

scut -f='6 2 8 7' < input.file  > output.file

So the above command would break columns on whitespace and extract the (0-based) columns 6 2 8 7, in that order.
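For comparison, plain cut cannot reorder fields: it always emits the selected fields in their input order, whatever order the -f list uses. Without scut, a rough awk equivalent of the command above is sketched below. awk numbers fields from 1, so the 0-based columns 6 2 8 7 become $7 $3 $9 $8. Unlike scut, this only splits on whitespace, not on arbitrary Perl regexes.

```shell
#!/usr/bin/env bash
# Made-up sample line whose words name their own 0-based column.
row='c0 c1 c2 c3 c4 c5 c6 c7 c8 c9'

# cut ignores the order of the list: fields come out in input order.
printf '%s\n' "$row" | cut -d ' ' -f 3,1                 # c0 c2

# awk prints fields in the requested order (1-based: $7 is column 6).
printf '%s\n' "$row" | awk '{print $7, $3, $9, $8}'      # c6 c2 c8 c7
```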

Answer by Stephen Quan

I have an answer (admittedly a somewhat confusing one) that involves sed, regular expressions and capture groups:

  • \S* - first word
  • \s* - delimiter
  • (\S*) - second word - captured
  • .* - rest of the line

In a sed expression (a basic regular expression by default), the capture group's parentheses need to be escaped, i.e. \( and \).

In the replacement, \1 returns a copy of the captured group, i.e. the second word.

$ echo "alpha beta gamma delta" | sed 's/\S*\s*\(\S*\).*/\1/'
beta
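\S and \s are GNU sed extensions. As a sketch assuming only POSIX bracket expressions, the same extraction can be written with [[:space:]] classes. The second command reuses the capture-group idea to swap the first two words, something cut alone cannot do.

```shell
#!/usr/bin/env bash
line='alpha beta gamma delta'

# Second word, using POSIX classes instead of \S and \s.
printf '%s\n' "$line" | sed 's/^[^[:space:]]*[[:space:]]*\([^[:space:]]*\).*/\1/'
# -> beta

# Two capture groups: swap the first two words, keep the rest of the line.
printf '%s\n' "$line" | sed 's/^\([^[:space:]]*\)[[:space:]]*\([^[:space:]]*\)/\2 \1/'
# -> beta alpha gamma delta
```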

When you look at this answer, it's somewhat confusing, and you may think: why bother? Well, I'm hoping that some may go "Aha!" and use this pattern to solve some complex text-extraction problems with a single sed expression.
