bash: How to make the "cut" command treat consecutive identical delimiters as one?

Question

Asked by mbaitoff

I'm trying to extract a certain (the fourth) field from a column-based, space-aligned text stream. I'm trying to use the cut command in the following manner:

cat text.txt | cut -d " " -f 4

Unfortunately, cut doesn't treat several spaces as one delimiter. I could have piped through awk

awk '{ printf $4; }'

or sed

sed -E "s/[[:space:]]+/ /g"

to collapse the spaces, but I'd like to know if there is any way to make cut deal with repeated delimiters natively?

Answer 1

Answered by kev

Try:

tr -s ' ' <text.txt | cut -d ' ' -f4

From the tr man page:

-s, --squeeze-repeats   replace each input sequence of a repeated character
                        that is listed in SET1 with a single occurrence
                        of that character
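
One caveat with this approach: if a line starts with spaces, squeezing still leaves one leading space, so cut sees an empty first field. A minimal sketch that wraps the pipeline in a function and strips that leading space (the name field_ws is hypothetical, not part of any tool):

```shell
# field_ws N [FILE...]: print the Nth space-separated field of each line,
# treating runs of spaces as one delimiter. (Hypothetical helper name.)
field_ws() {
    local n=$1
    shift
    # Squeeze runs of spaces, drop the single leading space that may remain
    # (it would otherwise create an empty first field), then let cut pick.
    cat -- "$@" | tr -s ' ' | sed 's/^ //' | cut -d ' ' -f "$n"
}

# Reads stdin when no file is given.
printf 'a   b    c     d e\n  w x   y z\n' | field_ws 4
```

This only handles spaces; tabs would need to be added to the tr set and handled separately.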

Answer 2

Answered by fedorqui 'SO stop harming'

As you comment in your question, awk is really the way to go. Using cut is possible together with tr -s to squeeze spaces, as kev's answer shows.

Let me however go through all the possible combinations for future readers. Explanations are at the Test section.

tr | cut

tr -s ' ' < file | cut -d' ' -f4

awk

awk '{print $4}' file

bash

while read -r _ _ _ myfield _
do
   echo "fourth field: $myfield"
done < file

sed

sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file

Tests

Given this file, let's test the commands:

$ cat a
this   is    line     1 more text
this      is line    2     more text
this    is line 3     more text
this is   line 4            more    text

tr | cut

$ cut -d' ' -f4 a
is
                        # it does not show what we want!


$ tr -s ' ' < a | cut -d' ' -f4
1
2                       # this makes it!
3
4
$

awk

$ awk '{print $4}' a
1
2
3
4
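
awk needs no preprocessing because its default field separator splits on any run of spaces and tabs and ignores leading and trailing whitespace. A small sketch with the field number passed in as a variable (the function names awk_field and awk_last_two are made up for illustration):

```shell
# awk_field N: print the Nth whitespace-separated field of each stdin line.
awk_field() {
    awk -v n="$1" '{ print $n }'
}

# awk_last_two: print the last two fields; NF is the number of fields.
awk_last_two() {
    awk '{ print $(NF-1), $NF }'
}

# Leading blanks and a tab in the input do not disturb the field count.
printf '  this \t is    line     1 more text\n' | awk_field 4
printf 'this   is    line     1 more text\n' | awk_last_two
```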

bash

This reads the fields sequentially. Using _ indicates a throwaway ("junk") variable, so those fields are ignored. This way, the 4th field of each line ends up in $myfield (named a in the test below), no matter how many spaces separate the fields.

$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
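
Listing throwaway variables gets awkward when the field number is not fixed. A possible variation, sketched here under the assumption that fields are separated by spaces or tabs, relies on the same IFS word splitting but puts the fields into the positional parameters instead (the name nth_field is hypothetical):

```shell
# nth_field N: print the Nth whitespace-separated field of each stdin line.
nth_field() {
    n=$1
    while read -r line || [ -n "$line" ]; do
        set -f          # no globbing, so a literal * in the data stays literal
        set -- $line    # unquoted on purpose: split on runs of IFS whitespace
        set +f
        if [ "$#" -ge "$n" ]; then
            shift "$((n - 1))"
            printf '%s\n' "$1"
        else
            printf '\n'   # line has fewer than N fields
        fi
    done
}

printf 'this   is    line     1 more text\n' | nth_field 4
```

Note that set -- replaces the function's own arguments, which is why N is saved in n first.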

sed

This matches the first three fields, each a run of non-spaces followed by a run of spaces, with ([^ ]*[ ]*){3}. It then captures everything up to the next space as the 4th field, which is finally printed with \2.

$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
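
The pattern above only knows about spaces and prints short lines unchanged. A hedged variant, assuming a sed that supports -E (GNU and BSD sed both do), uses POSIX character classes so tabs count too, and prints only lines that actually have a 4th field (sed_field4 is a made-up name):

```shell
# sed_field4: print the 4th whitespace-separated field of each stdin line.
# -n plus the p flag suppresses lines with fewer than four fields.
sed_field4() {
    sed -nE 's/^[[:space:]]*([^[:space:]]+[[:space:]]+){3}([^[:space:]]+).*/\2/p'
}

printf 'this   is    line     1 more text\nonly three fields\n' | sed_field4
```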

Answer 3

Answered by arielf

shortest/friendliest solution

After becoming frustrated with the many limitations of cut, I wrote my own replacement, which I called cuts, for "cut on steroids".

cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems.

One example, out of many, addressing this particular question:

$ cat text.txt
0   1        2 3
0 1          2   3 4

$ cuts 2 text.txt
2
2

cuts supports:

auto-detection of most common field-delimiters in files (+ ability to override defaults)
multi-char, mixed-char, and regex matched delimiters
extracting columns from multiple files with mixed delimiters
offsets from end of line (using negative numbers) in addition to start of line
automatic side-by-side pasting of columns (no need to invoke paste separately)
support for field reordering
a config file where users can change their personal preferences
great emphasis on user friendliness & minimalist required typing

and much more. None of which is provided by standard cut.

See also: https://stackoverflow.com/a/24543231/1296044

Source and documentation (free software): http://arielf.github.io/cuts/

Answer 4

Answered by Chris Koknat

This Perl one-liner shows how closely Perl is related to awk:

perl -lane 'print $F[3]' text.txt

However, the @F autosplit array starts at index $F[0], while awk fields start with $1.

Answer 5

Answered by Benoit

With the versions of cut I know of, no, this is not possible. cut is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd) and that have a fixed number of fields. Two separators in a row mean an empty field, and that goes for whitespace too.
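
To illustrate why this behavior is useful for such files, here is a small sketch with a made-up passwd-style record (the user alice and its values are sample data, not taken from a real system):

```shell
# A passwd-style record with an empty 5th field (two colons in a row).
line='alice:x:1000:1000::/home/alice:/bin/bash'

# cut counts the empty field, so field 6 is the home directory.
printf '%s\n' "$line" | cut -d: -f6

# Collapsing repeated colons first shifts every later field left by one,
# so field 6 is now the login shell.
printf '%s\n' "$line" | tr -s ':' | cut -d: -f6
```

So squeezing delimiters is only safe when an empty field can never be meaningful, as with space-aligned columns.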
