bash: How to make the 'cut' command treat consecutive identical delimiters as one?
Original URL: http://stackoverflow.com/questions/4143252/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Question by mbaitoff
I'm trying to extract a certain field (the fourth) from a column-based, space-aligned text stream. I'm trying to use the cut command in the following manner:
cat text.txt | cut -d " " -f 4
Unfortunately, cut doesn't treat several spaces as one delimiter. I could pipe through awk
awk '{ print $4 }'
or sed
sed -E "s/[[:space:]]+/ /g"
to collapse the spaces, but I'd like to know whether there is any way to handle several consecutive delimiters natively with cut.
Answer by kev
Try:
tr -s ' ' <text.txt | cut -d ' ' -f4
From the tr man page:
-s, --squeeze-repeats replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character
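A minimal sketch of the same pipeline wrapped in a function (the name field4_tr is hypothetical), assuming input arrives on stdin. It also shows one caveat: tr -s squeezes a leading run of spaces down to a single space rather than removing it, so an indented line gets an empty first field and the field numbers shift.

```shell
#!/bin/sh
# Hypothetical helper: squeeze runs of spaces, then cut the 4th field.
field4_tr() {
  tr -s ' ' | cut -d ' ' -f4
}

# Runs of spaces collapse to one, so field 4 is "1".
printf 'this   is    line  1 more text\n' | field4_tr

# Caveat: the leading spaces become ONE leading space, so field 1
# is empty and field 4 is "line" instead of "2".
printf '   this is line 2 more text\n' | field4_tr
```

If indented lines are possible, awk's default field splitting, which ignores leading blanks, is usually the safer choice.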
Answer by fedorqui 'SO stop harming'
As you mention in your question, awk is really the way to go. Using cut is possible together with tr -s to squeeze the spaces, as kev's answer shows.
However, let me go through all the possible combinations for future readers. Explanations are in the Tests section.
tr | cut
tr -s ' ' < file | cut -d' ' -f4
awk
awk '{print $4}' file
bash
while read -r _ _ _ myfield _
do
    echo "fourth field: $myfield"
done < file
sed
sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file
Tests
Given this file, let's test the commands:
$ cat a
this   is    line     1 more text
this      is line    2     more text
this    is line 3     more text
this is   line 4            more    text
tr | cut
$ cut -d' ' -f4 a
is
# it does not show what we want!
$ tr -s ' ' < a | cut -d' ' -f4
1
2 # this makes it!
3
4
$
awk
$ awk '{print $4}' a
1
2
3
4
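Why awk works out of the box: with its default field separator (a single space), awk splits on runs of blanks and ignores leading and trailing blanks. A small comparison sketch (the helper names are hypothetical) shows that forcing a one-character regex separator brings back cut's behaviour, where adjacent spaces enclose empty fields.

```shell
#!/bin/sh
# Default FS: split on runs of spaces/tabs, ignore leading/trailing blanks.
field4_awk() {
  awk '{ print $4 }'
}

# FS as the bracket expression [ ]: every single space is a separator,
# which mimics cut -d ' ' (empty fields between adjacent spaces).
field4_awk_single() {
  awk -F '[ ]' '{ print $4 }'
}

line='this   is line 1 more text'
printf '%s\n' "$line" | field4_awk          # 1
printf '%s\n' "$line" | field4_awk_single   # is
```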
bash
This reads the fields sequentially. Using _ as a throwaway ("junk") variable tells read to discard those fields; the final _ also absorbs the rest of the line. This way, $myfield holds the 4th field of each line, no matter how many spaces separate the fields.
$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
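The same idea as a reusable function (field4_read is a hypothetical name). With the default IFS (space, tab, newline), read skips leading whitespace and treats any mix of spaces and tabs as a single separator, so indented or tab-separated lines work too.

```shell
#!/bin/bash
# Hypothetical helper: print the 4th field of each input line using only read.
# The first three fields and everything after the 4th go into the junk
# variable _, so only the 4th field lands in $field.
field4_read() {
  local field
  while read -r _ _ _ field _; do
    printf '%s\n' "$field"
  done
}

# Leading spaces plus a space/tab/space run: the 4th field is still "1".
printf '   this \t is  line 1 more text\n' | field4_read
```

Only whitespace characters in IFS collapse this way: with IFS=: two adjacent colons would delimit an empty field, exactly as with cut.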
sed
This matches three groups of "non-spaces followed by spaces" with ([^ ]*[ ]*){3}. Then it captures everything up to the next space as the 4th field, which is finally printed with \2 (the second capture group).
$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
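A slightly stricter variant, offered as a sketch rather than a drop-in replacement (field4_sed is a hypothetical name): using + instead of * forces each of the three repetitions to consume a whole word plus the whitespace after it, [[:space:]] also accepts tabs, and the leading [[:space:]]* tolerates indented lines. Lines with fewer than four fields do not match and are printed unchanged.

```shell
#!/bin/sh
# Hypothetical helper: extract the 4th whitespace-separated field with sed -E.
# Group 1 = one "word + whitespace" chunk (repeated 3 times),
# group 2 = the 4th word, which replaces the whole line.
field4_sed() {
  sed -E 's/^[[:space:]]*([^[:space:]]+[[:space:]]+){3}([^[:space:]]+).*/\2/'
}

printf 'this   is    line     1 more text\n' | field4_sed   # 1
printf '  this\tis line 2 more text\n' | field4_sed        # 2
printf 'too few\n' | field4_sed                            # too few (no match)
```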
Answer by arielf
shortest/friendliest solution
After becoming frustrated with the many limitations of cut, I wrote my own replacement, which I called cuts, for "cut on steroids".
cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems.
One example, out of many, addressing this particular question:
$ cat text.txt
0 1 2 3
0 1 2 3 4
$ cuts 2 text.txt
2
2
cuts supports:
- auto-detection of most common field-delimiters in files (+ ability to override defaults)
- multi-char, mixed-char, and regex matched delimiters
- extracting columns from multiple files with mixed delimiters
- offsets from end of line (using negative numbers) in addition to start of line
- automatic side-by-side pasting of columns (no need to invoke paste separately)
- support for field reordering
- a config file where users can change their personal preferences
- great emphasis on user friendliness and minimal required typing
and much more, none of which is provided by standard cut.
See also: https://stackoverflow.com/a/24543231/1296044
Source and documentation (free software): http://arielf.github.io/cuts/
Answer by Chris Koknat
This Perl one-liner shows how closely Perl is related to awk:
perl -lane 'print $F[3]' text.txt
However, the @F autosplit array starts at index $F[0], while awk fields start at $1.
Answer by Benoit
With the versions of cut I know of, no, this is not possible. cut is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd) and that have a fixed number of fields. Two separators in a row mean an empty field, and that applies to whitespace too.
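A quick illustration of that rule: every single delimiter ends a field, so two adjacent delimiters enclose an empty field. The sample records are made up for the demonstration.

```shell
#!/bin/sh
# Two spaces in a row: field 2 is empty, "b" is field 3.
printf 'a  b\n' | cut -d ' ' -f2   # (empty line)
printf 'a  b\n' | cut -d ' ' -f3   # b

# The same rule is what you want for colon-separated records,
# where an empty field between two colons is meaningful.
printf 'user::1000\n' | cut -d ':' -f2   # (empty line)
printf 'user::1000\n' | cut -d ':' -f3   # 1000
```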