Linux: sorting a tab-delimited file
Note: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/1037365/
Sorting a tab delimited file
Asked by neversaint
I have data in the following format:
foo<tab>1.00<space>1.33<space>2.00<tab>3
I want to sort the file by the last field in decreasing order. I tried the following commands, but the output was not sorted as expected.
$ sort -k3nr file.txt # apparently this also treats spaces as delimiters
$ sort -t"\t" -k3nr file.txt
sort: multi-character tab `\t'
$ sort -t "`/bin/echo '\t'`" -k3,3nr file.txt
sort: multi-character tab `\t'
What's the right way to do it?
Here is the sample data.
Accepted answer by Lars Haugseth
Using bash, this will do the trick:
$ sort -t$'\t' -k3 -nr file.txt
Notice the dollar sign in front of the single-quoted string. You can read about it in the ANSI-C Quoting section of the bash man page.
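As a rough sketch of the same idea in a form that does not depend on bash, the tab can be produced with printf and passed to sort through a variable. The sample rows, the temporary file, and the variable names here are made up for illustration, and the key is written as -k3,3nr so that only the third field is compared.

```shell
# Hypothetical sample data: three tab-separated fields; the middle field contains spaces.
tmp=$(mktemp)
printf 'foo\t1.00 1.33 2.00\t3\n'  >  "$tmp"
printf 'bar\t0.50 0.75 1.00\t10\n' >> "$tmp"
printf 'baz\t2.00 2.50 3.00\t7\n'  >> "$tmp"

# Portable way to obtain a literal tab; in bash, tab=$'\t' gives the same value.
tab=$(printf '\t')

# -t sets the field separator; -k3,3nr uses only field 3 as the key, numeric, reversed.
sorted=$(sort -t "$tab" -k3,3nr "$tmp")
printf '%s\n' "$sorted"
rm -f "$tmp"
```

With this input the rows should come out in the order bar, baz, foo, since their last fields are 10, 7 and 3.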
Answer by Michiel Buddingh
Pipe it through something like awk '{ print $1"\t"$2"\t"$3"\t"$4"\t"$5 }'. This will change the spaces to tabs.
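A possible variation on this, sketched here as an assumption rather than as part of the original answer, is to let awk rebuild the record with a tab as its output field separator, which avoids hard-coding the number of fields. The sample line and variable names are hypothetical.

```shell
# Hypothetical sample line in the question's format: tabs between columns, spaces inside column 2.
line=$(printf 'foo\t1.00 1.33 2.00\t3')

# With OFS set to a tab, assigning $1 to itself makes awk rebuild the record
# using tabs between all whitespace-separated fields.
normalized=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ $1 = $1; print }')

# Count the tab-separated fields in the result.
nfields=$(printf '%s\n' "$normalized" | awk -F '\t' '{ print NF }')
printf '%s\n' "$normalized"
printf 'fields: %s\n' "$nfields"
```

After this step every field is tab-separated, so the last field of the question's data becomes field 5.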
Answer by laalto
By default, the field delimiter is the transition from non-blank to blank characters, so a tab should work just fine.
However, the columns are indexed from 1, not 0, so you probably want
sort -k4nr file.txt
to sort file.txt by column 4 numerically in reverse order. (Note, though, that the data in the question actually has 5 fields under the default splitting, so the last field would be index 5.)
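To make the field counting concrete, here is a small sketch that uses sort's default blank-based splitting on made-up rows in the question's format. The data and variable names are illustrative only.

```shell
# Hypothetical rows: tabs between columns, spaces inside the middle column.
data=$(printf 'foo\t1.00 1.33 2.00\t3\nbar\t0.50 0.75 1.00\t10\nbaz\t2.00 2.50 3.00\t7\n')

# awk also splits on runs of blanks by default, so it reports the same field count.
nfields=$(printf '%s\n' "$data" | head -n 1 | awk '{ print NF }')

# No -t option: fields are separated at blank (space or tab) boundaries,
# so the last column is field 5. -n skips the leading blank that belongs to the field.
by_last=$(printf '%s\n' "$data" | sort -k5,5nr)

printf 'fields per line: %s\n' "$nfields"
printf '%s\n' "$by_last"
```

This only works because no field is empty and the spaces inside the middle column are acceptable as separators; with an explicit -t the same column would be field 3.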
Answer by James Thompson
In general, keeping data in a format like this is best avoided if you can, because people constantly confuse tabs and spaces.
Solving your problem is very straightforward in a scripting language like Perl, Python or Ruby. Here's some example code:
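#!/usr/bin/perl
use strict;
use warnings;

my $sort_field  = 2;          # zero-based index of the field to sort on
my $split_regex = qr{\s+};    # split on any run of whitespace (tabs or spaces)

my @data;
push @data, "7 8\t 9";
push @data, "4 5\t 6";
push @data, "1 2\t 3";

# Schwartzian transform: pair each line with its sort key, sort by key, then unwrap.
my @sorted_data =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }    # ascending numeric; swap $a and $b for descending
    map  { [ ( split $split_regex, $_ )[$sort_field], $_ ] }
    @data;

print "unsorted\n";
print join "\n", @data, "\n";
print "sorted by $sort_field, lines split by $split_regex\n";
print join "\n", @sorted_data, "\n";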
Answer by Lloyd
The $'\t' solution didn't work for me. However, putting the tab character itself in the command did: sort -t'<tab>' -k2 (where <tab> is a literal tab character).
Answer by Lawrence Noronha
I wanted a solution for GNU sort on Windows, but none of the above solutions worked for me on the command line.
Using Lloyd's clue, the following batch file (.bat) worked for me.
Type a literal tab character between the double quotes (shown here as <tab>).
C:\>cat foo.bat
sort -k3 -t"<tab>" tabfile.txt
Answer by Danny
I was having this problem with sort in Cygwin in a bash shell when using 'general-numeric-sort'. If I specified -t$'\t' -kFg, where F is the field number, it didn't work, but when I specified both -t$'\t' and -kF,Fg (e.g. -k7,7g for the 7th field) it did work. -kF,Fg without the -t$'\t' did not work.
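One documented difference between the two key forms is that -kF starts the key at field F and runs to the end of the line, while -kF,F stops at the end of field F. The sketch below illustrates that difference with a plain lexical sort on made-up data; it is not claimed to reproduce the Cygwin behaviour described above.

```shell
tab=$(printf '\t')

# Hypothetical rows that tie on field 2 but differ in field 3.
data=$(printf 'a\tx\t2\nb\tx\t1\n')

# Open-ended key: compares "x<tab>2" with "x<tab>1", so the "b" row sorts first.
open_key=$(printf '%s\n' "$data" | LC_ALL=C sort -t "$tab" -k2)

# Closed key: both keys are "x", so the tie falls back to whole-line order and "a" sorts first.
closed_key=$(printf '%s\n' "$data" | LC_ALL=C sort -t "$tab" -k2,2)

printf 'open-ended key:\n%s\n' "$open_key"
printf 'closed key:\n%s\n' "$closed_key"
```

Writing the key as -kF,F keeps later columns from influencing the comparison, which is usually what is wanted when sorting on a single column.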
Answer by Brian Carlsen
You need to put an actual tab character after the -t\ and, to do that in a shell, you press Ctrl-V and then the Tab key. Most shells I've used support this way of entering a literal tab.
Beware, though, because copying and pasting from another place generally does not preserve tabs.
Answer by The Unfun Cat
If you want to make it easier for yourself by only having tabs, replace the spaces with tabs:
tr " " "\t" < <file> | sort <options>
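One thing to watch for with this approach is that tr converts every single space, so two consecutive spaces become two tabs and create an empty field. The sketch below, on a made-up input line, compares the plain translation with tr -s, which squeezes repeated tabs in the output; whether squeezing is appropriate is an assumption that depends on the data, because it would also collapse genuinely empty fields.

```shell
# Hypothetical input with two consecutive spaces between "a" and "b".
line='a  b c'

# Plain translation: each space becomes its own tab.
plain=$(printf '%s\n' "$line" | tr ' ' '\t')

# -s squeezes runs of the translated character into a single tab.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' '\t')

# Count tab-separated fields in each result.
plain_fields=$(printf '%s\n' "$plain" | awk -F '\t' '{ print NF }')
squeezed_fields=$(printf '%s\n' "$squeezed" | awk -F '\t' '{ print NF }')

printf 'plain: %s fields, squeezed: %s fields\n' "$plain_fields" "$squeezed_fields"
```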