Original question: http://stackoverflow.com/questions/357560/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Sorting multiple keys with Unix sort
Asked by Chris Kloberdanz
I have potentially large files that need to be sorted by 1-n keys. Some of these keys might be numeric and some of them might not be. This is a fixed-width columnar file so there are no delimiters.
Is there a good way to do this with Unix sort? With one key it is as simple as using '-n'. I have read the man page and searched Google briefly, but didn't find a good example. How would I go about accomplishing this?
Note: I have ruled out Perl because of the file size potential. It would be a last resort.
Accepted answer by Ken Gentle
Use the -k option (or --key=POS1[,POS2]). It can appear multiple times, and each key can take its own copy of the otherwise global ordering options (such as n for numeric sort).
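As a minimal sketch of what "multiple -k with per-key options" looks like in practice, assuming made-up whitespace-separated data (the values and the variable names data and result are only for illustration):

```shell
# Hypothetical data: a name in field 1 and a count in field 2.
data='x 2
y 1
x 10'

# Key 1: field 1 compared as text.
# Key 2: field 2 compared numerically (n) and in reverse (r).
result=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1,1 -k 2,2nr)
printf '%s\n' "$result"
```

The n and r letters apply only to the key they are attached to, so field 1 is still compared as plain text.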
Answer by Clinton Pierce
The -k option is what you want.
-k 1.4,1.5n -k 1.14,1.15n
This uses character positions 4-5 in the first field (a fixed-width record with no blanks is all one field) and sorts them numerically as the first key.
The second key would be characters 14-15 in the first field also.
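To make the character-position keys concrete, here is a small sketch on invented fixed-width records. The layout (columns 4-5 and 14-15 holding numbers) and the sample values are assumptions for illustration only:

```shell
# Hypothetical fixed-width records with no blanks, so each line is one field:
# cols 1-3 = tag, cols 4-5 = numeric key A, cols 6-13 = filler, cols 14-15 = numeric key B
data='AAA10xxxxxxxx07
BBB02xxxxxxxx30
CCC10xxxxxxxx03
DDD02xxxxxxxx04'

# First key: characters 4-5 of field 1, numeric.
# Second key: characters 14-15 of field 1, numeric.
result=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1.4,1.5n -k 1.14,1.15n)
printf '%s\n' "$result"
```

The records with 02 in columns 4-5 come first, and ties are broken by the number in columns 14-15.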
(edit)
Example (all I have is DOS/cygwin handy):
dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r
for the data:
12/10/2008 01:10 PM 1,564,990 outfile.txt
Sorts the directory listing by month number (pos 4-5) numerically, and then by filename (pos 40-60) in reverse. Since there are no tabs, it's all field 1 to sort.
Answer by Dong Hoon
I believe in your case something like
sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile
will work better. @ is the field separator; make sure it is a character that appears nowhere in the input. Your input is then treated as a single column.
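A small sketch of the separator trick, assuming invented records that contain a blank (which would otherwise split the line into several fields) and assuming @ never occurs in the data:

```shell
# Hypothetical fixed-width records: cols 1-2 = code, col 3 = blank, col 4 = digit.
data='zz 1
aa 3
mm 2
bb 1'

# -t@ makes the whole line field 1, so 1.4 really is the 4th character of the line.
# First key: character 4, numeric. Second key: characters 1-2 as text.
result=$(printf '%s\n' "$data" | LC_ALL=C sort -t@ -k 1.4,1.4n -k 1.1,1.2)
printf '%s\n' "$result"
```

Because the separator never appears, character offsets count from the start of the line regardless of embedded blanks.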
Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags 'n' and 'r' can be added to every -k.... option.
Answer by andras
Take care though:
If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:
sort -k 3,3 -k 2,2 < inputfile
Not this:
sort -k 3 -k 2 < inputfile
which sorts the file by the string from the beginning of field 3 to the end of line (which is potentially unique).
-k, --key=POS1[,POS2] start a key at POS1 (origin 1), end it at POS2
(default end of line)
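The difference is easy to see on a tiny input. This is only a sketch on made-up four-field data; the variable names bounded and unbounded are illustrative:

```shell
# Hypothetical data with four whitespace-separated fields.
data='a 2 x y
b 1 x z
c 3 w a'

# Bounded keys: field 3 only, then field 2 only.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3,3 -k 2,2)

# Unbounded keys: the first key runs from field 3 to the end of the line,
# so field 4 decides the tie before field 2 is ever consulted.
unbounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3 -k 2)

printf 'bounded:\n%s\nunbounded:\n%s\n' "$bounded" "$unbounded"
```

With bounded keys the two x lines are ordered by field 2 (b before a); with unbounded keys they are ordered by the trailing field 4 instead (a before b).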
Answer by ron
Note that it may also be desirable to stabilize the sort with the -s switch, so that equally ranked lines keep their original relative order in the output.
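A quick sketch of what -s changes, using invented data where two lines tie on the sort key:

```shell
# Hypothetical data: 'b 1' and 'a 1' tie on field 2.
data='b 1
a 1
c 0'

# Stable: ties keep their input order (b before a).
stable=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k 2,2n)

# Default: ties fall back to comparing the whole line (a before b).
default=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2n)

printf 'stable:\n%s\ndefault:\n%s\n' "$stable" "$default"
```

Without -s, sort breaks ties with a last-resort comparison of the entire line, which is why the default run reorders the two tied lines.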
Answer by jianpx
I just want to add a tip: when you use sort, be careful about your locale, which affects the order of key comparison. I usually set LC_ALL=C explicitly to get the locale I want.
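For instance, a minimal sketch with made-up mixed-case input. Only the C-locale result is shown, since the order under other locales depends on what is installed on the system:

```shell
# Hypothetical mixed-case input.
data='b
A
a
B'

# In the C locale, comparison is by byte value, so uppercase sorts before lowercase.
result=$(printf '%s\n' "$data" | LC_ALL=C sort)
printf '%s\n' "$result"
```

Setting LC_ALL=C on the command itself, as here, affects only that one sort invocation.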
Answer by edW
Here is one that sorts the columns of a CSV file in a mix of numeric and dictionary order, with column 5 and everything after it in dictionary order:
~/test>sort -t, -k1,1n -k2,2n -k3,3d -k4,4n -k5d sort.csv
1,10,b,22,Ga
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
~/test>cat sort.csv
2,3,a,9,C
2,2,b,20,F
2,2,c,19,Gb,hj
2,2,c,19,Gb,hi
2,2,c,19,Ga
2,2,b,22,Ga
1,10,b,22,Ga
Note that -k1,1n means a numeric key that starts at column 1 and ends at column 1. Had I used the command below instead, the key would have spanned columns 1 and 2, so that 1,10 is sorted as 110:
~/test>sort -t, -k1,2n -k3,3 -k4,4n -k5d sort.csv
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
1,10,b,22,Ga