Linux: Sorting multiple keys with Unix sort

Notice: this page reproduces a popular StackOverflow question and its answers (originally alongside a Chinese translation) under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/357560/

Time: 2020-08-03 16:47:55  Source: igfitidea

Sorting multiple keys with Unix sort

Tags: linux, unix, sorting

Asked by Chris Kloberdanz

I have potentially large files that need to be sorted by 1-n keys. Some of these keys might be numeric and some of them might not be. This is a fixed-width columnar file so there are no delimiters.

Is there a good way to do this with Unix sort? With one key it is as simple as using '-n'. I have read the man page and searched Google briefly, but didn't find a good example. How would I go about accomplishing this?

Note: I have ruled out Perl because the files can get very large. It would be a last resort.

Accepted answer by Ken Gentle

Use the -k option (or --key=POS1[,POS2]). It can appear multiple times, and each key can carry its own ordering options (such as n for numeric sort), which override the global ones for that key.
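
A minimal sketch of several -k options, each with its own ordering letters. The space-separated sample data (name, age, score) is hypothetical, and LC_ALL=C is set so the plain-text key compares in byte order:

```shell
#!/bin/sh
# Byte-order comparisons, independent of the user's locale
LC_ALL=C
export LC_ALL

# Hypothetical sample: name, age, score
data='bob 30 7
alice 25 9
carol 30 10
dave 25 9'

# Key 1: field 2 numerically (ascending)
# Key 2: field 3 numerically, reversed (highest score first)
# Key 3: field 1 as plain text, to break any remaining ties
result=$(printf '%s\n' "$data" | sort -k 2,2n -k 3,3nr -k 1,1)
printf '%s\n' "$result"
```

The n and r attached to the second key affect only that key; a letter given outside any -k (for example a bare -n) would instead apply to every key that has no letters of its own.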

Answer by Clinton Pierce

The -k option is what you want.

-k 1.4,1.5n -k 1.14,1.15n

Would use character positions 4-5 of the first field as the first key and sort them numerically. Field 1 begins at the start of the line, so for fixed-width records these offsets act as column positions.

The second key would also be characters 14-15 of the first field.
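
A small sketch of those two character-position keys. The 15-character records are hypothetical and deliberately contain no blanks, so each line is a single field 1; columns 4-5 and 14-15 hold two-digit numbers:

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical fixed-width records:
#   cols 1-3 "REC", cols 4-5 number, cols 6-13 filler, cols 14-15 number
data='REC02abcdefgh07
REC10zzzzzzzz01
REC02yyyyyyyy03
REC10xxxxxxxx05'

# First key: characters 4-5 of field 1, numeric
# Second key: characters 14-15 of field 1, numeric
result=$(printf '%s\n' "$data" | sort -k 1.4,1.5n -k 1.14,1.15n)
printf '%s\n' "$result"
```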

(edit)

Example (all I have handy is DOS/Cygwin):

dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r

for the data:

12/10/2008  01:10 PM         1,564,990 outfile.txt

Sorts the directory listing numerically by month number (positions 4-5), then by filename (positions 40-60) in reverse. Both keys are character offsets into field 1, which begins at the start of the line, so they work as column positions here.

Answer by Dong Hoon

I believe in your case something like

sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile

will work better. @ is the field separator; make sure it is a character that appears nowhere in the data. Your input is then treated as consisting of a single field, so every key is a character range of that one field.

Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags n and r can be added to every -k option.
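
A sketch of the -t@ approach on blank-padded fixed-width data. The layout (an 8-character left-aligned name in columns 1-8, a right-aligned count in columns 9-11) is a hypothetical example; printf generates the padding so the columns line up exactly. The first key also shows n and r attached to a single -k:

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical records: name in cols 1-8 (left-aligned), count in cols 9-11 (right-aligned)
make_data() {
    printf '%-8s%3d\n' bob 12 alice 7 carol 12 dave 100
}

# '@' never occurs in the data, so every line is a single field and the
# character offsets below are plain column numbers.
# Key 1: cols 9-11, numeric, descending; key 2: cols 1-8, text, ascending
result=$(make_data | sort -t@ -k1.9,1.11nr -k1.1,1.8)
printf '%s\n' "$result"
```

Only the first key is reversed; bob and carol tie on the count and are then ordered by name in ascending order.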

Answer by andras

Take care though:

If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:

sort -k 3,3 -k 2,2 < inputfile

Not this: sort -k 3 -k 2 < inputfile, which sorts the file by the string from the beginning of field 3 to the end of the line. That string is potentially unique, in which case the second key is never consulted.

-k, --key=POS1[,POS2]     start a key at POS1 (origin 1), end it at POS2
                          (default end of line)
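
A minimal sketch contrasting the two forms on hypothetical two-line input, chosen so that field 2 and the rest of the line after field 3 disagree about the order:

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical input: field 2 orders the lines one way, the text after
# field 3 orders them the other way.
data='x 1 k z
x 2 k a'

# Key 1 = field 3 only (a tie), key 2 = field 2 only: "x 1 ..." comes first
by_fields=$(printf '%s\n' "$data" | sort -k 3,3 -k 2,2)

# Key 1 = field 3 through end of line (" k z" vs " k a"): this already
# decides the order, so field 2 is never consulted
to_eol=$(printf '%s\n' "$data" | sort -k 3 -k 2)

printf 'Using -k 3,3 -k 2,2:\n%s\nUsing -k 3 -k 2:\n%s\n' "$by_fields" "$to_eol"
```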

Answer by ron

Note that it may also be desirable to stabilize the sort with the -s switch, so that lines whose keys compare equal keep their original relative order in the output. Without -s, sort breaks such ties by comparing the entire lines.
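
A small sketch of the tie-breaking difference, on hypothetical input whose two lines share the same key. Note that -s is an extension (GNU and BSD sort provide it) rather than part of POSIX sort:

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical input: both lines have the same key (field 2 = "1")
data='b 1
a 1'

# Without -s, equal keys fall back to a whole-line comparison, so "a 1" comes first
default_order=$(printf '%s\n' "$data" | sort -k 2,2)

# With -s, lines with equal keys keep their input order
stable_order=$(printf '%s\n' "$data" | sort -s -k 2,2)

printf 'default:\n%s\nstable:\n%s\n' "$default_order" "$stable_order"
```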

Answer by jianpx

I just want to add a tip: when you use sort, be careful about your locale, which affects the order in which keys compare. I usually set LC_ALL=C explicitly to get the locale I want.
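
A minimal sketch of what LC_ALL=C does, on hypothetical input. The C locale compares raw bytes; in a locale such as en_US.UTF-8, GNU sort follows the locale's collation instead, which usually places apple before Banana, though the exact order depends on the installed locale data:

```shell
#!/bin/sh
# In the C locale, sort compares raw byte values, so every uppercase
# ASCII letter (A-Z) sorts before every lowercase one (a-z).
result=$(printf '%s\n' cherry apple Banana | LC_ALL=C sort)
printf '%s\n' "$result"
```

Setting LC_ALL only on the sort command, as here, leaves the rest of the script's locale untouched.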

Answer by edW

Here is one that sorts various columns of a CSV file in numeric and dictionary order, with column 5 onward in dictionary order:

~/test>sort -t, -k1,1n -k2,2n -k3,3d -k4,4n -k5d  sort.csv
1,10,b,22,Ga
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C

~/test>cat sort.csv
2,3,a,9,C
2,2,b,20,F
2,2,c,19,Gb,hj
2,2,c,19,Gb,hi
2,2,c,19,Ga
2,2,b,22,Ga
1,10,b,22,Ga

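Note that -k1,1n means a numeric key that starts at column 1 and ends at column 1. Had I used the command below instead, the key would have spanned columns 1 and 2, so 1,10 was read as the number 110 (sort -n skips the locale's thousands separator, which was a comma here).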

~/test>sort -t, -k1,2n -k3,3 -k4,4n -k5d  sort.csv
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
1,10,b,22,Ga
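
The 110 reading is locale-dependent: sort -n skips the locale's thousands separator, and the C locale has none. A sketch re-running the second command on the same rows under LC_ALL=C, where the first key stops at the first comma:

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Same rows as sort.csv above, generated inline instead of read from a file
csv() {
    printf '%s\n' '2,3,a,9,C' '2,2,b,20,F' '2,2,c,19,Gb,hj' '2,2,c,19,Gb,hi' \
        '2,2,c,19,Ga' '2,2,b,22,Ga' '1,10,b,22,Ga'
}

# Key 1 spans columns 1-2 (e.g. "1,10"). The C locale has no thousands
# separator, so -n stops at the first comma and reads 1, not 110;
# column 2 never affects the order.
result=$(csv | sort -t, -k1,2n -k3,3 -k4,4n -k5d)
printf '%s\n' "$result"
```

Under LC_ALL=C, 1,10 therefore sorts first and the rows with a 2 in column 1 are ordered by the later keys only, which is another reason to give each column its own key, as in -k1,1n -k2,2n.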