Linux: Sorting on multiple keys with Unix sort

Question

Asked by Chris Kloberdanz

I have potentially large files that need to be sorted by 1-n keys. Some of these keys might be numeric and some of them might not be. This is a fixed-width columnar file so there are no delimiters.

Is there a good way to do this with Unix sort? With one key it is as simple as using '-n'. I have read the man page and searched Google briefly, but didn't find a good example. How would I go about accomplishing this?

Note: I have ruled out Perl because of the file size potential. It would be a last resort.

Answer 1

Accepted answer by Ken Gentle

Use the -k option (or --key=POS1[,POS2]). It can appear multiple times, and each key can carry its own ordering options (such as n for numeric sort), which override the global ones for that key.
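
As a minimal sketch of repeating -k with per-key options: the sample data and the variable names below are made up for illustration, and LC_ALL=C is only there to keep the ordering reproducible.

```shell
# Hypothetical sample data: a name and a score, separated by a space.
data='bob 7
alice 10
bob 12
alice 9'

# Key 1: field 1 compared as text, ascending.
# Key 2: field 2 compared numerically (n) and reversed (r).
result=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2,2nr)
printf '%s\n' "$result"
# alice 10
# alice 9
# bob 12
# bob 7
```

The n and r letters are attached to the second key only, so the first key is still sorted as plain text in ascending order.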

Answer 2

Answered by Clinton Pierce

The -k option is what you want.

-k 1.4,1.5n -k 1.14,1.15n

Would use character positions 4-5 in the first field (it's all one field for fixed width) and sort numerically as the first key.

The second key would be characters 14-15 in the first field also.
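
To see those two keys in action, here is a small sketch on invented 15-character records that contain no blanks, so each line is a single field. The record layout (a number in columns 4-5 and another in columns 14-15) is an assumption chosen to match the positions above.

```shell
# Hypothetical fixed-width records:
# columns 1-3 id, 4-5 first number, 6-13 filler, 14-15 second number.
records='ABC12xxxxxxxx07
DEF03xxxxxxxx20
GHI12xxxxxxxx02
JKL03xxxxxxxx05'

# First key: characters 4-5, numeric. Second key: characters 14-15, numeric.
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -k 1.4,1.5n -k 1.14,1.15n)
printf '%s\n' "$sorted"
# JKL03xxxxxxxx05
# DEF03xxxxxxxx20
# GHI12xxxxxxxx02
# ABC12xxxxxxxx07
```

Lines with 03 in columns 4-5 come first, and within each group the value in columns 14-15 decides the order.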

(edit)

Example (all I have is DOS/cygwin handy):

dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r

for the data:

12/10/2008  01:10 PM         1,564,990 outfile.txt

Sorts the directory listing by month number (positions 4-5) numerically, and then by filename (positions 40-60) in reverse. Every position is given as a character offset from the start of field 1, so the line is treated as one fixed-width record.

Answer 3

Answered by Dong Hoon

I believe in your case something like

sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile

will work better. @ is the field separator; make sure it is a character that appears nowhere in the input. Then your input is treated as consisting of a single column.
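
A short sketch of that idea on records that do contain spaces. The layout (name in columns 1-6, year in columns 8-11) and the data are invented, and it assumes @ never occurs in the input.

```shell
# Hypothetical fixed-width records: name in columns 1-6, year in columns 8-11.
people='smith  2009
lee    2008
jones  2008
adams  2009'

# -t@ makes the whole line field 1, because @ never appears in the data.
# First key: year (columns 8-11), numeric. Second key: name (columns 1-6).
by_year=$(printf '%s\n' "$people" | LC_ALL=C sort -t@ -k1.8,1.11n -k1.1,1.6)
printf '%s\n' "$by_year"
# jones  2008
# lee    2008
# adams  2009
# smith  2009
```

With the separator set explicitly, the embedded blanks no longer play any part in how the key positions are counted.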

Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags 'n' and 'r' can be added to every -k.... option.

Answer 4

Answered by andras

Take care though:

If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:

sort -k 3,3 -k 2,2 < inputfile

Not this: sort -k 3 -k 2 < inputfile, which sorts the file by the string from the beginning of field 3 to the end of the line (which is potentially unique).
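
The difference shows up on even a tiny input. The three lines below are made-up data in which field 4 orders the rows the opposite way from field 2:

```shell
# Hypothetical whitespace-separated data with four fields per line.
rows='y 2 a 1
x 1 a 9
z 3 b 0'

# Bounded keys: field 3 only, then field 2 only.
bounded=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 3,3 -k 2,2)
printf '%s\n' "$bounded"
# x 1 a 9
# y 2 a 1
# z 3 b 0

# Unbounded keys: the first key runs from field 3 to the end of the line,
# so field 4 decides the tie before field 2 is ever consulted.
unbounded=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 3 -k 2)
printf '%s\n' "$unbounded"
# y 2 a 1
# x 1 a 9
# z 3 b 0
```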

-k, --key=POS1[,POS2]     start a key at POS1 (origin 1), end it at POS2
                          (default end of line)

Answer 5

Answered by ron

Note that it may also be desirable to stabilize the sort with the -s switch, so that equally ranked lines keep their original relative order in the output.
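
A quick sketch of what -s changes, using invented data where several lines share the same first field:

```shell
# Hypothetical data: the sort key is field 1; field 2 is just payload.
items='b zebra
a yak
a apple
b ant'

# With -s, lines with equal keys stay in their input order.
stable=$(printf '%s\n' "$items" | LC_ALL=C sort -s -k1,1)
printf '%s\n' "$stable"
# a yak
# a apple
# b zebra
# b ant

# Without -s, ties fall back to a comparison of the whole line.
plain=$(printf '%s\n' "$items" | LC_ALL=C sort -k1,1)
printf '%s\n' "$plain"
# a apple
# a yak
# b ant
# b zebra
```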

Answer 6

Answered by jianpx

I just want to add a tip: when you use sort, be careful about your locale, which affects the order of key comparison. I usually set LC_ALL=C explicitly to get the locale I want.
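
For instance, with LC_ALL=C the comparison is by raw byte value, so all uppercase ASCII letters sort before lowercase ones. The four-line input is just an illustration; other locales may interleave the cases instead.

```shell
# In the C locale, sort compares bytes: 'A' (65) and 'B' (66)
# come before 'a' (97) and 'b' (98).
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
printf '%s\n' "$c_order"
# A
# B
# a
# b
```

Setting the variable on the command line like this affects only that one sort invocation, not the rest of the shell session.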

Answer 7

Answered by edW

Here is one that sorts various columns in a CSV file in numeric and dictionary order, with column 5 and everything after it in dictionary order:

~/test>sort -t, -k1,1n -k2,2n -k3,3d -k4,4n -k5d  sort.csv
1,10,b,22,Ga
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C

~/test>cat sort.csv
2,3,a,9,C
2,2,b,20,F
2,2,c,19,Gb,hj
2,2,c,19,Gb,hi
2,2,c,19,Ga
2,2,b,22,Ga
1,10,b,22,Ga

Note that -k1,1n means a numeric key starting at column 1 and ending at column 1. Had I used -k1,2n as below, the key would have spanned columns 1 and 2, so 1,10 would be sorted as 110 (which happens when the locale treats the comma as a thousands separator):

~/test>sort -t, -k1,2n -k3,3 -k4,4n -k5d  sort.csv
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
1,10,b,22,Ga
