bash: unix sorting, with primary and secondary keys

Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3193720/

Date: 2020-09-17 22:17:49  Source: igfitidea

unix sorting, with primary and secondary keys

Tags: bash, unix, sorting

Asked by zseder

I would like to sort a file on multiple fields. A sample tab-separated file is:

a   1   1.0
b   2   0.1
c   3   0.3
a   4   0.001
c   5   0.5
a   6   0.01
b   7   0.01
a   8   0.35
b   9   2.3
c   10  0.1
c   11  1.0
b   12  3.1
a   13  2.1

I would like to have it sorted alphabetically by field 1 (with -d), and when field 1 is the same, sorted by field 3 (with the -g option).

I didn't succeed in doing this. My attempts were (with a real TAB character instead of <TAB>):

cat tst | sort -t"<TAB>" -k1 -k3n
cat tst | sort -t"<TAB>" -k1d -k3n
cat tst | sort -t"<TAB>" -k3n -k1d

None of these work. I'm not sure if sort is even able to do this. I'll write a script as a workaround, so I'm just curious whether there is a solution using only sort.

Accepted answer by Janick Bernet

The manual shows some examples.

In accordance with zseder's comment, this works:

sort -t"<TAB>" -k1,1d -k3,3g
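
The difference from the attempts in the question is the key range. A key written as -k1 runs from field 1 to the end of the line, so the second key is only consulted when two lines are otherwise identical; -k1,1 ends the key at field 1. A minimal sketch on a few of the sample rows, assuming a sort that supports -g (it is not part of POSIX) and the C locale for reproducible output. The function name sort_by_keys is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: sort tab-separated input by field 1 (dictionary order),
# breaking ties by field 3 (general numeric order).
# sort_by_keys is a hypothetical helper name, not part of sort itself.
sort_by_keys() {
    # -k1,1d limits the first key to field 1 only;
    # -k3,3g limits the second key to field 3 only.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1d -k3,3g
}

# A few rows from the sample file, written with real tab characters.
printf 'b\t2\t0.1\na\t1\t1.0\na\t4\t0.001\nb\t7\t0.01\na\t6\t0.01\n' | sort_by_keys
```

With these five rows, the three "a" lines come first in the order 0.001, 0.01, 1.0, followed by the two "b" lines in the order 0.01, 0.1.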

In theory a tab can also be given as sort -t"\t", but bash does not expand \t inside double quotes, so sort receives the two characters \ and t rather than a single tab.

If none of the above manages to delimit by tab, this is an ugly workaround:

TAB=`echo -e "\t"`
sort -t"$TAB"
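
The same idea can be written without echo -e. The sketch below shows two ways of getting a literal tab into a variable, assuming bash for the $'\t' form (it is ANSI-C quoting, which plain POSIX sh may not support); printf is the portable alternative. The variable names TAB_ANSI and TAB_PRINTF are made up for illustration:

```shell
#!/usr/bin/env bash
# Two ways to store a literal tab character in a variable.
TAB_ANSI=$'\t'              # bash ANSI-C quoting expands \t to a tab
TAB_PRINTF=$(printf '\t')   # printf interprets the escape; works in POSIX sh

# Each variable holds exactly one character.
echo "${#TAB_ANSI} ${#TAB_PRINTF}"

# Either variable can be passed to -t as the field separator.
printf 'b\t9\t2.3\na\t8\t0.35\n' | LC_ALL=C sort -t "$TAB_ANSI" -k1,1d -k3,3g
```

Quoting the variable ("$TAB_ANSI") matters: unquoted, the shell would treat the tab as whitespace and drop it before sort sees it.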

Answer by Philipp

Here is a Python script that you might use as a starting point:

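#!/usr/bin/env python3

import string
import sys


def main():
    fname = sys.argv[1]
    data = []
    with open(fname, "rt") as stream:
        for line in stream:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # each row has three whitespace-separated fields
            a, b, c = line.split()
            data.append((a, int(b), float(c)))
    data.sort(key=my_key)
    print(data)


def my_key(item):
    a, b, c = item
    # primary key: field 1 (dictionary order); secondary key: field 3 (numeric)
    return lexicographical_key(a), c


def lexicographical_key(a):
    # poor man's attempt, should use Unicode classification etc.
    # strips ASCII punctuation so only letters and digits are compared
    return a.translate(str.maketrans("", "", string.punctuation))


if __name__ == "__main__":
    main()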