Sorting and uniq in the Linux shell

Question

Asked by yassin

What is the difference between the following two commands?

sort -u FILE

sort FILE | uniq

Answer 1

Accepted answer by Jonathan Leffler

Using sort -u does less I/O than sort | uniq, but the end result is the same. In particular, if the file is big enough that sort has to create intermediate files, there's a decent chance that sort -u will use slightly fewer or slightly smaller intermediate files, as it can eliminate duplicates while it is sorting each set. If the data is highly duplicative, this could be beneficial; if there are in fact few duplicates, it won't make much difference (definitely a second-order performance effect, compared to the first-order effect of the pipe).
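
A quick way to check the "same result" claim on a small input. This is only a sketch: the sample data and the temporary file are made up for illustration, and no extra sort options are involved.

```shell
# Hypothetical sample data in a temporary file.
tmp=$(mktemp)
printf '%s\n' delta alpha charlie alpha bravo delta > "$tmp"

unique_direct=$(sort -u "$tmp")      # sort drops the duplicates itself
unique_piped=$(sort "$tmp" | uniq)   # uniq drops adjacent duplicate lines

if [ "$unique_direct" = "$unique_piped" ]; then
    echo "same output"
else
    echo "outputs differ"
fi

rm -f "$tmp"
```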

Note that there are times when piping is appropriate. For example:

sort FILE | uniq -c | sort -n

This sorts the file into order of the number of occurrences of each line in the file, with the most repeated lines appearing last. (It wouldn't surprise me to find that this combination, which is idiomatic for Unix or POSIX, can be squished into one complex 'sort' command with GNU sort.)
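
As a small illustration of that idiom, here is a sketch run against made-up data (the fruit names and the temporary file are assumptions, not from the original post). The exact padding in front of each count depends on the uniq implementation.

```shell
# Hypothetical sample data in a temporary file.
tmp=$(mktemp)
printf '%s\n' apple banana apple cherry banana apple > "$tmp"

# Group identical lines, prefix each with its count, then order by count.
counts=$(sort "$tmp" | uniq -c | sort -n)
printf '%s\n' "$counts"

# The most frequent line ends up last.
most_common=$(printf '%s\n' "$counts" | tail -n 1 | awk '{print $1, $2}')
echo "most common: $most_common"

rm -f "$tmp"
```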

There are times when not using the pipe is important. For example:

sort -u -o FILE FILE

This sorts the file 'in situ'; that is, the output file is specified by -o FILE, and this operation is guaranteed safe (the file is read before being overwritten for output).
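
A minimal sketch of the in-place form, using a throwaway file with made-up contents. For contrast, the tempting alternative sort -u FILE > FILE is not safe, because the shell truncates FILE for the redirection before sort gets to read it.

```shell
# Hypothetical file with unsorted, duplicated lines.
f=$(mktemp)
printf '%s\n' pear apple pear fig apple > "$f"

# -o names the output file; sort reads its input before writing to it,
# so the same file can be used for input and output.
sort -u -o "$f" "$f"

result=$(cat "$f")
printf '%s\n' "$result"

rm -f "$f"
```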

Answer 2

Answered by Jauzsika

Nothing; they will produce the same result.

Answer 3

Answered by knittl

sort -u will be slightly faster, because it does not need to pipe the output between two commands.

Also see my question on the topic: calling uniq and sort in different orders in shell

Answer 4

Answered by P Shved

There is one slight difference: return code.

The thing is that unless pipefail is enabled (for example with set -o pipefail in bash), the return code of a piped command is the return code of the last command in the pipe. And uniq returns zero (success) in this case, since it simply reads empty input. Try examining the exit code, and you'll see something like this (pipefail is not set here):

pavel@lonely ~ $ sort -u file_that_doesnt_exist ; echo $?
sort: open failed: file_that_doesnt_exist: No such file or directory
2
pavel@lonely ~ $ sort file_that_doesnt_exist | uniq ; echo $?
sort: open failed: file_that_doesnt_exist: No such file or directory
0

Other than this, the commands are equivalent.
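
To see how pipefail changes the picture, here is a sketch that assumes bash (pipefail is not available in every shell); the file path is a made-up name that is assumed not to exist. The exact non-zero status reported by sort may vary between implementations.

```shell
#!/usr/bin/env bash
missing=/nonexistent/file_that_doesnt_exist   # hypothetical path, assumed absent

# Default behaviour: the pipeline's status is that of uniq, the last command.
sort "$missing" 2>/dev/null | uniq
status_default=$?

# With pipefail, a failure anywhere in the pipeline makes the pipeline fail.
set -o pipefail
status_pipefail=0
sort "$missing" 2>/dev/null | uniq || status_pipefail=$?
set +o pipefail

echo "without pipefail: $status_default"
echo "with pipefail:    $status_pipefail"
```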

Answer 5

Answered by Hemant

I have worked on some servers where sort doesn't support the '-u' option. There we have to use

sort xyz | uniq

Answer 6

Answered by willdye

Beware! While it's true that "sort -u" and "sort|uniq" are equivalent, any additional options to sort can break the equivalence. Here's an example from the coreutils manual:

For example, 'sort -n -u' inspects only the value of the initial numeric string when checking for uniqueness, whereas 'sort -n | uniq' inspects the entire line.

Similarly, if you sort on key fields, the uniqueness test used by sort won't necessarily look at the entire line anymore. After being bitten by that bug in the past, these days I tend to use "sort|uniq" when writing Bash scripts. I'd rather have higher I/O overhead than run the risk that someone else in the shop won't know about that particular pitfall when they modify my code to add additional sort parameters.
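
The pitfall is easy to reproduce. The sketch below uses made-up input in which several lines share the same leading number; it only compares line counts, since which line of an equal run sort -u keeps is not something to rely on across implementations.

```shell
# Hypothetical input: two distinct lines start with 1, and "2 cherry" repeats.
data=$(printf '%s\n' '1 apple' '1 banana' '2 cherry' '2 cherry')

# Uniqueness is judged on the numeric key only: one line per number survives.
with_u=$(printf '%s\n' "$data" | sort -n -u)

# uniq compares whole lines: only the exact duplicate is dropped.
with_pipe=$(printf '%s\n' "$data" | sort -n | uniq)

count_u=$(printf '%s\n' "$with_u" | wc -l)
count_pipe=$(printf '%s\n' "$with_pipe" | wc -l)

echo "sort -n -u      keeps $count_u lines"
echo "sort -n | uniq  keeps $count_pipe lines"
```

Here "1 apple" and "1 banana" are different lines but compare equal under -n, so sort -n -u silently drops one of them.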
