bash join gives warning "file 1 is not in sorted order"

Question

Asked by Rudy

I was testing a legacy script under a newer version of bash, 4.1.2(1)-release, and encountered this warning in the console:

join: file 1 is not in sorted order
join: file 2 is not in sorted order

I am quite sure that both of the files are sorted. The files actually merged properly.

Below is the script:

cat $FILE1_PATH'.processed.1' | cut -d'|' -f4,8 | sort | uniq -u  > $FILE1_PATH.'processed.2'
cat $FILE2_PATH'.processed.1' | cut -d'|' -f1,8 | sort | uniq -u > $FILE2_PATH.'processed.2'
join -t$'|' -1 1 -2 1 $FILE1_PATH.'processed.2' $FILE2_PATH.'processed.2' > $MERGEFILE_PATH

The job of this script:

extract field 4 and 8 from file 1
extract field 1 and 8 from file 2
combine the extracted fields, using join key file1.field4 = file2.field1
remove any duplicates.

FILE1.processed.2 :

21VIANET GP INC|GOV
ABN|ABN1
ABN|ABN2
ABOC|ABOC1
ABOC|ABOC1
ABOC|ABOC2
....

FILE2.processed.2 :

ABN|Banks
ABOC|Pharmaceuticals
GOV|Government Agency 
....

OUTPUT:

GOV|21VIANET GP INC|Government Agency
ABN|ABN1|Banks
ABN|ABN2|Banks
ABOC|ABOC1|Pharmaceuticals
ABOC|ABOC2|Pharmaceuticals  
....

Running the same script under bash version 3.2.25(1)-release gives no warning. Any idea how to resolve the warning?

UPDATE: It seems the warning was caused by these lines in the input files:

ADBC|Banks 
ADB|Banks

join expects ADBC to come after ADB, like this:

ADB|Banks
ADBC|Banks

I tried changing the sort command from sort -u to sort -t$'|' -k1 (to sort on the first field), but it still does not work.

Answer 1

Answered by

The suggestion in the join man page is to use sort -k 1b,1 when you're joining on field 1. (It says "when join has no options", but as far as field selection is concerned, your join is equivalent to no options: -1 1 and -2 1 are the defaults.) You can add -t '|' to that and it will match your join perfectly.
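As a rough sketch of how that suggestion might slot into a pipeline like the one in the question: the sample rows and temporary file names below are made up for illustration, and setting LC_ALL=C is my own assumption, there only to keep sort and join on the same collation.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the two extracted files.
tmpdir=$(mktemp -d)
printf '%s\n' 'ADBC|ADBC1' 'ADB|ADB1' > "$tmpdir/file1"
printf '%s\n' 'ADBC|Banks' 'ADB|Banks' > "$tmpdir/file2"

# Sort on field 1 only, so "ADB" comes before "ADBC", as join expects.
# LC_ALL=C is an assumption: it keeps sort and join on the same collation.
LC_ALL=C sort -t '|' -k 1b,1 "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort -t '|' -k 1b,1 "$tmpdir/file2" > "$tmpdir/file2.sorted"

# Capture stderr too, so any "not in sorted order" warning would show up.
merged=$(LC_ALL=C join -t '|' -1 1 -2 1 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted" 2>&1)
printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

In the original script, the same -t '|' -k 1b,1 options would go on the two sort calls that feed join.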

-k1 means the key runs from field 1 to the end of the line. -k1,1 means just field 1. The b is necessary if you have leading whitespace and want to ignore it. sort syntax is weird, and this is after POSIX redesigned it to try to make it sensible. If you ever write a sort command that doesn't look complicated, it's probably not doing what you wanted.

Add --debug to your sort command to see what it's using as a key. With a sample file containing these lines:

ADBC|Banks
ADB|Banks
 ADBC|Banks

you can see the effect of various -k options:

$ sort -s -t '|' -k 1 --debug file
sort: using simple byte comparison
 ADBC|Banks
___________
ADBC|Banks
__________
ADB|Banks
_________
$ sort -s -t '|' -k 1,1 --debug file
sort: using simple byte comparison
 ADBC|Banks
_____
ADB|Banks
___
ADBC|Banks
____
$ sort -s -t '|' -k 1b,1 --debug file
sort: using simple byte comparison
ADB|Banks
___
ADBC|Banks
____
 ADBC|Banks
 ____

Now you're probably wondering about the -s I threw in there. Without it, sort falls back to a last-resort comparison of the whole line as a string for lines with equal keys. That's not normally a problem and you probably don't need -s. It's just that with --debug, the last-resort comparison clutters the output, so I like to use -s to get rid of it.
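If you only want to confirm whether a file is ordered the way join expects, sort -c checks the order without writing any output. A small sketch, assuming a throwaway sample file (the path is hypothetical) and the C locale:

```shell
#!/bin/sh
# Hypothetical sample file, laid out in whole-line byte order.
sample=$(mktemp)
printf '%s\n' 'ADBC|Banks' 'ADB|Banks' > "$sample"

# Whole-line order: "ADBC|" sorts before "ADB|" because 'C' < '|' in ASCII.
if LC_ALL=C sort -c "$sample" 2>/dev/null; then
    whole_line=sorted
else
    whole_line=unsorted
fi

# Field-1 order, which is what join compares: "ADB" must precede "ADBC".
if LC_ALL=C sort -c -t '|' -k 1b,1 "$sample" 2>/dev/null; then
    by_key=sorted
else
    by_key=unsorted
fi

echo "whole line: $whole_line, field 1: $by_key"
rm -f "$sample"
```

A file can pass the whole-line check and still fail the field-1 check, which is how a plain sort can produce input that join complains about.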