How to remove common lines between two files without sorting?
Source: http://stackoverflow.com/questions/24324350/ (StackOverflow). This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by harrison4
I have two unsorted files that have some lines in common.
file1.txt
Z
B
A
H
L
file2.txt
S
L
W
Q
A
This is how I currently remove the common lines:
sort -u file1.txt > file1_sorted.txt
sort -u file2.txt > file2_sorted.txt
comm -23 file1_sorted.txt file2_sorted.txt > file_final.txt
Output:
B
H
Z
The problem is that I want to keep the original order of file1.txt.
Desired output:
Z
B
H
One solution I thought of is to loop over all the lines of file2.txt and, for each one, run:
sed -i "/^${line_file2}\$/d" file1.txt
But if the files are big, the performance may suck, since file1.txt is rewritten once for every line of file2.txt.
- Do you like my idea?
- Do you have an alternative way to do it?
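The loop idea from the question can be sketched roughly as follows. This is only an illustration under stated assumptions: the scratch directory and the names result.txt and tmp.txt are made up here, the lines are assumed to contain no regex metacharacters or slashes, and a temporary file is used instead of `sed -i` so that the per-line rewrite is explicit.

```shell
#!/usr/bin/env bash
set -eu

# Scratch directory (hypothetical) holding the sample files from the question.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' Z B A H L > "$workdir/file1.txt"
printf '%s\n' S L W Q A > "$workdir/file2.txt"

# Work on a copy so the original file1.txt stays intact.
cp "$workdir/file1.txt" "$workdir/result.txt"

while IFS= read -r line_file2; do
    # Double quotes let the shell expand the variable; \$ keeps the
    # end-of-line anchor literal. Every iteration is one full pass
    # over result.txt, which is why this gets slow for big files.
    sed "/^${line_file2}\$/d" "$workdir/result.txt" > "$workdir/tmp.txt"
    mv "$workdir/tmp.txt" "$workdir/result.txt"
done < "$workdir/file2.txt"

result=$(cat "$workdir/result.txt")
printf '%s\n' "$result"
```

The order of file1.txt is kept, but the work grows with the number of lines in file1.txt times the number of lines in file2.txt.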
Answered by Kent
grep or awk:
awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2 file1
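To see why the awk approach keeps the order of file1, here is a minimal sketch run against the sample files from the question. It assumes the one-liner is the usual `NR==FNR` two-file idiom. The scratch directory, the array name `seen`, and the expanded multi-line layout are only for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Scratch directory (hypothetical) holding the sample files from the question.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' Z B A H L > "$workdir/file1.txt"
printf '%s\n' S L W Q A > "$workdir/file2.txt"

# file2.txt is read first. NR==FNR is only true while awk is still on the
# first file, so every line of file2.txt becomes a key in the array "seen".
# For file1.txt the first rule no longer fires, and a line is printed only
# if it is not a key in "seen". Lines are printed in the order they are read.
result=$(awk '
    NR == FNR { seen[$0] = 1; next }
    !($0 in seen)
' "$workdir/file2.txt" "$workdir/file1.txt")

printf '%s\n' "$result"
```

Each file is read once, so this scales much better than a loop that rewrites file1 per line. One caveat: if file2 is empty, NR==FNR stays true while file1 is read and nothing is printed.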
Answered by perreal
You can use just grep (-v for invert, -f for file). Grep lines from input1 that do not match any line in input2:
grep -vf input2 input1
Gives:
Z
B
H
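One thing to keep in mind with plain `grep -vf`: each line of input2 is treated as a regular expression and may match anywhere in a line of input1. The sketch below compares it with a stricter variant that adds the standard grep options `-F` (fixed strings) and `-x` (whole-line match). The sample data and the variable names `loose` and `strict` are made up for illustration and are not part of the original answer.

```shell
#!/usr/bin/env bash
set -eu

# Scratch directory with made-up sample data.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' Z BANANA A H abc > "$workdir/input1"
printf '%s\n' A 'a.c' L > "$workdir/input2"

# Patterns are regular expressions matched anywhere in the line:
# "A" also removes BANANA, and "a.c" also removes abc.
loose=$(grep -vf "$workdir/input2" "$workdir/input1")

# -F treats each pattern as a literal string, -x requires the whole line
# to match, so only the exact line "A" is removed.
strict=$(grep -vxFf "$workdir/input2" "$workdir/input1")

printf 'grep -vf:\n%s\n' "$loose"
printf 'grep -vxFf:\n%s\n' "$strict"
```

For the single-letter sample files in the question both forms give the same result. The difference only shows up when lines can be substrings of each other or contain regex metacharacters.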
Answered by terdon
I've written a little Perl script that I use for this kind of thing. It can do more than what you ask for, but it can also do what you need:
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;

my %opts;
getopts('hvfcmdk:', \%opts);
my $missing=$opts{m}||undef;
my $column=$opts{k}||undef;
my $common=$opts{c}||undef;
my $verbose=$opts{v}||undef;
my $fast=$opts{f}||undef;
my $dupes=$opts{d}||undef;
## Default to printing the missing patterns
$missing=1 unless $common || $dupes;
&usage() unless $ARGV[1];
&usage() if $opts{h};
my (%found,%k,%fields);
if ($column) {
    die("The -k option only works in fast (-f) mode\n") unless $fast;
    $column--; ## So I don't need to count from 0
}

## Read the search patterns from FILE1
open(my $F1,"$ARGV[0]")||die("Cannot open $ARGV[0]: $!\n");
while(<$F1>){
    chomp;
    if ($fast){
        my @aa=split(/\s+/,$_);
        $k{$aa[0]}++;
        $found{$aa[0]}++;
    }
    else {
        $k{$_}++;
        $found{$_}++;
    }
}
close($F1);

## Count the lines of FILE2 for the progress report (verbose mode only)
my $n=0;
open(F2,"$ARGV[1]")||die("Cannot open $ARGV[1]: $!\n");
my $size=0;
if($verbose){
    while(<F2>){
        $size++;
    }
}
close(F2);

## Look for each pattern in FILE2
open(F2,"$ARGV[1]")||die("Cannot open $ARGV[1]: $!\n");
while(<F2>){
    next if /^\s+$/;
    $n++;
    chomp;
    print STDERR "." if $verbose && $n % 10==0;
    print STDERR "[$n of $size lines]\n" if $verbose && $n % 800==0;
    if($fast){
        my @aa=split(/\s+/,$_);
        $k{$aa[0]}++ if defined($k{$aa[0]});
        $fields{$aa[0]}=\@aa if $column;
    }
    else{
        foreach my $key(keys(%found)){
            if (/\Q$key/){
                $k{$key}++ ;
                $found{$key}=undef unless $dupes;
            }
        }
    }
}
close(F2);
print STDERR "[$n of $size lines]\n" if $verbose;

## A count of 1 means the pattern was seen only in FILE1
if ($column) {
    $missing && do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" unless $k{$_}>1}keys(%k);
    $common && do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" if $k{$_}>1}keys(%k);
    $dupes && do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" if $k{$_}>2}keys(%k);
}
else {
    $missing && do map{print "$_\n" unless $k{$_}>1}keys(%k);
    $common && do map{print "$_\n" if $k{$_}>1}keys(%k);
    $dupes && do map{print "$_\n" if $k{$_}>2}keys(%k);
}

sub usage{
    print STDERR <<EndOfHelp;
USAGE: compare_lists.pl FILE1 FILE2
This script will compare FILE1 and FILE2, searching for the
contents of FILE1 in FILE2 (and NOT vice versa). FILE1 must
contain one search pattern per line; a search pattern need only be
contained within one of the lines of FILE2.
OPTIONS:
    -c : Print patterns COMMON to both files
    -f : Search only the first characters of each line of FILE2
         for the search pattern given in FILE1
    -d : Print duplicate entries
    -m : Print patterns MISSING in FILE2 (default)
    -h : Print this help and exit
EndOfHelp
    exit(0);
}
In your case, you would run it as
list_compare.pl -cf file1.txt file2.txt
The -f option makes it compare only the first word (defined by whitespace) of file2 and greatly speeds things up. To compare the entire line, remove the -f.