bash: How to remove common lines between two files without sorting?

Question

Asked by harrison4

I have two unsorted files which have some lines in common.

file1.txt

Z
B
A
H
L

file2.txt

S
L
W
Q
A

This is how I currently remove the common lines:

sort -u file1.txt > file1_sorted.txt
sort -u file2.txt > file2_sorted.txt

comm -23 file1_sorted.txt file2_sorted.txt > file_final.txt

Output:

B
H
Z

The problem is that I want to keep the order of file1.txt, that is:

Desired output:

Z
B
H

One solution I thought of is looping over all the lines of file2.txt and running:

sed -i "/^${line_file2}$/d" file1.txt

But if the files are big, the performance may suck.

Do you like my idea?
Do you have any alternative way to do it?

Answer 1

Answered by Kent

grep or awk:

awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2 file1
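As a rough sketch of how the NR==FNR awk idiom behaves on the sample data from the question: the first file argument is loaded into an array, and lines of the second file are printed only if they were not stored. The scratch directory and the variable names (tmpdir, seen, awk_out, grep_out) are assumptions made for this illustration. The grep line is an assumed equivalent that uses -F (fixed strings) and -x (whole-line match) so that lines are compared literally rather than as regular expressions.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' Z B A H L > "$tmpdir/file1.txt"
printf '%s\n' S L W Q A > "$tmpdir/file2.txt"

# While reading the first argument (file2.txt), NR equals FNR:
#   store each line as an array key and skip to the next line.
# While reading the second argument (file1.txt):
#   print the line only if it was not stored, keeping file1.txt's order.
awk_out=$(awk 'NR==FNR { seen[$0] = 1; next } !($0 in seen)' \
    "$tmpdir/file2.txt" "$tmpdir/file1.txt")
printf '%s\n' "$awk_out"

# Assumed grep equivalent: -v inverts, -f reads patterns from a file,
# -F treats them as fixed strings, -x requires a whole-line match.
grep_out=$(grep -vxFf "$tmpdir/file2.txt" "$tmpdir/file1.txt")
printf '%s\n' "$grep_out"

rm -rf "$tmpdir"
```

Both commands should print Z, B and H in that order for this input. The awk version keeps every line of file2 in memory, which is usually acceptable but worth keeping in mind for very large files.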

Answer 2

Answered by perreal

You can use just grep (-v for invert, -f for file). Grep lines from input1 that do not match any line in input2:

grep -vf input2 input1

Gives:

Z
B
H

Answer 3

Answered by terdon

I've written a little Perl script that I use for this kind of thing. It can do more than what you ask for, but it can also do what you need:

#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;
my %opts;
getopts('hvfcmdk:', \%opts);
my $missing=$opts{m}||undef;
my $column=$opts{k}||undef;
my $common=$opts{c}||undef;
my $verbose=$opts{v}||undef;
my $fast=$opts{f}||undef;
my $dupes=$opts{d}||undef;
## Default to printing the patterns missing from FILE2
$missing=1 unless $common || $dupes;
&usage() unless $ARGV[1];
&usage() if $opts{h};
my (%found,%k,%fields);
if ($column) {
    die("The -k option only works in fast (-f) mode\n") unless $fast;
    $column--; ## So I don't need to count from 0
}

## Read the search patterns from FILE1
open(my $F1,"$ARGV[0]")||die("Cannot open $ARGV[0]: $!\n");
while(<$F1>){
    chomp;
    if ($fast){
        my @aa=split(/\s+/,$_);
        $k{$aa[0]}++;
        $found{$aa[0]}++;
    }
    else {
        $k{$_}++;
        $found{$_}++;
    }
}
close($F1);
my $n=0;
## Count the lines of FILE2 (only needed for the progress report)
open(F2,"$ARGV[1]")||die("Cannot open $ARGV[1]: $!\n");
my $size=0;
if($verbose){
    while(<F2>){
        $size++;
    }
}
close(F2);
open(F2,"$ARGV[1]")||die("Cannot open $ARGV[1]: $!\n");

## Look for each pattern in FILE2
while(<F2>){
    next if /^\s+$/;
    $n++;
    chomp;
    print STDERR "." if $verbose && $n % 10==0;
    print STDERR "[$n of $size lines]\n" if $verbose && $n % 800==0;
    if($fast){
        my @aa=split(/\s+/,$_);
        $k{$aa[0]}++ if defined($k{$aa[0]});
        $fields{$aa[0]}=\@aa if $column;
    }
    else{
        my @keys=keys(%k);
        foreach my $key(keys(%found)){
            if (/\Q$key/){
                $k{$key}++ ;
                $found{$key}=undef unless $dupes;
            }
        }
    }
}
close(F2);
print STDERR "[$n of $size lines]\n" if $verbose;

if ($column) {
    $missing && do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" unless $k{$_}>1}keys(%k);
    $common &&  do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" if $k{$_}>1}keys(%k);
    $dupes &&   do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" if $k{$_}>2}keys(%k);
}
else {
    $missing && do map{print "$_\n" unless $k{$_}>1}keys(%k);
    $common &&  do map{print "$_\n" if $k{$_}>1}keys(%k);
    $dupes &&   do map{print "$_\n" if $k{$_}>2}keys(%k);
}
sub usage{
    print STDERR <<EndOfHelp;

  USAGE: compare_lists.pl FILE1 FILE2

      This script will compare FILE1 and FILE2, searching for the 
      contents of FILE1 in FILE2 (and NOT vice versa). FILE one must 
      be one search pattern per line, the search pattern need only be 
      contained within one of the lines of FILE2.

    OPTIONS: 
      -c : Print patterns COMMON to both files
      -f : Search only the first characters of each line of FILE2
      for the search pattern given in FILE1
      -d : Print duplicate entries     
      -m : Print patterns MISSING in FILE2 (default)
      -h : Print this help and exit
EndOfHelp
      exit(0);
}

In your case, you would run it as

list_compare.pl -cf file1.txt file2.txt

The -f option makes it compare only the first word (defined by whitespace) of file2 and greatly speeds things up. To compare the entire line, remove the -f.
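For comparison, the idea of matching on the first whitespace-delimited word only can be sketched with awk. This is not the Perl script above, only a rough equivalent: the sample data, the scratch directory and the names tmpdir, seen and first_word_out are hypothetical, and it assumes the first word is used as the key in both files. Unlike iterating over hash keys, this sketch prints the surviving lines in the order they appear in file1.txt.

```shell
tmpdir=$(mktemp -d)
# Hypothetical sample data: the first word of each line acts as the key.
printf '%s\n' 'Z one' 'B two' 'A three' > "$tmpdir/file1.txt"
printf '%s\n' 'A other text' 'Q more' > "$tmpdir/file2.txt"

# Store the first field of every file2.txt line, then print the file1.txt
# lines whose first field was not stored, in their original order.
first_word_out=$(awk 'NR==FNR { seen[$1] = 1; next } !($1 in seen)' \
    "$tmpdir/file2.txt" "$tmpdir/file1.txt")
printf '%s\n' "$first_word_out"

rm -rf "$tmpdir"
```

With this input the sketch should print "Z one" and "B two", because only the key A also appears as a first word in file2.txt.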
