Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/1616097/

Date: 2020-09-17 21:18:38  Source: igfitidea

How to filter out a set of strings A from a set of strings B using Bash

Tags: bash, string, filter

Asked by Ilyes Gouta

I have a list of strings that I want to remove from a superset of other strings, in no particular order, thereby constructing a new set. Is that doable in Bash?

Answer by Evan Krall

It looks like you're looking for something with better than O(nm) running time, so here's an answer to that. Fgrep or grep -F uses the Aho-Corasick algorithm to make a single FSM out of a list of fixed strings, so checking each word in SET2 takes O(length of word) time. This means the whole running time of this script is O(n+m).

(obviously the running times are also dependent on the length of the words)

[meatmanek@yggdrasil ~]$ cat subtract.sh 
#!/bin/bash
subtract()
{
  SET1=( $1 )   # first argument: the words to remove
  SET2=( $2 )   # second argument: the words to filter
  OLDIFS="$IFS"
  IFS=$'\n'
  SET3=( $(grep -Fxv "${SET1[*]}" <<< "${SET2[*]}") )
  IFS="$OLDIFS"
  echo "${SET3[*]}"
  # SET3 = SET2-SET1
}
subtract "$@"
[meatmanek@yggdrasil ~]$ . subtract.sh 

[meatmanek@yggdrasil ~]$ subtract "package-x86 test0 hello world" "computer hello sizeof compiler world package-x86 rocks"
computer sizeof compiler rocks
[meatmanek@yggdrasil ~]$ 
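
The same grep -Fxv idea can also be written without touching IFS. The following is only a sketch, not the answer author's code: the function name subtract_lines is made up for illustration, and it assumes the words contain no whitespace or glob characters. It feeds the words to remove to grep as a pattern file through process substitution and prints the surviving words one per line.

```shell
#!/bin/bash
# subtract_lines is a hypothetical helper name, not part of the answer above.
# Usage: subtract_lines "words to remove" "words to filter"
subtract_lines() {
  # Unquoted on purpose: split both arguments into words on whitespace.
  local -a remove=( $1 ) words=( $2 )
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file.
  printf '%s\n' "${words[@]}" | grep -Fxv -f <(printf '%s\n' "${remove[@]}")
}

subtract_lines "package-x86 test0 hello world" \
               "computer hello sizeof compiler world package-x86 rocks"
```

Because every word sits on its own line, -x makes grep compare whole words, so a word that is merely a substring of another is not removed.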

Answer by Adam Bard

I think you'll have to at least characterize the parameters of the subset of strings you want to extract. If it's textfield-like data, though, look into awk.

Answer by Andreas Otto

> echo "aa b1 c b2 d" |xargs -d' ' -n 1
aa
b1 
c
b2
d

> echo "aa b1 c b2 d" |xargs -d' ' -n 1| grep "^b"
b1
b2

Answer by Chris Johnsen

How about an ugly abuse of the builtin command hash?

#!/bin/bash
set -eu

filter_out() {
    local words="$2" words_to_remove="$1"
    ( # do this in a subshell to avoid contaminating the main script
        set +e
        hash -r
        hash -p bogus-placeholder $words
        hash -d $words_to_remove > /dev/null 2>&1
        left=''
        for word in $words; do
            hash -t "$word" > /dev/null 2>&1 && left="${left}${left:+ }$word"
        done
        printf '%s\n' "$left"
    )
}

filter_out "package-x86 test0 hello world" "computer hello sizeof compiler world package-x86 rocks test0"
w='foo bar baz quux toto'
d='baz toto quux'
filter_out "$d" "$w"
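
For comparison, the same lookup can be done with a Bash associative array instead of the command hash table. This is a minimal sketch under stated assumptions: it needs Bash 4.0 or later, the name filter_out_assoc is hypothetical, and it keeps the argument order of filter_out (words to remove first, words to filter second).

```shell
#!/bin/bash
# filter_out_assoc is a hypothetical name; it mirrors the interface of filter_out.
filter_out_assoc() {
    local words="$2" words_to_remove="$1"
    local -A drop=()      # keys are the words to remove (needs Bash 4.0+)
    local word left=''
    for word in $words_to_remove; do
        drop[$word]=1
    done
    for word in $words; do
        # Keep the word only if it is not a key of the drop table.
        [[ -n ${drop[$word]+set} ]] || left="${left}${left:+ }$word"
    done
    printf '%s\n' "$left"
}

filter_out_assoc "baz toto quux" "foo bar baz quux toto"
```

No subshell is needed here, because the table is a local variable and the shell's real command hash is left alone.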

Answer by ghostdog74

#!/bin/bash
SET1="package-x86 test0 hello world"
SET2="computer hello sizeof compiler world package-x86 rocks test0"
awk -v s1="$SET1" -v s2="$SET2" 'BEGIN{
    m=split(s1,set1)
    n=split(s2,set2)
    for(i=1;i<=n;i++){
        for (j=1;j<=m;j++){
            if ( set1[j] == set2[i]){
                 delete set2[i]
            }   
        }
    }
    for(i in set2) if (set2[i]!="") {print set2[i]}
}' 

Output:

# ./shell.sh
compiler
rocks
computer
sizeof
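
The nested loops above compare every pair of words, and the for (i in set2) loop prints the survivors in an unspecified order. As a sketch only, the variant below indexes the words to remove in an awk associative array and walks the second list by numeric index, which keeps the original order. The function name awk_subtract and the array name drop are assumptions made for illustration.

```shell
#!/bin/bash
# awk_subtract is a hypothetical name, not part of the answer above.
# Usage: awk_subtract "words to remove" "words to filter"
awk_subtract() {
    awk -v s1="$1" -v s2="$2" 'BEGIN {
        m = split(s1, set1)
        n = split(s2, set2)
        # Index the words to remove as keys of an associative array.
        for (j = 1; j <= m; j++) drop[set1[j]] = 1
        # Walk set2 by numeric index so the original order is preserved.
        for (i = 1; i <= n; i++)
            if (!(set2[i] in drop)) print set2[i]
    }'
}

SET1="package-x86 test0 hello world"
SET2="computer hello sizeof compiler world package-x86 rocks test0"
awk_subtract "$SET1" "$SET2"
```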

Answer by Paused until further notice.

This is, what, O(n) or O(n+m)?

#!/bin/bash
SET1="package-x86 test0 hello world"
SET2="computer hello sizeof compiler world package-x86 rocks test0"
for i in $SET2
do
    [[ ! $SET1 =~ $i  ]] && SET3="${SET3:+${SET3} }$i"
done
echo "..${SET3}.."

Running it:

$ ./script
..computer sizeof compiler rocks..
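
One caveat with the =~ test: it is an unanchored regular-expression match, so a word such as hell would be treated as present in SET1 because it is a substring of hello, and regex metacharacters inside a word are interpreted. A possible whole-word variant, sketched here on the assumption that the words in SET1 are separated by single spaces, pads both sides with spaces and uses a glob comparison instead. The extra word hell in SET2 is added only to show the difference.

```shell
#!/bin/bash
SET1="package-x86 test0 hello world"
# "hell" is added for illustration; it is not in the original example.
SET2="computer hello hell sizeof compiler world package-x86 rocks test0"
SET3=""
for i in $SET2
do
    # Pad both sides with spaces so that only whole words can match.
    [[ " $SET1 " != *" $i "* ]] && SET3="${SET3:+${SET3} }$i"
done
echo "..${SET3}.."
```

Each test still scans the whole of SET1, so the total work grows with the product of the two list sizes rather than their sum.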

Answer by Idelic

Without using anything bash-specific or external commands:

SET1="package-x86 test0 hello world"
SET2="computer hello sizeof compiler world package-x86 rocks test0"
SET3=

for arg in $SET2; do
  case $SET1 in
    "$arg" | "$arg "* | *" $arg" | *" $arg "*) ;;   # already in SET1 (alone, first, last, or in the middle): skip it
    *) SET3="$SET3 $arg" ;;
  esac
done

Answer by Vinko Vrsalovic

This uses grep to check whether a word has to be removed. It is not pure Bash, but it is probably faster than the other option (see below).

#!/bin/bash
REMOVE="package-x86 test0 hello world"
WORDBAG="computer hello sizeof compiler world package-x86 rocks test0"
OFS=$IFS
IFS=" "
WORDBAG_ARRAY=($WORDBAG)
IFS=$OFS
RESULT=""

for str2 in ${WORDBAG_ARRAY[@]}
do
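        # One word per line; -x and -F compare whole words literally,
        # so "hell" is not mistaken for "hello". grep exits with 1 when not found.
        printf '%s\n' $REMOVE | grep -qxF -- "$str2"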
        if [[ $? == 1 ]] #Not Found
        then
                RESULT="$RESULT $str2"
        fi
done

echo $RESULT

This is a bit verbose, uses BASH arrays, and is O(N*M), but works.

#!/bin/bash
REMOVE="package-x86 test0 hello world"
WORDBAG="computer hello sizeof compiler world package-x86 rocks test0"
OFS=$IFS
IFS=" "
REMOVE_ARRAY=($REMOVE)
WORDBAG_ARRAY=($WORDBAG)
IFS=$OFS
RESULT=""

for str2 in ${WORDBAG_ARRAY[@]}
do
        found=0
        for str1 in ${REMOVE_ARRAY[@]}
        do
                if [[ "$str1" == "$str2" ]]
                then
                        found=1
                fi
        done
        if [[ $found == 0 ]]
        then
                RESULT="$RESULT $str2"
        fi
done

echo $RESULT