How to filter out a set of strings A from a set of strings B using Bash

Question

Asked by Ilyes Gouta

I have a list of strings that I want to remove from a superset of other strings, in no particular order, thereby constructing a new set. Is that doable in Bash?

Answer 1

Answered by Evan Krall

It looks like you're looking for something with better than O(nm) running time, so here's an answer to that. fgrep, or grep -F, uses the Aho-Corasick algorithm to build a single FSM out of a list of fixed strings, so checking each word in SET2 takes O(length of word) time. This means the whole running time of this script is O(n+m).

(Obviously, the running times also depend on the length of the words.)

[meatmanek@yggdrasil ~]$ cat subtract.sh 
#!/bin/bash
subtract()
{
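  SET1=( $1 )   # words to remove (first argument, split on whitespace)
  SET2=( $2 )   # words to filter (second argument, split on whitespace)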
  OLDIFS="$IFS"
  IFS=$'\n'
  SET3=( $(grep -Fxv "${SET1[*]}" <<< "${SET2[*]}") )
  IFS="$OLDIFS"
  echo "${SET3[*]}"
  # SET3 = SET2-SET1
}
subtract "$@"
[meatmanek@yggdrasil ~]$ . subtract.sh 

[meatmanek@yggdrasil ~]$ subtract "package-x86 test0 hello world" "computer hello sizeof compiler world package-x86 rocks"
computer sizeof compiler rocks
[meatmanek@yggdrasil ~]$
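
The same grep -Fxv idea can also be sketched without changing IFS, by feeding grep one word per line through process substitution. This is only a minimal sketch: the function name subtract_words is hypothetical, and it assumes the words are separated by spaces and contain no newlines.

```bash
#!/bin/bash
# Hypothetical helper: print the words of $2 that do not appear in $1, one per line.
subtract_words() {
  local remove=$1 words=$2
  # tr turns each space-separated list into one word per line.
  # -F: fixed strings, -x: match whole lines only, -v: invert the match,
  # -f: read the patterns (the words to remove) from a file.
  grep -Fxv -f <(tr -s ' ' '\n' <<< "$remove") <(tr -s ' ' '\n' <<< "$words")
}

subtract_words "package-x86 test0 hello world" \
               "computer hello sizeof compiler world package-x86 rocks"
```

Note that grep exits with status 1 when nothing is left after filtering, which matters if the calling script runs under set -e.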

Answer 2

Answered by Adam Bard

I think you'll have to at least characterize the parameters of the subset of strings you want to extract. If it's textfield-like data, though, look into awk.

Answer 3

Answered by Andreas Otto

> echo "aa b1 c b2 d" |xargs -d' ' -n 1
aa
b1 
c
b2
d

> echo "aa b1 c b2 d" |xargs -d' ' -n 1| grep "^b"
b1
b2

Answer 4

Answered by Chris Johnsen

How about an ugly abuse of the builtin command hash?

#!/bin/bash
set -eu

filter_out() {
    # $1 = words to remove, $2 = words to filter (matches the calls below)
    local words="$2" words_to_remove="$1"
    ( # do this in a subshell to avoid contaminating the main script
        set +e
        hash -r
        hash -p bogus-placeholder $words
        hash -d $words_to_remove > /dev/null 2>&1
        left=''
        for word in $words; do
            hash -t "$word" > /dev/null 2>&1 && left="${left}${left:+ }$word"
        done
        printf '%s\n' "$left"
    )
}

filter_out "package-x86 test0 hello world" "computer hello sizeof compiler world package-x86 rocks test0"
w='foo bar baz quux toto'
d='baz toto quux'
filter_out "$d" "$w"

Answer 5

Answered by ghostdog74

#!/bin/bash
SET1="package-x86 test0 hello world"
SET2="computer hello sizeof compiler world package-x86 rocks test0"
awk -v s1="$SET1" -v s2="$SET2" 'BEGIN{
    m=split(s1,set1)
    n=split(s2,set2)
    for(i=1;i<=n;i++){
        for (j=1;j<=m;j++){
            if ( set1[j] == set2[i]){
                 delete set2[i]
            }   
        }
    }
    for(i in set2) if (set2[i]!="") {print set2[i]}
}'

Output

# ./shell.sh
compiler
rocks
computer
sizeof
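
The nested loops above compare every word of set2 with every word of set1, and for (i in set2) visits the array in an unspecified order, which is why the output order differs from the input order. As a rough sketch of an alternative, awk's associative arrays can act as a lookup table so that each word is checked once and printed in input order. The function name filter_awk and the array names drop, a, and b are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the words of $2 that are not in $1, one per line.
filter_awk() {
    awk -v s1="$1" -v s2="$2" 'BEGIN {
        m = split(s1, a)                  # words to remove
        for (i = 1; i <= m; i++) drop[a[i]] = 1
        n = split(s2, b)                  # words to filter
        for (i = 1; i <= n; i++)          # index loop keeps the input order
            if (!(b[i] in drop)) print b[i]
    }'
}

SET1="package-x86 test0 hello world"
SET2="computer hello sizeof compiler world package-x86 rocks test0"
filter_awk "$SET1" "$SET2"
```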

Answer 6

Answered by Paused until further notice.

This is, what, O(n) or O(n+m)?

#!/bin/bash
SET1="package-x86 test0 hello world"
SET2="computer hello sizeof compiler world package-x86 rocks test0"
for i in $SET2
do
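    # Pad both sides with spaces and quote the pattern so that only whole words
    # match literally (an unquoted $i would also match substrings such as "hell")
    [[ ! " $SET1 " =~ " $i " ]] && SET3="${SET3:+${SET3} }$i"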
done
echo "..${SET3}.."

Running it:

$ ./script
..computer sizeof compiler rocks..
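
On the complexity question: each =~ test has to scan the whole of SET1, so the loop above is probably closer to O(n*m) in characters than to O(n+m). A lookup table avoids the repeated scan. The following is a minimal sketch, assuming bash 4 or later (for declare -A) and words that contain no whitespace or glob characters; the array name remove is hypothetical.

```bash
#!/bin/bash
# Requires bash 4+ for associative arrays (declare -A).
SET1="package-x86 test0 hello world"
SET2="computer hello sizeof compiler world package-x86 rocks test0"

declare -A remove=()      # hypothetical name: lookup table of words to drop
for w in $SET1; do
    remove[$w]=1
done

SET3=""
for w in $SET2; do
    # ${remove[$w]+x} expands to "x" only if the key exists in the table
    [[ -n ${remove[$w]+x} ]] || SET3="${SET3:+$SET3 }$w"
done
echo "$SET3"
```

Each word of SET1 is inserted once and each word of SET2 is looked up once, so the work grows roughly with the combined size of the two lists.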

Answer 7

Answered by Idelic

Without using anything bash-specific or any external commands:

SET1="package-x86 test0 hello world"
SET2="computer hello sizeof compiler world package-x86 rocks test0"
SET3=

for arg in $SET2; do
  case $SET1 in
    # first, last, middle, or only word of SET1
    $arg\ * | *\ $arg | *\ $arg\ * | $arg) ;;
    *) SET3="$SET3 $arg" ;;
  esac
done

Answer 8

Answered by Vinko Vrsalovic

This uses grep to check whether a word has to be removed. It is not pure Bash, but it is probably faster than the other option (see below).

#!/bin/bash
REMOVE="package-x86 test0 hello world"
WORDBAG="computer hello sizeof compiler world package-x86 rocks test0"
OFS=$IFS
IFS=" "
WORDBAG_ARRAY=($WORDBAG)
IFS=$OFS
RESULT=""

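for str2 in "${WORDBAG_ARRAY[@]}"
do
        # Print REMOVE one word per line so that -x matches whole words only;
        # -F treats the word literally and -q suppresses output
        if ! printf '%s\n' $REMOVE | grep -Fxq -- "$str2"
        then
                RESULT="$RESULT $str2"
        fi
done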

echo $RESULT

This is a bit verbose, uses Bash arrays, and is O(N*M), but it works.

#!/bin/bash
REMOVE="package-x86 test0 hello world"
WORDBAG="computer hello sizeof compiler world package-x86 rocks test0"
OFS=$IFS
IFS=" "
REMOVE_ARRAY=($REMOVE)
WORDBAG_ARRAY=($WORDBAG)
IFS=$OFS
RESULT=""

for str2 in ${WORDBAG_ARRAY[@]}
do
        found=0
        for str1 in ${REMOVE_ARRAY[@]}
        do
                if [[ "$str1" == "$str2" ]]
                then
                        found=1
                fi
        done
        if [[ $found == 0 ]]
        then
                RESULT="$RESULT $str2"
        fi
done

echo $RESULT
