Disclaimer: This page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/18885043/

Time: 2020-08-12 11:54:25  Source: igfitidea

Better way to detect if a string contains multiple words

Tags: java, string, substring, contains

Asked by Silver

Hello mates! I am trying to create a program that detects if multiple words are in a string as fast as possible, and if so, executes a behavior. Preferably, I would like it to detect the order of these words too but only if this can be done fast. So far, this is what I have done:

if (input.contains("adsf") && input.contains("qwer")) {
    execute();          
}

As you can see, doing this for multiple words would become tiresome. Is this the only way or is there a better way of detecting multiple substrings? And is there any way of detecting order?

Accepted answer by Hyman

You could use an array:

String[] matches = new String[] {"adsf", "qwer"};

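// Assume every word is present until one turns out to be missing,
// which mirrors the && chain in the question.
boolean found = true;
for (String s : matches)
{
  if (!input.contains(s))
  {
    found = false;
    break;
  }
}
if (found)
{
  execute();
}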

This is as efficient as the code you posted, but more maintainable. Looking for a more efficient solution sounds like a micro-optimization that should be ignored until it is proven to be an actual bottleneck in your code. In any case, with a huge set of strings the solution could be a trie.

Answer by Christoph Walesch

I'd create a regular expression from the words:

Pattern pattern = Pattern.compile("(?=.*adsf)(?=.*qwer)");
if (pattern.matcher(input).find()) {
    execute();
}

For more details, see this answer: https://stackoverflow.com/a/470602/660143
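When the word list is not known in advance, the same lookahead pattern can be assembled from a list. The following is only a sketch: the class name AllWordsRegex and the helpers compileAllOf and containsAll are hypothetical names chosen for illustration, and the use of Pattern.quote and Pattern.DOTALL is an added assumption (to treat words literally and to let the search span line breaks), not part of the original answer.

```java
import java.util.List;
import java.util.regex.Pattern;

class AllWordsRegex {
    // Builds one lookahead per word, so the pattern matches only if every word occurs.
    static Pattern compileAllOf(List<String> words) {
        StringBuilder regex = new StringBuilder();
        for (String word : words) {
            // Pattern.quote keeps characters such as '.' or '*' inside a word literal.
            regex.append("(?=.*").append(Pattern.quote(word)).append(")");
        }
        // DOTALL lets '.' cross line breaks, so multi-line input is searched as a whole.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Convenience wrapper; compile once and reuse the Pattern if this is called often.
    static boolean containsAll(String input, List<String> words) {
        return compileAllOf(words).matcher(input).find();
    }
}
```

If the same word list is checked against many inputs, it is probably worth keeping the compiled Pattern around rather than calling containsAll each time.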

Answer by NRitH

If you have a lot of substrings to look up, then a regular expression probably isn't going to be much help, so you're better off putting the substrings in a list, then iterating over them and calling input.indexOf(substring) on each one. This returns an int index of where the substring was found. If you put each result (except -1, which means that the substring wasn't found) into a TreeMap (where the index is the key and the substring is the value), then you can retrieve them in order by calling keySet() on the map.

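import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// TreeMap keeps its keys sorted, so iterating it yields substrings in order of position.
Map<Integer, String> substringIndices = new TreeMap<Integer, String>();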
List<String> substrings = new ArrayList<String>();
substrings.add("asdf");
// etc.

for (String substring : substrings) {
  int index = input.indexOf(substring);

  if (index != -1) {
    substringIndices.put(index, substring);
  }
}

// Map has no keys() method; keySet() returns the indices in ascending order.
for (Integer index : substringIndices.keySet()) {
  System.out.println(substringIndices.get(index));
}
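If the only goal is to check that the words appear in one specific order, a shorter sketch is possible using the two-argument String.indexOf(str, fromIndex). The class name WordOrder and the method containsInOrder are hypothetical, and the rule that each word must start after the previous one ends is an assumption about what "in order" should mean here.

```java
class WordOrder {
    // Returns true if every word occurs in input and each one starts
    // after the previous word's match has ended.
    static boolean containsInOrder(String input, String... words) {
        int from = 0;
        for (String word : words) {
            // Search only the part of the input that follows the previous match.
            int index = input.indexOf(word, from);
            if (index == -1) {
                return false;
            }
            from = index + word.length();
        }
        return true;
    }
}
```

For example, WordOrder.containsInOrder(input, "adsf", "qwer") would be true for "adsf then qwer" but false for "qwer then adsf".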

Answer by SOFe

Use a tree structure to hold the substrings per codepoint. This eliminates the need to

Note that this is efficient only if the needle set is almost constant. Individual additions or removals of substrings are not inefficient, but re-initializing each time to arrange a lot of strings into a tree structure would definitely slow it down.

StringSearcher:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.HashMap;

class StringSearcher{
    private NeedleTree needles = new NeedleTree(-1);
    private boolean caseSensitive;
    private List<Integer> lengths = new ArrayList<>();
    private int maxLength;

    public StringSearcher(List<String> inputs, boolean caseSensitive){
        this.caseSensitive = caseSensitive;
        for(String input : inputs){
            if(!lengths.contains(input.length())){
                lengths.add(input.length());
            }
            NeedleTree tree = needles;
            for(int i = 0; i < input.length(); i++){
                tree = tree.child(caseSensitive ? input.codePointAt(i) : Character.toLowerCase(input.codePointAt(i)));
            }
            tree.markSelfSet();
        }
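        // Collections.max throws on an empty collection, so fall back to 0 when there are no needles.
        maxLength = lengths.isEmpty() ? 0 : Collections.max(lengths);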
    }

    public boolean matches(String haystack){
        if(!caseSensitive){
            haystack = haystack.toLowerCase();
        }
        for(int i = 0; i < haystack.length(); i++){
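            // Clamp the end index so the window never runs past the end of the haystack.
            String substring = haystack.substring(i, Math.min(haystack.length(), i + maxLength)); // maybe we can even skip this and use from haystack directly?
            NeedleTree tree = needles;
            for(int j = 0; j < substring.length(); j++){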
                tree = tree.childOrNull(substring.codePointAt(j));
                if(tree == null){
                    break;
                }
                if(tree.isSelfSet()){
                    return true;
                }
            }
        }
        return false;
    }
}

NeedleTree.java:

import java.util.HashMap;
import java.util.Map;

class NeedleTree{
    private int codePoint;
    private boolean selfSet;
    private Map<Integer, NeedleTree> children = new HashMap<>();

    public NeedleTree(int codePoint){
        this.codePoint = codePoint;
    }

    public NeedleTree childOrNull(int codePoint){
        return children.get(codePoint);
    }

    public NeedleTree child(int codePoint){
        NeedleTree child = children.get(codePoint);
        if(child == null){
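            // Map.put returns the previous value (null here), so keep our own reference to the new node.
            child = new NeedleTree(codePoint);
            children.put(codePoint, child);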
        }
        return child;
    }

    public boolean isSelfSet(){
        return selfSet;
    }

    public void markSelfSet(){
        selfSet = true;
    }
}

Answer by Linus

In Java 8 you could do:

import java.util.Arrays;

String[] searchFor = {"asdf", "qwer"};
String input = "asdf qwer";

public static boolean containsItemFromArray(String inputString, String[] items) {
    // Stream over the items and require that each one occurs in inputString.
    return Arrays.stream(items).allMatch(inputString::contains);
}

Answer by Thomas Fischer

This is a classic interview and CS problem.

The Rabin–Karp algorithm is usually what people first talk about in interviews. The basic idea is that as you go through the string, you add the current character to the hash. If the hash matches the hash of one of your match strings, you know that you might have a match. This avoids having to scan back and forth into your match strings. https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
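A minimal sketch of the rolling-hash idea is shown below. It is a single-pattern version that is simply run once per word, not a tuned multi-pattern implementation; the class name RabinKarp, the method names, and the constants BASE and MOD are illustrative choices rather than anything prescribed by the answer.

```java
class RabinKarp {
    private static final long BASE = 257;
    private static final long MOD = 1_000_000_007L;

    // Returns the index of the first occurrence of pattern in text, or -1 if there is none.
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) return 0;
        if (m > n) return -1;

        // BASE^(m-1) mod MOD, the weight of the character that leaves the window.
        long highPow = 1;
        for (int i = 1; i < m; i++) highPow = (highPow * BASE) % MOD;

        long patternHash = 0;
        long windowHash = 0;
        for (int i = 0; i < m; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
        }

        for (int i = 0; ; i++) {
            // Equal hashes only suggest a match, so confirm by comparing characters.
            if (patternHash == windowHash && text.regionMatches(i, pattern, 0, m)) return i;
            if (i + m >= n) return -1;
            // Slide the window: drop text[i], then append text[i + m].
            windowHash = (windowHash + MOD - (text.charAt(i) * highPow) % MOD) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i + m)) % MOD;
        }
    }

    // True only if every word occurs somewhere in text.
    static boolean containsAll(String text, String... words) {
        for (String word : words) {
            if (indexOf(text, word) == -1) return false;
        }
        return true;
    }
}
```

The character comparison after a hash hit is what keeps hash collisions from producing false positives; for short inputs String.contains is likely to be just as fast in practice.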

Another typical topic for that interview question is using a trie structure to speed up the lookup. With a large set of match strings, you always have to check every one of them, and a trie makes that check more efficient. https://en.wikipedia.org/wiki/Trie

Additional algorithms are:
- Aho–Corasick: https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
- Commentz-Walter: https://en.wikipedia.org/wiki/Commentz-Walter_algorithm

Answer by Virendra khade

I think a better approach would be something like this, where we put multiple values into one string and use the indexOf function to check each value by the index it returns:

String s = "123"; 
System.out.println(s.indexOf("1")); // 0
System.out.println(s.indexOf("2")); // 1 
System.out.println(s.indexOf("5")); // -1