Java: A better way to detect whether a string contains multiple words

Question

Asked by Silver

Hello mates! I am trying to create a program that detects if multiple words are in a string as fast as possible, and if so, executes a behavior. Preferably, I would like it to detect the order of these words too but only if this can be done fast. So far, this is what I have done:

if (input.contains("adsf") && input.contains("qwer")) {
    execute();          
}

As you can see, doing this for multiple words would become tiresome. Is this the only way or is there a better way of detecting multiple substrings? And is there any way of detecting order?

Answer 1

Accepted answer by Hyman

You could use an array:

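String[] matches = new String[] {"adsf", "qwer"};

// Execute only if every word is present, like the && chain in the question
boolean allFound = true;
for (String s : matches)
{
  if (!input.contains(s))
  {
    allFound = false;
    break; // one missing word is enough to stop checking
  }
}
if (allFound)
{
  execute();
}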

This is as efficient as the code you posted, but more maintainable. Looking for a faster solution sounds like a micro-optimization that should be ignored until it is proven to be an actual bottleneck in your code. In any case, with a huge set of strings the solution could be a trie.

Answer 2

Answer by Christoph Walesch

I'd create a regular expression from the words:

Pattern pattern = Pattern.compile("(?=.*adsf)(?=.*qwer)");
if (pattern.matcher(input).find()) {
    execute();
}

For more details, see this answer: https://stackoverflow.com/a/470602/660143
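
A minimal sketch of building that pattern from a list of words instead of writing it by hand. `WordPatterns`, `allWords` and `wordsInOrder` are hypothetical names. `Pattern.quote` escapes any regex metacharacters in the words, and using `Pattern.DOTALL` is an assumption so that `.` also matches across line breaks. The second method addresses the ordering part of the question by joining the words with `.*` instead of using lookaheads.

```java
import java.util.List;
import java.util.regex.Pattern;

final class WordPatterns {
    // Any order: one lookahead per word, anchored at the start of the input,
    // e.g. ^(?=.*\Qadsf\E)(?=.*\Qqwer\E)
    static Pattern allWords(List<String> words) {
        StringBuilder regex = new StringBuilder("^");
        for (String w : words) {
            regex.append("(?=.*").append(Pattern.quote(w)).append(")");
        }
        // DOTALL: '.' also matches line terminators (assumed desirable here)
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Given order: words joined by ".*", e.g. \Qadsf\E.*\Qqwer\E
    static Pattern wordsInOrder(List<String> words) {
        StringBuilder regex = new StringBuilder();
        for (String w : words) {
            if (regex.length() > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(w));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }
}
```

For example, with the words `adsf` and `qwer`, `allWords(words).matcher("qwer then adsf").find()` is true, while `wordsInOrder(words).matcher("qwer then adsf").find()` is false. Compile the pattern once and reuse it if the word list does not change.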

Answer 3

Answer by NRitH

If you have a lot of substrings to look up, then a regular expression probably isn't going to be much help, so you're better off putting the substrings in a list, then iterating over them and calling `input.indexOf(substring)` on each one. This returns the `int` index where the substring was found. If you put each result (except -1, which means that the substring wasn't found) into a `TreeMap` (where the index is the key and the substring is the value), then you can retrieve them in order by iterating over the map's `keySet()`, because a `TreeMap` keeps its keys sorted.

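import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A TreeMap keeps its keys (the match positions) sorted in ascending order
Map<Integer, String> substringIndices = new TreeMap<Integer, String>();
List<String> substrings = new ArrayList<String>();
substrings.add("asdf");
// etc.

for (String substring : substrings) {
  int index = input.indexOf(substring); // -1 means not found

  if (index != -1) {
    substringIndices.put(index, substring);
  }
}

// keySet() iterates in ascending index order, i.e. the order the substrings appear in input
for (Integer index : substringIndices.keySet()) {
  System.out.println(substringIndices.get(index));
}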
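
The ordered map can also answer the "is the order right?" part of the question. Below is a rough sketch, assuming you only care about the first occurrence of each word; `FirstOccurrenceOrder.inOrder` is a hypothetical helper, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FirstOccurrenceOrder {
    // True if every word occurs in input and the words' first occurrences
    // appear in the same order as in the list
    static boolean inOrder(String input, List<String> words) {
        Map<Integer, String> byIndex = new TreeMap<>();
        for (String w : words) {
            int index = input.indexOf(w);
            if (index == -1) {
                return false; // a word is missing entirely
            }
            byIndex.put(index, w);
        }
        // TreeMap.values() iterates in ascending key (index) order
        return new ArrayList<>(byIndex.values()).equals(words);
    }
}
```

If two words start at the same index (for example `ab` and `abc`), one overwrites the other in the map and the check returns false. A word listed twice behaves the same way.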

Answer 4

Answer by SOFe

Use a tree structure to hold the substrings per codepoint. This eliminates the need to

Note that this is efficient only if the set of needles stays almost constant. Adding or removing individual substrings is cheap, but rebuilding the tree from a large set of strings on every search would definitely slow it down.

`StringSearcher`:

import java.util.List;

class StringSearcher{
    private NeedleTree needles = new NeedleTree(-1); // root node, holds no code point
    private boolean caseSensitive;
    private int maxLength; // length of the longest needle, bounds each trie walk

    public StringSearcher(List<String> inputs, boolean caseSensitive){
        this.caseSensitive = caseSensitive;
        for(String input : inputs){
            maxLength = Math.max(maxLength, input.length());
            // Insert the needle into the tree, one code point per level
            NeedleTree tree = needles;
            for(int i = 0; i < input.length(); ){
                int cp = input.codePointAt(i);
                tree = tree.child(caseSensitive ? cp : Character.toLowerCase(cp));
                i += Character.charCount(cp); // skip both halves of a surrogate pair
            }
            tree.markSelfSet(); // a needle ends at this node
        }
    }

    // Returns true as soon as any needle occurs somewhere in haystack
    public boolean matches(String haystack){
        for(int i = 0; i < haystack.length(); i++){
            // Walk the tree directly over haystack starting at i (no substring copy,
            // so no out-of-bounds error near the end of haystack)
            NeedleTree tree = needles;
            for(int j = i; j < haystack.length() && j - i < maxLength; ){
                int cp = haystack.codePointAt(j);
                // Lower-case per code point, the same way the needles were stored
                tree = tree.childOrNull(caseSensitive ? cp : Character.toLowerCase(cp));
                if(tree == null){
                    break;
                }
                if(tree.isSelfSet()){
                    return true;
                }
                j += Character.charCount(cp);
            }
        }
        return false;
    }
}

`NeedleTree.java`:

import java.util.HashMap;
import java.util.Map;

class NeedleTree{
    private int codePoint;
    private boolean selfSet;
    private Map<Integer, NeedleTree> children = new HashMap<>();

    public NeedleTree(int codePoint){
        this.codePoint = codePoint;
    }

    public NeedleTree childOrNull(int codePoint){
        return children.get(codePoint);
    }

    public NeedleTree child(int codePoint){
        // Create the child on first use. computeIfAbsent returns the stored node;
        // Map.put would return the previous value (null here) instead.
        return children.computeIfAbsent(codePoint, k -> new NeedleTree(k));
    }

    public boolean isSelfSet(){
        return selfSet;
    }

    public void markSelfSet(){
        selfSet = true;
    }
}

Answer 5

Answer by Linus

In Java 8 you could do:

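import java.util.Arrays;

// True only if inputString contains every item (allMatch)
public static boolean containsItemFromArray(String inputString, String[] items) {
    return Arrays.stream(items).allMatch(inputString::contains);
}

String[] searchFor = {"asdf", "qwer"};
String input = "asdf qwer";
boolean found = containsItemFromArray(input, searchFor); // true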

Answer 6

Answer by Thomas Fischer

This is a classical interview and CS problem.

The Rabin–Karp algorithm is usually what people bring up first in interviews. The basic idea is that as you go through the string, you keep a rolling hash of the current window: add the incoming character and drop the outgoing one. If that hash equals the hash of one of your match strings, you might have a match, and only then do you compare the actual characters. This avoids rescanning the text against every match string at every position. https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
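
A sketch of a multi-pattern variant, under the assumption that patterns are grouped by length so each group can share one rolling window. The base and modulus are arbitrary choices, and `RabinKarpSearch.findAll` is a hypothetical name. Every hash hit is confirmed with `String.startsWith(prefix, offset)`, because different strings can share a hash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RabinKarpSearch {
    private static final long BASE = 256;           // arbitrary base for the polynomial hash
    private static final long MOD = 1_000_000_007L; // large prime keeps collisions rare

    // Polynomial hash of s.substring(from, from + len)
    private static long hash(String s, int from, int len) {
        long h = 0;
        for (int i = from; i < from + len; i++) {
            h = (h * BASE + s.charAt(i)) % MOD;
        }
        return h;
    }

    // Returns the set of patterns that occur somewhere in text
    static Set<String> findAll(String text, List<String> patterns) {
        // Group patterns by length, then by hash: one rolling window per distinct length
        Map<Integer, Map<Long, List<String>>> byLength = new HashMap<>();
        for (String p : patterns) {
            byLength.computeIfAbsent(p.length(), k -> new HashMap<>())
                    .computeIfAbsent(hash(p, 0, p.length()), k -> new ArrayList<>())
                    .add(p);
        }
        Set<String> found = new HashSet<>();
        for (Map.Entry<Integer, Map<Long, List<String>>> group : byLength.entrySet()) {
            int m = group.getKey();
            Map<Long, List<String>> table = group.getValue();
            if (m == 0) { // the empty string occurs in every text
                table.values().forEach(found::addAll);
                continue;
            }
            if (m > text.length()) {
                continue;
            }
            long high = 1; // BASE^(m-1) % MOD, weight of the outgoing character
            for (int i = 1; i < m; i++) {
                high = high * BASE % MOD;
            }
            long h = hash(text, 0, m);
            for (int start = 0; ; start++) {
                List<String> candidates = table.get(h);
                if (candidates != null) {
                    for (String p : candidates) {
                        // Equal hashes may collide, so confirm the characters
                        if (text.startsWith(p, start)) {
                            found.add(p);
                        }
                    }
                }
                if (start + m >= text.length()) {
                    break;
                }
                // Roll the window: drop text[start], append text[start + m]
                h = (h - text.charAt(start) * high % MOD + MOD) % MOD;
                h = (h * BASE + text.charAt(start + m)) % MOD;
            }
        }
        return found;
    }
}
```

To reproduce the question's `&&` check, test `RabinKarpSearch.findAll(input, words).containsAll(words)`.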

Another typical topic for this interview question is using a trie to speed up the lookup. With a large set of match strings, checking each one separately at every position of the input gets expensive; a trie shares common prefixes, so all match strings are checked at a position in a single walk down the tree. https://en.wikipedia.org/wiki/Trie

Additional algorithms are:
- Aho–Corasick https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
- Commentz-Walter https://en.wikipedia.org/wiki/Commentz-Walter_algorithm
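
Aho–Corasick builds a trie of all match strings plus "failure links" (where to continue after a mismatch), then reports every match string in a single left-to-right pass over the input. A compact sketch follows; the class and method names are hypothetical, and it works on Java `char`s, so a surrogate pair is treated as two characters.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class AhoCorasick {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail; // longest proper suffix of this path that is also a trie path
        final List<String> output = new ArrayList<>(); // patterns ending here, incl. via fail links
    }

    private final Node root = new Node();

    AhoCorasick(List<String> patterns) {
        // 1. Build a trie of all patterns
        for (String p : patterns) {
            Node node = root;
            for (char c : p.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.output.add(p);
        }
        // 2. Breadth-first pass: compute failure links level by level
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                child.fail = f.next.getOrDefault(c, root);
                child.output.addAll(child.fail.output); // inherit shorter matches
                queue.add(child);
            }
        }
    }

    // Single pass over text; returns every pattern that occurs in it
    Set<String> findAll(String text) {
        Set<String> found = new HashSet<>(root.output); // only non-empty for an empty pattern
        Node node = root;
        for (char c : text.toCharArray()) {
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail; // mismatch: fall back instead of rescanning
            }
            node = node.next.getOrDefault(c, root);
            found.addAll(node.output);
        }
        return found;
    }
}
```

As with the other approaches, `new AhoCorasick(words).findAll(input).containsAll(words)` gives the question's "all words present" check, and the automaton can be built once and reused for many inputs.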

Answer 7

Answer by Virendra khade

I think a better approach would be something like this: put the values into one string and check the index returned by `indexOf`, which is -1 when the value is not found.

String s = "123"; 
System.out.println(s.indexOf("1")); // 0
System.out.println(s.indexOf("2")); // 1 
System.out.println(s.indexOf("5")); // -1
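
The `indexOf(String str, int fromIndex)` overload extends this idea to the ordering question: search for each word starting after the end of the previous match. A sketch with a hypothetical helper name:

```java
import java.util.List;

final class OrderedSearch {
    // True if all words occur in input, each one starting after the end of the
    // previous match ("adsf ... qwer" matches [adsf, qwer], "qwer ... adsf" does not)
    static boolean containsInOrder(String input, List<String> words) {
        int from = 0;
        for (String word : words) {
            int index = input.indexOf(word, from); // -1 if not found at or after 'from'
            if (index == -1) {
                return false;
            }
            from = index + word.length(); // next word must start after this one ends
        }
        return true;
    }
}
```

Because each search starts where the previous match ended, overlapping matches are not counted: with the words `ab` and `bc`, `containsInOrder("abc", ...)` returns false.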
