Java: Search for a word in a String

Notice: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3879160/

Time: 2020-08-14 06:15:44. Source: igfitidea

Search for a word in a String

Tags: java, regex, string

Asked by topgun_ivard

Suppose I am looking for a particular word inside a string, for example "are" in the string "how are you". Would a plain indexOf() be faster and better, or a regex matches()?

String testStr = "how are you";
String lookUp = "are";

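// METHOD 1: plain substring search
if (testStr.indexOf(lookUp) != -1)
{
    System.out.println("Found!");
}

// OR
// METHOD 2: regex; String has matches(), not match(), and it must match the whole string
if (testStr.matches(".*" + lookUp + ".*"))
{
    System.out.println("Found!");
}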

Which of the two methods above is a better way of looking for a string inside another string? Or is there a much better alternative?

  • Ivard

Accepted answer by Tim Pietzcker

If you don't care whether what you match is actually an entire word, then indexOf() will be a lot faster.

If, on the other hand, you need to be able to differentiate between are, harebrained, aren't etc., then you need a regex: \bare\b will only match are as an entire word (\\bare\\b in Java).

\b is a word boundary anchor. It matches the zero-width position between an alphanumeric character (letter, digit, or underscore) and a non-alphanumeric character, or between an alphanumeric character and the start or end of the string.
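The difference between a substring check and a whole-word check can be sketched as follows. This is only an illustration: the class name WholeWordSearch and the helper containsWholeWord are hypothetical, and Pattern.quote is added here (it is not part of the answer) so that the search word is always treated literally.

```java
import java.util.regex.Pattern;

final class WholeWordSearch {
    // Hypothetical helper: true if `word` occurs in `text` delimited by word boundaries.
    static boolean containsWholeWord(String text, String word) {
        // Pattern.quote keeps any regex metacharacters in `word` literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // find() looks for a match anywhere; matches() would require the whole string to match.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // Substring search: finds "are" even inside a longer word.
        System.out.println("harebrained".indexOf("are") != -1);       // true
        // Whole-word search: only a standalone "are" counts.
        System.out.println(containsWholeWord("how are you", "are"));  // true
        System.out.println(containsWholeWord("harebrained", "are"));  // false
    }
}
```

If the same word is searched many times, the compiled Pattern could be kept and reused instead of being rebuilt on every call.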

Caveat: This also means that if your search term isn't actually a word (let's say you're looking for ###), then these word boundary anchors will only match in a string like aaa###zzz, but not in +++###+++.
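The caveat can be reproduced with a small sketch. The class and method names are hypothetical. The second helper, which requires whitespace or a string edge on both sides instead of a word boundary, is one possible workaround and an assumption of this example, not something the answer proposes.

```java
import java.util.regex.Pattern;

final class BoundaryCaveat {
    // Search term wrapped in word boundary anchors, as described in the answer.
    static boolean findWithBoundaries(String text, String term) {
        return Pattern.compile("\\b" + Pattern.quote(term) + "\\b").matcher(text).find();
    }

    // Assumed alternative: the term must be preceded and followed by
    // whitespace or by the start/end of the string.
    static boolean findDelimitedByWhitespace(String text, String term) {
        return Pattern.compile("(?<!\\S)" + Pattern.quote(term) + "(?!\\S)").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(findWithBoundaries("aaa###zzz", "###"));          // true
        System.out.println(findWithBoundaries("+++###+++", "###"));          // false
        System.out.println(findWithBoundaries("+++ ### +++", "###"));        // false
        System.out.println(findDelimitedByWhitespace("+++ ### +++", "###")); // true
    }
}
```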

Further caveat: By default, Java has a limited worldview on what constitutes an alphanumeric character. Only ASCII letters/digits (plus the underscore) count here, so word boundary anchors will fail on words like élève, relevé or ärgern. Read more about this (and how to solve this problem) here.

Answer by stacker

Method one should be faster because it has less overhead. If the concern is performance when searching huge files, a specialized method such as Boyer-Moore pattern matching could lead to further improvements.
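To give an idea of what such a specialized search looks like, here is a minimal sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm). The class name HorspoolSearch is hypothetical, and whether this beats String.indexOf on real data is something to measure rather than assume.

```java
import java.util.Arrays;

final class HorspoolSearch {
    // Returns the index of the first occurrence of `pattern` in `text`, or -1.
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) return 0;
        if (m > n) return -1;

        // Shift table: how far the window may move, keyed by the text
        // character aligned with the last position of the pattern.
        int[] shift = new int[Character.MAX_VALUE + 1];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern.charAt(i)] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) return pos;
            pos += shift[text.charAt(pos + m - 1)];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("how are you", "are")); // 4
    }
}
```

The table covers every possible char value to keep the sketch short; a production version would likely use a smaller table or a map.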

Answer by codaddict

If you are looking up one string inside another, you should be using the indexOf or contains method. Example: see if "foo" is present in a string.

But if you are looking for a pattern, use the matches method.
Example: see if "foo" is present at the beginning/end of the string. Or see if it's present as a whole word.
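The two kinds of lookup can be put side by side in a short sketch. The class and method names are hypothetical. The pattern case uses Pattern with find() rather than String.matches(), which is a choice made for this example because matches() has to cover the entire string.

```java
import java.util.regex.Pattern;

final class FixedVersusPattern {
    // Fixed string: no regex involved.
    static boolean containsLiteral(String text, String literal) {
        return text.contains(literal);
    }

    // Pattern: true if `literal` sits at the beginning or at the end of `text`.
    static boolean startsOrEndsWith(String text, String literal) {
        String q = Pattern.quote(literal);
        // Alternation binds loosest, so this reads as (^literal) or (literal$).
        return Pattern.compile("^" + q + "|" + q + "$").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("a foo b", "foo"));   // true
        System.out.println(startsOrEndsWith("foo bar", "foo"));  // true
        System.out.println(startsOrEndsWith("a foo b", "foo"));  // false
    }
}
```

For this particular pattern, String.startsWith and String.endsWith would also do the job without a regex; the regex form is shown only because it generalizes to other patterns.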

Using the matches method for simple string searching is not efficient because of the regex engine overhead.

Answer by Emil

The first method is faster, and since this is not a complex expression, there is no reason to use a regex here.

Answer by Grodriguez

If you are looking for a fixed string, not a pattern, as in the example in your question, indexOf will be better (simpler) and faster, since it does not need to use regular expressions.

Also, if the string you are searching for contains characters that have a special meaning in regular expressions, with indexOf you don't need to worry about escaping these characters.
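The escaping problem is easy to demonstrate. In the sketch below the class and method names are hypothetical, and Pattern.quote is shown as one way to escape a search term when a regex is used anyway.

```java
import java.util.regex.Pattern;

final class EscapingDemo {
    static boolean viaIndexOf(String text, String term) {
        return text.indexOf(term) != -1;
    }

    // The term is pasted into the regex as-is, so metacharacters are interpreted.
    static boolean viaUnescapedRegex(String text, String term) {
        return text.matches(".*" + term + ".*");
    }

    // Pattern.quote turns the term into a literal.
    static boolean viaQuotedRegex(String text, String term) {
        return text.matches(".*" + Pattern.quote(term) + ".*");
    }

    public static void main(String[] args) {
        // "." matches any character, so the unescaped regex reports a false positive.
        System.out.println(viaIndexOf("abc", "a.c"));          // false
        System.out.println(viaUnescapedRegex("abc", "a.c"));   // true
        System.out.println(viaQuotedRegex("abc", "a.c"));      // false

        // "+" is a quantifier, so the unescaped regex misses a term that is present.
        System.out.println(viaIndexOf("1+1=2", "1+1"));        // true
        System.out.println(viaUnescapedRegex("1+1=2", "1+1")); // false
        System.out.println(viaQuotedRegex("1+1=2", "1+1"));    // true
    }
}
```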

In general, use indexOf where possible, and matches for pattern matching, where indexOf cannot do what you need.

Answer by shenju

Of course indexOf() is better than matches(). A single matches() call involves many comparisons (a==a, r==r, e==e). On top of that, you add wildcards, which have to be tried in many different ways:

    ?are
    ??are
    ???are
    ????are
    ........
    are
    are?
    are??
    are???

until it's as long as the original strings.

Answer by Alan Moore

Your question practically answers itself; if you have to ask whether regex is the better choice, it almost certainly isn't. Also, when you're choosing between regex and non-regex solutions, performance should never be your primary criterion. Wait until you've got some working code, then profile it.
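Once there is working code, a first rough measurement might look like the sketch below. It is deliberately naive: the class name, the helpers and the iteration count are all assumptions of this example, and System.nanoTime loops are easily distorted by JIT compilation, so a proper harness such as JMH is the better choice for numbers that matter.

```java
import java.util.regex.Pattern;

final class RoughTiming {
    // Sink that consumes the results so the work is less likely to be optimized away.
    static int hits = 0;

    static boolean viaIndexOf(String text, String term) {
        return text.indexOf(term) != -1;
    }

    static boolean viaMatches(String text, String term) {
        return text.matches(".*" + Pattern.quote(term) + ".*");
    }

    // Runs `task` the given number of times and returns the elapsed nanoseconds.
    static long timeNanos(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String text = "how are you";
        String term = "are";
        int iterations = 100_000; // arbitrary value chosen for this sketch

        // Warm-up pass so both code paths have a chance to be compiled first.
        timeNanos(() -> { if (viaIndexOf(text, term)) hits++; }, iterations);
        timeNanos(() -> { if (viaMatches(text, term)) hits++; }, iterations);

        long indexOfNanos = timeNanos(() -> { if (viaIndexOf(text, term)) hits++; }, iterations);
        long matchesNanos = timeNanos(() -> { if (viaMatches(text, term)) hits++; }, iterations);

        System.out.println("indexOf: " + indexOfNanos + " ns");
        System.out.println("matches: " + matchesNanos + " ns");
    }
}
```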

Answer by A_Var

A better approach to comparing the two versions is to analyze the source code of the indexOf method and the regex.matches method itself, work out the running time of both implementations in Big_O_notation, and compare their best, average and worst cases (char sequence found at the start, middle or end of the string respectively). The source code is linked here (indexOf_source) and here (regex.matches). We need to do a run-time analysis of both to see exactly what each one is doing. It is a hectic task, but it's the only way to make a true comparison; everything else is only assumption. Good question though.

Answer by barwnikk

I use this:

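// Note: replaceAll treats `what` as a regular expression, not as a literal string.
public boolean searchStr(String search, String what) {
    // If replacing every match changes the string, at least one match exists.
    return !search.replaceAll(what, "_").equals(search);
}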

Example use:

String s = "abc";
String w = "bc";
if(searchStr(s,w)) { 
    //this returns true
}
s="qwe";
w="asd";
if(searchStr(s,w)) { 
    //this returns false
}