Original question: http://stackoverflow.com/questions/4769282/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Removing strings from another string in Java
Asked by Mat B.
Let's say I have this list of words:
String[] stopWords = new String[]{"i","a","and","about","an","are","as","at","be","by","com","for","from","how","in","is","it","not","of","on","or","that","the","this","to","was","what","when","where","who","will","with","the","www"};
Then I have this text:
String text = "I would like to do a nice novel about nature AND people";
Is there a method somewhere out there that matches the stopWords and removes them while ignoring case, like this?
String noStopWordsText = remove(text, stopWords);
Result:
" would like do nice novel nature people"
If you know of a regex, that would work great, but I would really prefer something like a Commons solution that is a bit more performance-oriented.
BTW, right now I'm using this Commons method, which lacks proper case-insensitive handling:
private static final String[] stopWords = new String[]{"i", "a", "and", "about", "an", "are", "as", "at", "be", "by", "com", "for", "from", "how", "in", "is", "it", "not", "of", "on", "or", "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with", "the", "www", "I", "A", "AND", "ABOUT", "AN", "ARE", "AS", "AT", "BE", "BY", "COM", "FOR", "FROM", "HOW", "IN", "IS", "IT", "NOT", "OF", "ON", "OR", "THAT", "THE", "THIS", "TO", "WAS", "WHAT", "WHEN", "WHERE", "WHO", "WILL", "WITH", "THE", "WWW"};
private static final String[] blanksForStopWords = new String[]{"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""};
noStopWordsText = StringUtils.replaceEach(text, stopWords, blanksForStopWords);
Accepted answer by Theo
This is a solution that does not use regular expressions. I think it's inferior to my other answer because it is much longer and less clear, but if performance is really, really important then this is O(n), where n is the length of the text.
import java.util.HashSet;
import java.util.Set;

Set<String> stopWords = new HashSet<String>();
stopWords.add("a");
stopWords.add("and");
// and so on ...
String sampleText = "I would like to do a nice novel about nature AND people";
StringBuilder clean = new StringBuilder();
int index = 0;
while (index < sampleText.length()) {
    // the only word delimiter supported is space, if you want other
    // delimiters you have to do a series of indexOf calls and see which
    // one gives the smallest index, or use regex
    int nextIndex = sampleText.indexOf(" ", index);
    if (nextIndex == -1) {
        // no more spaces: the last word runs to the end of the text
        nextIndex = sampleText.length();
    }
    String word = sampleText.substring(index, nextIndex);
    // the set holds lower-case words, so lower-case the candidate before the lookup
    if (!stopWords.contains(word.toLowerCase())) {
        clean.append(word);
        if (nextIndex < sampleText.length()) {
            // this adds the word delimiter, e.g. the following space
            clean.append(sampleText.substring(nextIndex, nextIndex + 1));
        }
    }
    index = nextIndex + 1;
}
System.out.println("Stop words removed: " + clean.toString());
Answer by Theo
Create a regular expression with your stop words, make it case insensitive, and then use the matcher's replaceAll method to replace all matches with an empty string.
import java.util.regex.*;
// backslashes must be doubled inside a Java string literal
Pattern stopWords = Pattern.compile("\\b(?:i|a|and|about|an|are|...)\\b\\s*", Pattern.CASE_INSENSITIVE);
Matcher matcher = stopWords.matcher("I would like to do a nice novel about nature AND people");
String clean = matcher.replaceAll("");
The ... in the pattern is just me being lazy; continue the list of stop words.
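Rather than typing the alternation by hand, the pattern can be assembled from the question's stopWords array. The following is a minimal sketch, not part of the original answer: the class name StopWordRegex and the helpers buildPattern and remove are hypothetical, and the stop-word list in main is shortened for illustration. Pattern.quote is used so that a stop word containing regex metacharacters is treated literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StopWordRegex {
    // Hypothetical helper: builds \b(?:w1|w2|...)\b\s* from the given words.
    static Pattern buildPattern(String[] words) {
        String alternation = Arrays.stream(words)
                .map(Pattern::quote)               // treat each word as a literal
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b\\s*",
                Pattern.CASE_INSENSITIVE);
    }

    // Hypothetical helper: removes every match of the compiled pattern.
    static String remove(String text, Pattern stopWords) {
        return stopWords.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Shortened list, for illustration only.
        String[] stopWords = {"i", "a", "and", "about", "to"};
        Pattern pattern = buildPattern(stopWords);
        System.out.println(remove("I would like to do a nice novel about nature AND people", pattern));
    }
}
```

The pattern is compiled once and can be reused for any number of texts, which is the main advantage over calling String's replaceAll repeatedly.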
Another method is to loop over all the stop words and use String's replaceAll method. The problem with that approach is that replaceAll will compile a new regular expression for each call, so it's not very efficient to use in loops. Also, String's replaceAll takes no flags argument, so you can't pass Pattern.CASE_INSENSITIVE; you would have to embed the inline flag (?i) in the pattern instead.
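If the per-word loop is still preferred, the repeated compilation can be avoided by compiling each pattern once up front and reusing it. This is only a sketch under that assumption; the class name PrecompiledStopWords and its methods compileAll and remove are hypothetical and not from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PrecompiledStopWords {
    // Compile one case-insensitive, whole-word pattern per stop word, once.
    static List<Pattern> compileAll(String[] words) {
        List<Pattern> patterns = new ArrayList<>();
        for (String word : words) {
            patterns.add(Pattern.compile("\\b" + Pattern.quote(word) + "\\b\\s*",
                    Pattern.CASE_INSENSITIVE));
        }
        return patterns;
    }

    // Reuse the compiled patterns for every text that needs cleaning.
    static String remove(String text, List<Pattern> patterns) {
        for (Pattern p : patterns) {
            text = p.matcher(text).replaceAll("");
        }
        return text;
    }

    public static void main(String[] args) {
        // Shortened list, for illustration only.
        List<Pattern> patterns = compileAll(new String[]{"i", "a", "and", "about", "to"});
        System.out.println(remove("I would like to do a nice novel about nature AND people", patterns));
    }
}
```

This still scans the text once per stop word, so the single combined pattern is likely the better choice for long lists.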
Edit: I added \b around the pattern to make it match whole words only. I also added \s* to make it gobble up any spaces that follow; that may not be necessary.
Answer by Jigar Joshi
You can build a regular expression that matches all the stop words [for example "a ", note the space here] and end up with
str.replaceAll(regexpression,"");
OR
String[] stopWords = new String[]{" i ", " a ", " and ", " about ", " an ", " are ", " as ", " at ", " be ", " by ", " com ", " for ", " from ", " how ", " in ", " is ", " it ", " not ", " of ", " on ", " or ", " that ", " the ", " this ", " to ", " was ", " what ", " when ", " where ", " who ", " will ", " with ", " the ", " www "};
String text = " I would like to do a nice novel about nature AND people ";
for (String stopword : stopWords) {
    text = text.replaceAll("(?i)"+stopword, " ");
}
System.out.println(text);
output:
would like do nice novel nature people
There might be a better way.
Answer by fastcodejava
Split text on whitespace. Then loop through the array and append each word to a StringBuilder only if it is not one of the stop words.
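A minimal sketch of that idea follows. The class name SplitAndFilter and the method remove are hypothetical, and the sketch assumes the stop words in the set are stored in lower case. Because it splits on runs of whitespace and rejoins with a single space, it does not preserve the original spacing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SplitAndFilter {
    // Assumes every entry in stopWords is already lower case.
    static String remove(String text, Set<String> stopWords) {
        StringBuilder clean = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // Skip the empty token produced by an empty input, and any stop word.
            if (word.isEmpty() || stopWords.contains(word.toLowerCase(Locale.ROOT))) {
                continue;
            }
            if (clean.length() > 0) {
                clean.append(' ');   // single space between kept words
            }
            clean.append(word);
        }
        return clean.toString();
    }

    public static void main(String[] args) {
        // Shortened list, for illustration only.
        Set<String> stopWords = new HashSet<>(Arrays.asList("i", "a", "and", "about", "to"));
        System.out.println(remove("I would like to do a nice novel about nature AND people", stopWords));
    }
}
```

The HashSet lookup keeps the per-word check at constant expected time, so the whole pass is linear in the length of the text.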