Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1326682/
Java: Replacing multiple different substrings in a string at once (or in the most efficient way)
Question by Yossale
I need to replace many different substrings in a string in the most efficient way. Is there another way other than the brute-force approach of replacing each field using String.replace?
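For reference, here is a minimal sketch of the brute-force approach the question describes: one chained String.replace call per search term. Each call scans the whole string and creates a new String, and there is also a correctness pitfall, because a later call can rewrite text produced by an earlier one. The class name ChainedReplace is hypothetical.

```java
public class ChainedReplace {
    // Brute force: one full pass over the string (and one new String) per search term.
    public static String chained(String text) {
        return text.replace("a", "b")   // pass 1: every "a" becomes "b"
                   .replace("b", "c");  // pass 2 also rewrites the "b"s created by pass 1
    }

    public static void main(String[] args) {
        // The intended mapping a->b, b->c would give "bc"; chaining gives "cc".
        System.out.println(chained("ab"));
    }
}
```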
Accepted answer by Todd Owen
If the string you are operating on is very long, or you are operating on many strings, then it could be worthwhile using a java.util.regex.Matcher (this requires time up-front to compile, so it won't be efficient if your input is very small or your search pattern changes frequently).
Below is a full example, based on a list of tokens taken from a map. (Uses StringUtils from Apache Commons Lang).
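import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils;

public class TokenReplace {
    public static void main(String[] args) {
        Map<String,String> tokens = new HashMap<String,String>();
        tokens.put("cat", "Garfield");
        tokens.put("beverage", "coffee");
        String template = "%cat% really needs some %beverage%.";
        // Create pattern of the format "%(cat|beverage)%"
        String patternString = "%(" + StringUtils.join(tokens.keySet(), "|") + ")%";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(template);
        StringBuffer sb = new StringBuffer();
        while(matcher.find()) {
            // quoteReplacement keeps '$' or '\' in a value from being read as a group reference
            matcher.appendReplacement(sb, Matcher.quoteReplacement(tokens.get(matcher.group(1))));
        }
        matcher.appendTail(sb);
        System.out.println(sb.toString());
    }
}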
Once the regular expression is compiled, scanning the input string is generally very quick (although if your regular expression is complex or involves backtracking then you would still need to benchmark in order to confirm this!)
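A minimal sketch of the compile-once idea, assuming tokens are written as %name% where name consists of word characters. The generic %(\w+)% pattern and the PrecompiledTokens class are illustrative, not part of the answer. The Pattern is compiled once into a static final field (the Pattern Javadoc describes instances as immutable and safe for concurrent use), while each call creates its own Matcher, which is not thread-safe. Matcher.quoteReplacement keeps a $ or \ in a value from being interpreted as a group reference.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecompiledTokens {
    // Compiled once when the class loads, then shared by every call.
    private static final Pattern TOKEN = Pattern.compile("%(\\w+)%");

    public static String fill(String template, Map<String, String> tokens) {
        Matcher m = TOKEN.matcher(template); // a fresh Matcher per call
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Unknown token names are left untouched (an assumption of this sketch).
            String value = tokens.getOrDefault(m.group(1), m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new HashMap<>();
        tokens.put("cat", "Garfield");
        tokens.put("beverage", "coffee");
        // The same compiled pattern serves any number of templates.
        System.out.println(fill("%cat% really needs some %beverage%.", tokens));
        System.out.println(fill("%cat% owes %dog% $5", tokens));
    }
}
```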
Answer by Avi
How about using the replaceAll() method?
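On its own, replaceAll() substitutes the same replacement for every match of the regex. Below is a minimal sketch, assuming Java 9 or later, of giving each search string its own replacement in a single pass with Matcher.replaceAll(Function<MatchResult, String>). ReplaceAllWithFunction and replaceEach are hypothetical names. Each key goes through Pattern.quote so it is matched literally, and each returned value goes through Matcher.quoteReplacement because replaceAll still interprets $ and \ in it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ReplaceAllWithFunction {
    public static String replaceEach(String text, Map<String, String> map) {
        if (map.isEmpty()) {
            return text; // an empty alternation would match the empty string everywhere
        }
        // "key1|key2|...", each key quoted so regex metacharacters match literally
        String alternation = map.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher m = Pattern.compile(alternation).matcher(text);
        // Java 9+: the function chooses the replacement for each individual match.
        return m.replaceAll(r -> Matcher.quoteReplacement(map.get(r.group())));
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", "b");
        map.put("b", "c");
        System.out.println(replaceEach("ab", map)); // single pass, so no cascading
    }
}
```

Because the input is scanned once, a->b and b->c turn "ab" into "bc" instead of cascading. If one key is a prefix of another, the alternation order (here, the map's iteration order) decides which one matches.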
Answer by Steve McLeod
If you are going to be changing a String many times, then it is usually more efficient to use a StringBuilder (but measure your performance to find out):
String str = "The rain in Spain falls mainly on the plain";
StringBuilder sb = new StringBuilder(str);
// do your replacing in sb - although you'll find this trickier than simply using String
String newStr = sb.toString();
Every time you do a replace on a String, a new String object is created, because Strings are immutable. StringBuilder is mutable, that is, it can be changed as much as you want.
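A minimal sketch of the in-place StringBuilder replacement that the comment in the snippet above hints at, using StringBuilder.indexOf(String, int) and StringBuilder.replace(int, int, String). The BuilderReplace class is a hypothetical name. When the replacement length differs from the target length, each replace() shifts the rest of the buffer, so this is not automatically cheaper than building new Strings; measuring remains the way to find out.

```java
public class BuilderReplace {
    // Replaces every occurrence of target in sb, modifying sb in place.
    public static void replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            return; // an empty target would match at every index and never advance
        }
        int from = 0;
        int idx;
        while ((idx = sb.indexOf(target, from)) != -1) {
            sb.replace(idx, idx + target.length(), replacement);
            // Continue after the inserted text so it is never searched again.
            from = idx + replacement.length();
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("The rain in Spain falls mainly on the plain");
        replaceAll(sb, "ain", "AIN");
        System.out.println(sb); // The rAIN in SpAIN falls mAINly on the plAIN
    }
}
```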
Answer by Brian Agnew
StringBuilder will perform replace more efficiently, since its character array buffer can be specified to a required length. StringBuilder is designed for more than appending!
Of course, the real question is whether this is an optimisation too far. The JVM is very good at handling the creation of multiple objects and the subsequent garbage collection, and, as with all optimisation questions, my first question is whether you've measured this and determined that it's a problem.
Answer by Ali
Check this:
String.format(String format, Object... args)
For instance:
String.format( "Put your %s where your %s is", "money", "mouth" );
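String.format only fills placeholders that the template was written with; it does not search for existing substrings. A small illustration (the FormatDemo class is hypothetical): %s consumes the arguments in order, while an explicit index such as %2$s selects a specific argument, so values can be reordered or reused.

```java
public class FormatDemo {
    // %s placeholders consume the arguments in order.
    public static final String IN_ORDER =
            String.format("Put your %s where your %s is", "money", "mouth");
    // %2$s and %1$s select arguments by position, so they can be reordered or reused.
    public static final String BY_INDEX =
            String.format("%2$s, %1$s, then %2$s again", "first", "second");

    public static void main(String[] args) {
        System.out.println(IN_ORDER); // Put your money where your mouth is
        System.out.println(BY_INDEX); // second, first, then second again
    }
}
```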
Answer by Gelin Luo
Rythm, a Java template engine, has been released with a new feature called String interpolation mode, which allows you to do something like:
String result = Rythm.render("@name is inviting you", "Diana");
The above case shows that you can pass arguments to the template by position. Rythm also allows you to pass arguments by name:
Map<String, Object> args = new HashMap<String, Object>();
args.put("title", "Mr.");
args.put("name", "John");
String result = Rythm.render("Hello @title @name", args);
Note: Rythm is VERY FAST, about 2 to 3 times faster than String.format and Velocity. Because it compiles the template into Java bytecode, its runtime performance is very close to concatenation with StringBuilder.
Links:
- Check the full featured demonstration
- read a brief introduction to Rythm
- download the latest package, or
- fork it
Answer by Robin479
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public String replace(String input, Map<String, String> pairs) {
    // Reverse lexic-order of keys is good enough for most cases,
    // as it puts longer words before their prefixes ("tool" before "too").
    // However, there are corner cases, which this algorithm doesn't handle
    // no matter what order of keys you choose, eg. it fails to match "edit"
    // before "bed" in "..bedit.." because "bed" appears first in the input,
    // but "edit" may be the desired longer match. Depends which you prefer.
    final Map<String, String> sorted =
        new TreeMap<String, String>(Collections.reverseOrder());
    sorted.putAll(pairs);
    final String[] keys = sorted.keySet().toArray(new String[sorted.size()]);
    final String[] vals = sorted.values().toArray(new String[sorted.size()]);
    final int lo = 0, hi = input.length();
    final StringBuilder result = new StringBuilder();
    int s = lo;
    for (int i = s; i < hi; i++) {
        for (int p = 0; p < keys.length; p++) {
            if (input.regionMatches(i, keys[p], 0, keys[p].length())) {
                /* TODO: check for "edit", if this is "bed" in "..bedit.." case,
                 * i.e. look ahead for all prioritized/longer keys starting within
                 * the current match region; iff found, then ignore match ("bed")
                 * and continue search (find "edit" later), else handle match. */
                // if (better-match-overlaps-right-ahead)
                //     continue;
                result.append(input, s, i).append(vals[p]);
                i += keys[p].length();
                s = i--; // the outer i++ resumes right after the match
                break;   // don't try the remaining keys at the shifted position
            }
        }
    }
    if (s == lo) // no matches? no changes!
        return input;
    return result.append(input, s, hi).toString();
}
Answer by Kip
The code below is based on Todd Owen's answer. That solution has the problem that if the strings being searched for contain characters that have special meaning in regular expressions, you can get unexpected results. I also wanted to be able to optionally do a case-insensitive search. Here is what I came up with:
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Performs simultaneous search/replace of multiple strings. Case Sensitive!
 */
public static String replaceMultiple(String target, Map<String, String> replacements) {
    return replaceMultiple(target, replacements, true);
}
/**
 * Performs simultaneous search/replace of multiple strings.
 *
 * @param target string to perform replacements on.
 * @param replacements map where key represents value to search for, and value represents replacement
 * @param caseSensitive whether or not the search is case-sensitive.
 * @return replaced string
 */
public static String replaceMultiple(String target, Map<String, String> replacements, boolean caseSensitive) {
    if(target == null || "".equals(target) || replacements == null || replacements.size() == 0)
        return target;
    //if we are doing case-insensitive replacements, we need to make the map case-insensitive--make a new map with all-lower-case keys
    if(!caseSensitive) {
        Map<String, String> altReplacements = new HashMap<String, String>(replacements.size());
        for(String key : replacements.keySet())
            altReplacements.put(key.toLowerCase(), replacements.get(key));
        replacements = altReplacements;
    }
    StringBuilder patternString = new StringBuilder();
    if(!caseSensitive)
        patternString.append("(?i)");
    patternString.append('(');
    boolean first = true;
    for(String key : replacements.keySet()) {
        if(first)
            first = false;
        else
            patternString.append('|');
        // Pattern.quote makes regex metacharacters in the key match literally
        patternString.append(Pattern.quote(key));
    }
    patternString.append(')');
    Pattern pattern = Pattern.compile(patternString.toString());
    Matcher matcher = pattern.matcher(target);
    StringBuffer res = new StringBuffer();
    while(matcher.find()) {
        String match = matcher.group(1);
        if(!caseSensitive)
            match = match.toLowerCase();
        // quoteReplacement keeps '$' and '\' in the replacement value literal
        matcher.appendReplacement(res, Matcher.quoteReplacement(replacements.get(match)));
    }
    matcher.appendTail(res);
    return res.toString();
}
Here are my unit test cases:
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.junit.Test;

@Test
public void replaceMultipleTest() {
    assertNull(ExtStringUtils.replaceMultiple(null, null));
    assertNull(ExtStringUtils.replaceMultiple(null, Collections.<String, String>emptyMap()));
    assertEquals("", ExtStringUtils.replaceMultiple("", null));
    assertEquals("", ExtStringUtils.replaceMultiple("", Collections.<String, String>emptyMap()));
    assertEquals("folks, we are not sane anymore. with me, i promise you, we will burn in flames", ExtStringUtils.replaceMultiple("folks, we are not winning anymore. with me, i promise you, we will win big league", makeMap("win big league", "burn in flames", "winning", "sane")));
    assertEquals("bcaacbbcaacb", ExtStringUtils.replaceMultiple("abccbaabccba", makeMap("a", "b", "b", "c", "c", "a")));
    assertEquals("bcaCBAbcCCBb", ExtStringUtils.replaceMultiple("abcCBAabCCBa", makeMap("a", "b", "b", "c", "c", "a")));
    assertEquals("bcaacbbcaacb", ExtStringUtils.replaceMultiple("abcCBAabCCBa", makeMap("a", "b", "b", "c", "c", "a"), false));
    // "\\" is one backslash in Java source; adjacent replacements each keep their own spaces
    assertEquals("c colon  backslash temp backslash  star  dot  star ", ExtStringUtils.replaceMultiple("c:\\temp\\*.*", makeMap(".", " dot ", ":", " colon ", "\\", " backslash ", "*", " star "), false));
}
private Map<String, String> makeMap(String ... vals) {
    Map<String, String> map = new HashMap<String, String>(vals.length / 2);
    for(int i = 1; i < vals.length; i+= 2)
        map.put(vals[i-1], vals[i]);
    return map;
}
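Both the accepted answer and the version above build the alternation in HashMap iteration order. Regex alternation tries alternatives left to right and takes the first one that matches at a given position, so when one key is a prefix of another ("too" and "tool", as in Robin479's comment) the result depends on that arbitrary order. A minimal sketch of one common fix, sorting the keys longest first before quoting and joining them; LongestFirstAlternation and compileLongestFirst are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LongestFirstAlternation {
    public static Pattern compileLongestFirst(Map<String, String> replacements) {
        List<String> keys = new ArrayList<>(replacements.keySet());
        // Longest keys first, so "tool" is tried before its prefix "too".
        keys.sort((x, y) -> Integer.compare(y.length(), x.length()));
        String alternation = keys.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(" + alternation + ")");
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("too", "X");
        map.put("tool", "Y");
        Matcher m = compileLongestFirst(map).matcher("tools");
        if (m.find()) {
            System.out.println(m.group(1)); // tool
        }
        // For comparison, an alternation with the prefix first stops early.
        Matcher prefixFirst = Pattern.compile("(too|tool)").matcher("tools");
        if (prefixFirst.find()) {
            System.out.println(prefixFirst.group(1)); // too
        }
    }
}
```

The same sorting step can be applied before building the pattern in either answer; group(1) is kept so an existing matcher loop does not need to change.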
Answer by Dave Jarvis
Algorithm
One of the most efficient ways to replace matching strings (without regular expressions) is to use the Aho-Corasick algorithm with a performant trie (pronounced "try"), a fast hashing algorithm, and an efficient collections implementation.
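To show the data-structure idea, here is a minimal sketch of trie-based, longest-match replacement using only the JDK. It is deliberately simpler than Aho-Corasick: it has no failure links, so it restarts the trie walk at every position and can cost up to O(n·L) for text length n and longest key length L, which is the rescanning Aho-Corasick avoids. The TrieReplace class is hypothetical; the libraries discussed below are the better choice for real workloads.

```java
import java.util.HashMap;
import java.util.Map;

public class TrieReplace {
    // A trie node: children keyed by character, plus the value if a key ends here.
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        String replacement; // null unless some key ends at this node
    }

    private final Node root = new Node();

    public TrieReplace(Map<String, String> replacements) {
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            Node n = root;
            for (char c : e.getKey().toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.replacement = e.getValue(); // an empty key lands on root and is never matched
        }
    }

    public String replace(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Walk the trie from i, remembering the longest key that ends on the way.
            Node n = root;
            int matchEnd = -1;
            String matchValue = null;
            for (int j = i; j < text.length(); j++) {
                n = n.next.get(text.charAt(j));
                if (n == null) {
                    break; // no key continues with this character
                }
                if (n.replacement != null) {
                    matchEnd = j + 1;
                    matchValue = n.replacement;
                }
            }
            if (matchEnd != -1) {
                out.append(matchValue); // longest match wins; skip past it
                i = matchEnd;
            } else {
                out.append(text.charAt(i)); // no key starts here; copy one char
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("too", "X");
        map.put("tool", "Y");
        System.out.println(new TrieReplace(map).replace("tools too")); // Ys X
    }
}
```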
Simple Code
A simple solution leverages Apache's StringUtils.replaceEach as follows:
private String testStringUtils(
    final String text, final Map<String, String> definitions ) {
  final String[] keys = keys( definitions );
  final String[] values = values( definitions );
  return StringUtils.replaceEach( text, keys, values );
}
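The keys() and values() helpers are not shown in the answer. A minimal sketch of what they might look like (hypothetical, in a class named DefinitionArrays): values() looks each key up so that values[i] pairs with keys[i], which assumes the map is not modified between the two calls, so that keySet() iterates in the same order both times.

```java
import java.util.HashMap;
import java.util.Map;

public class DefinitionArrays {
    // Hypothetical: the answer calls keys(definitions) without showing it.
    public static String[] keys(Map<String, String> definitions) {
        return definitions.keySet().toArray(new String[0]);
    }

    // Hypothetical: looks each key up so values[i] belongs to keys[i],
    // assuming the map is unchanged between the keys() and values() calls.
    public static String[] values(Map<String, String> definitions) {
        String[] keys = keys(definitions);
        String[] values = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            values[i] = definitions.get(keys[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> definitions = new HashMap<>();
        definitions.put("cat", "Garfield");
        definitions.put("beverage", "coffee");
        String[] k = keys(definitions);
        String[] v = values(definitions);
        for (int i = 0; i < k.length; i++) {
            System.out.println(k[i] + " -> " + v[i]);
        }
    }
}
```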
This slows down on large texts.
Fast Code
Bor's implementation of the Aho-Corasick algorithm introduces a bit more complexity, which becomes an implementation detail by using a façade with the same method signature:
private String testBorAhoCorasick(
    final String text, final Map<String, String> definitions ) {
  // Create a buffer sufficiently large that re-allocations are minimized.
  final StringBuilder sb = new StringBuilder( text.length() << 1 );
  final TrieBuilder builder = Trie.builder();
  builder.onlyWholeWords();
  builder.removeOverlaps();
  final String[] keys = keys( definitions );
  for( final String key : keys ) {
    builder.addKeyword( key );
  }
  final Trie trie = builder.build();
  final Collection<Emit> emits = trie.parseText( text );
  int prevIndex = 0;
  for( final Emit emit : emits ) {
    final int matchIndex = emit.getStart();
    sb.append( text.substring( prevIndex, matchIndex ) );
    sb.append( definitions.get( emit.getKeyword() ) );
    prevIndex = emit.getEnd() + 1;
  }
  // Add the remainder of the string (contains no more matches).
  sb.append( text.substring( prevIndex ) );
  return sb.toString();
}
Benchmarks
For the benchmarks, the buffer was created using randomNumeric as follows:
private final static int TEXT_SIZE = 1000;
private final static int MATCHES_DIVISOR = 10;
private final static StringBuilder SOURCE
= new StringBuilder( randomNumeric( TEXT_SIZE ) );
Where MATCHES_DIVISOR dictates the number of variables to inject:
private void injectVariables( final Map<String, String> definitions ) {
  for( int i = (SOURCE.length() / MATCHES_DIVISOR) + 1; i > 0; i-- ) {
    final int r = current().nextInt( 1, SOURCE.length() );
    SOURCE.insert( r, randomKey( definitions ) );
  }
}
The benchmark code itself (JMH seemed overkill):
long duration = System.nanoTime();
final String result = testBorAhoCorasick( text, definitions );
duration = System.nanoTime() - duration;
System.out.println( elapsed( duration ) );
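The elapsed() helper is also not shown. A minimal sketch (hypothetical Elapsed class) that produces the "N seconds, M millis" format used in the results, based on java.util.concurrent.TimeUnit. A single System.nanoTime() measurement like this one also includes JIT warm-up and class loading, so the numbers are rough indications rather than precise costs.

```java
import java.util.concurrent.TimeUnit;

public class Elapsed {
    // Hypothetical: formats a System.nanoTime() difference as "N seconds, M millis".
    public static String elapsed(long nanos) {
        long seconds = TimeUnit.NANOSECONDS.toSeconds(nanos); // truncated whole seconds
        long millis = TimeUnit.NANOSECONDS.toMillis(nanos);   // total milliseconds
        return seconds + " seconds, " + millis + " millis";
    }

    public static void main(String[] args) {
        long duration = System.nanoTime();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append(i); // placeholder workload for the timing demo
        }
        duration = System.nanoTime() - duration;
        System.out.println(elapsed(duration));
    }
}
```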
1,000,000 : 1,000
A simple micro-benchmark with 1,000,000 characters and 1,000 randomly-placed strings to replace.
- testStringUtils:25 seconds, 25533 millis
- testBorAhoCorasick:0 seconds, 68 millis
No contest.
10,000 : 1,000
Using 10,000 characters and 1,000 matching strings to replace:
- testStringUtils:1 seconds, 1402 millis
- testBorAhoCorasick:0 seconds, 37 millis
The divide closes.
1,000 : 10
Using 1,000 characters and 10 matching strings to replace:
- testStringUtils:0 seconds, 7 millis
- testBorAhoCorasick:0 seconds, 19 millis
For short strings, the overhead of setting up Aho-Corasick outweighs the cost of the brute-force approach of StringUtils.replaceEach.
A hybrid approach based on text length is possible, to get the best of both implementations.
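A minimal sketch of such a hybrid, assuming both implementations share the (text, definitions) signature used above. The HybridReplacer class is hypothetical and the threshold is a parameter to be chosen by measurement: in the benchmarks above, StringUtils won at 1,000 characters and lost at 10,000, so the crossover for that setup lies somewhere in between.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.BiFunction;

public class HybridReplacer {
    private final int threshold; // hypothetical cut-over length, chosen by measurement
    private final BiFunction<String, Map<String, String>, String> forShortText;
    private final BiFunction<String, Map<String, String>, String> forLongText;

    public HybridReplacer(int threshold,
                          BiFunction<String, Map<String, String>, String> forShortText,
                          BiFunction<String, Map<String, String>, String> forLongText) {
        this.threshold = threshold;
        this.forShortText = forShortText;
        this.forLongText = forLongText;
    }

    // Short texts skip the trie construction cost; long texts amortize it.
    public String replace(String text, Map<String, String> definitions) {
        return text.length() < threshold
                ? forShortText.apply(text, definitions)
                : forLongText.apply(text, definitions);
    }

    public static void main(String[] args) {
        // Stand-in lambdas; in the answer's code these could be
        // this::testStringUtils and this::testBorAhoCorasick.
        HybridReplacer hybrid = new HybridReplacer(5_000,
                (text, defs) -> "short path",
                (text, defs) -> "long path");
        Map<String, String> none = Collections.emptyMap();
        System.out.println(hybrid.replace("abc", none)); // short path
    }
}
```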
Implementations
Consider comparing other implementations for text longer than 1 MB, including:
- https://github.com/RokLenarcic/AhoCorasick
- https://github.com/hankcs/AhoCorasickDoubleArrayTrie
- https://github.com/raymanrt/aho-corasick
- https://github.com/ssundaresan/Aho-Corasick
- https://github.com/jmhsieh/aho-corasick
- https://github.com/quest-oss/Mensa
Papers
Papers and information relating to the algorithm:
Answer by bikram
This worked for me:
String result = input.replaceAll("string1|string2|string3","replacementString");
Example:
String input = "applemangobananaarefruits";
String result = input.replaceAll("mango|are|ts","-");
System.out.println(result);
Output: apple-banana-frui-
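Because replaceAll() treats its first argument as a regular expression, this works as-is only when the search strings contain no metacharacters; for example, "a.b".replaceAll(".", "-") gives "---", since . matches any character. A minimal sketch that quotes each search string with Pattern.quote and the replacement with Matcher.quoteReplacement, still giving every target the same replacement (QuotedReplaceAll and replaceAllLiteral are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class QuotedReplaceAll {
    // Replaces every literal occurrence of any target with the same replacement.
    public static String replaceAllLiteral(String input, String replacement, String... targets) {
        if (targets.length == 0) {
            return input; // an empty regex would match between every character
        }
        // Quote each target so '.', '|', '*' and friends are matched literally.
        String regex = Stream.of(targets)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Quote the replacement so '$' and '\' are inserted literally.
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceAllLiteral("applemangobananaarefruits", "-", "mango", "are", "ts"));
        System.out.println(replaceAllLiteral("1.5+2.5", "$", ".", "+"));
    }
}
```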