Java: is it possible to use replaceAll() with wildcards?
Original URL: http://stackoverflow.com/questions/12376939/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Deslyxia
Good morning. I realize there are a ton of questions out there regarding replace() and replaceAll(), but I haven't seen this one.
What I'm looking to do is parse a string (which contains valid HTML up to a point). Then, after I see the second instance of <p> in the string, I want to remove everything that starts with & and ends with ; until I see the next </p>.
To do the second part, I was hoping to use something along the lines of s.replaceAll("&*;","").
That doesn't work, but hopefully it gets my point across: I am looking to replace anything that starts with & and ends with ;.
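To see why the attempt above fails: in a regular expression, * is not a shell-style wildcard. It repeats the item right before it, so "&*;" means "zero or more & characters followed by ;". The small sketch below (class and method names are made up for illustration) shows that it only deletes the semicolon, plus any & directly in front of it.

```java
public class WildcardPatterns {
    // The question's pattern: "&*" = zero or more '&', then a literal ';'.
    // It never matches the letters between '&' and ';'.
    static String questionAttempt(String s) {
        return s.replaceAll("&*;", "");
    }

    public static void main(String[] args) {
        // Only the ';' is removed; "&escape" is left behind.
        System.out.println(questionAttempt("This &escape;String")); // prints "This &escapeString"
        // Here "&&;" is matched as a whole because the '&'s sit directly before ';'.
        System.out.println(questionAttempt("a&&;b"));               // prints "ab"
    }
}
```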
Answer by Brian
You should probably leave the parsing to a DOM parser (see this question). I can almost guarantee you'll have to do this to find the text within the <p> tags.
For the replacement logic, String.replaceAll uses regular expressions, which can do the matching you want.
The "wildcard" in regular expressions that you want is the .* expression. Using your example:
String ampStr = "This &escape;String";
String removed = ampStr.replaceAll("&.*;", "");
System.out.println(removed);
This outputs This String. This is because the . represents any character, and the * means "the preceding item 0 or more times." So .* basically means "any number of characters." However, feeding it:
"This &escape;String &anotherescape;Extended"
will probably not do what you want: it will output This Extended, because .* is greedy and matches everything from the first & up to the last ;. To fix this, specify exactly what you want to match instead of the . character. This is done using [^;], which means "any character that's not a semicolon":
String removed = ampStr.replaceAll("&[^;]*;", "");
This has performance benefits over &.*?; for non-matching strings, so I highly recommend using this version, especially since not all HTML files will contain a &abc; token, and the &.*?; version can become a serious performance bottleneck as a result.
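A minimal side-by-side sketch of the three patterns mentioned in this answer, run on the two-entity input above (the class and method names are hypothetical). The greedy version swallows the text between the two entities; the negated class and the reluctant (lazy) version both stop at the first ; after each &. One behavioral difference beyond performance, per the java.util.regex defaults: . does not match line terminators unless DOTALL is enabled, while [^;] does, so the two can disagree when an &...; run spans a newline.

```java
public class EntityPatterns {
    static final String INPUT = "This &escape;String &anotherescape;Extended";

    // Greedy: ".*" runs as far as it can, then backs off to the LAST ';'.
    static String greedy(String s)  { return s.replaceAll("&.*;", ""); }

    // Negated class: "[^;]*" can never cross a ';', so each match ends at the first ';'.
    static String negated(String s) { return s.replaceAll("&[^;]*;", ""); }

    // Reluctant: ".*?" grows one character at a time and stops at the first ';'.
    static String lazy(String s)    { return s.replaceAll("&.*?;", ""); }

    public static void main(String[] args) {
        System.out.println(greedy(INPUT));  // prints "This Extended"
        System.out.println(negated(INPUT)); // prints "This String Extended"
        System.out.println(lazy(INPUT));    // prints "This String Extended"

        // '[^;]' matches '\n' but '.' does not (no DOTALL flag), so only the
        // negated-class pattern removes an &...; run that spans a line break.
        String multiLine = "a &x\ny; b";
        System.out.println(negated(multiLine).equals("a  b"));  // prints "true"
        System.out.println(lazy(multiLine).equals(multiLine));  // prints "true"
    }
}
```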
Answer by Jon Lin
The expression you want is:
s.replaceAll("&.*?;","");
But do you really want to be parsing HTML this way? You may be better off using an XML parser.
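For completeness, here is a rough sketch of the whole task from the question done without a parser: apply the &[^;]*; replacement only to the text between the second <p> and the next </p>. The helper name is hypothetical, and it assumes the tags appear literally as lowercase <p> and </p> with no attributes. That kind of fragile assumption is exactly why both answers suggest a real parser for anything beyond simple input. Treating a missing </p> as "to the end of the string" is also an assumption made here, not something the question specifies.

```java
public class StripEntitiesInSecondParagraph {
    private static final String OPEN = "<p>";
    private static final String CLOSE = "</p>";

    // Hypothetical helper: removes "&...;" runs only inside the second <p>...</p>.
    // Plain indexOf scanning; it does not understand attributes, case, or nesting.
    static String stripEntitiesInSecondParagraph(String html) {
        int first = html.indexOf(OPEN);
        if (first < 0) return html;                      // no <p> at all
        int second = html.indexOf(OPEN, first + OPEN.length());
        if (second < 0) return html;                     // fewer than two <p> tags
        int start = second + OPEN.length();              // text begins after the second <p>
        int end = html.indexOf(CLOSE, start);
        if (end < 0) end = html.length();                // assumption: no </p> means "to the end"
        String body = html.substring(start, end).replaceAll("&[^;]*;", "");
        return html.substring(0, start) + body + html.substring(end);
    }

    public static void main(String[] args) {
        String html = "<p>Keep &amp; this</p><p>Drop &nbsp;these &lt;entities&gt;</p><p>Keep &copy;</p>";
        // prints "<p>Keep &amp; this</p><p>Drop these entities</p><p>Keep &copy;</p>"
        System.out.println(stripEntitiesInSecondParagraph(html));
    }
}
```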