Java: Is it possible to use replaceAll() with wildcards?

Question

Asked by Deslyxia

Good morning. I realize there are a ton of questions out there regarding replace() and replaceAll(), but I haven't seen this one.

What I'm looking to do is parse a string (which contains valid HTML to a point) then after I see the second instance of in the string I want to remove everything that starts with & and ends with ; until I see the next 

To do the second part I was hoping to use something along the lines of s.replaceAll("&*;","")

That doesn't work, but hopefully it gets my point across: I am looking to replace anything that starts with & and ends with ;

Answer 1

Answered by Brian

You should probably leave the parsing to a DOM parser (see this question). I can almost guarantee you'll have to do this to find text within the tags.

For the replacement logic, String.replaceAll uses regular expressions, which can do the matching you want.

The "wildcard" you want in regular expressions is the .* expression. Using your example:

String ampStr = "This &escape;String";
String removed = ampStr.replaceAll("&.*;", "");
System.out.println(removed);

This outputs This String. This is because the . represents any character, and the * means "this character 0 or more times." So .* basically means "any number of characters." However, feeding it:

"This &escape;String &anotherescape;Extended"

will probably not do what you want: it outputs This Extended, because .* is greedy and matches everything from the first & to the last ; in the string. To fix this, you specify exactly what you want to look for instead of the . character. This is done using [^;], which means "any character that's not a semicolon":

String removed = ampStr.replaceAll("&[^;]*;", "");

This has performance benefits over &.*?; for non-matching strings, so I highly recommend using this version, especially since not all HTML files will contain a &abc; token and the &.*?; version can have huge performance bottlenecks as a result.
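As a rough side-by-side check of the patterns discussed in this thread, the sketch below runs each one on the two-entity input from above. The class name EntityStripDemo and its method names are made up for illustration; only String.replaceAll and the regular expressions themselves come from the question and answers.

```java
// Hypothetical demo class: compares the four patterns mentioned in this thread.
class EntityStripDemo {

    // The asker's original attempt: "&*;" means "zero or more & characters
    // followed by ;", so it only removes the semicolons (and any & directly before them).
    static String stripOriginalAttempt(String s) {
        return s.replaceAll("&*;", "");
    }

    // Greedy: ".*" grabs as much as possible, so one match runs from the
    // first & to the last ; in the string.
    static String stripGreedy(String s) {
        return s.replaceAll("&.*;", "");
    }

    // Reluctant: ".*?" stops at the first ; it can reach.
    static String stripReluctant(String s) {
        return s.replaceAll("&.*?;", "");
    }

    // Negated character class: "[^;]*" can never run past a semicolon.
    static String stripNegated(String s) {
        return s.replaceAll("&[^;]*;", "");
    }

    public static void main(String[] args) {
        String input = "This &escape;String &anotherescape;Extended";
        System.out.println(stripOriginalAttempt(input)); // This &escapeString &anotherescapeExtended
        System.out.println(stripGreedy(input));          // This Extended
        System.out.println(stripReluctant(input));       // This String Extended
        System.out.println(stripNegated(input));         // This String Extended
    }
}
```

The reluctant and negated-class versions produce the same result on this input; the difference described above is in how much work the regex engine does on text where no closing ; follows an &.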

Answer 2

Answered by Jon Lin

The expression you want is:

s.replaceAll("&.*?;","");

But do you really want to be parsing HTML this way? You may be better off using an XML parser.
