Java: how to remove a specific special character pattern from a string

Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/11791784/

Time: 2020-10-31 06:23:02  Source: igfitidea

How to remove a specific special character pattern from a string

java string

Asked by Roshanck

I have a string named s:


String s = "<NOUN>Sam</NOUN> , a student of the University of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue Olympiad Hotel";  

I want to remove all <NOUN> and </NOUN> tags from the string. I used this to remove the tags:


s.replaceAll("[<NOUN>,</NOUN>]","");

Yes, it removes the tags, but it also removes the letters 'U' and 'O' from the string, which gives me the following output:


 Sam , a student of the niversity of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue lympiad Hotel

Can anyone please tell me how to do this correctly?


Answered by Hubro

Try:


s.replaceAll("<NOUN>|</NOUN>", "");

In a regex, the syntax [...] is a character class: it matches any single character listed inside the brackets, regardless of the order in which the characters appear. Therefore, in your example, every occurrence of "<", "N", "O", etc. is removed. Instead, use the pipe (|) to match either "<NOUN>" or "</NOUN>".


The following should also work (and could be considered more DRY and elegant) since it will match the tag both with and without the forward slash:


s.replaceAll("</?NOUN>", "");
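As a minimal sketch of the two patterns above, the following compares the alternation form with the optional-slash form. The class name, method names, and shortened sample text are hypothetical, chosen only for illustration. Note that Java strings are immutable, so the result of replaceAll has to be assigned or returned rather than discarded.

```java
// Hypothetical demo class; names and sample text are illustrative only.
class NounTagDemo {
    // Alternation: matches either the literal "<NOUN>" or the literal "</NOUN>".
    static String stripAlternation(String s) {
        return s.replaceAll("<NOUN>|</NOUN>", "");
    }

    // Optional slash: "/?" matches zero or one '/' after the '<'.
    static String stripOptionalSlash(String s) {
        return s.replaceAll("</?NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> , a student of the University of oxford";
        // replaceAll returns a new String; s itself is left unchanged.
        System.out.println(stripAlternation(s));
        System.out.println(stripOptionalSlash(s));
        // both print: Sam , a student of the University of oxford
    }
}
```

Both forms leave the letters 'U' and 'O' in the remaining text intact, because they match the whole tag rather than individual characters.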

Answered by Brian Agnew

String.replaceAll() takes a regular expression as its first argument. The regexp:


"[<NOUN>,</NOUN>]"

defines, within the brackets, the set of characters to be matched and thus removed. You're therefore asking to remove the characters <, >, /, N, O, U and the comma.
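To make the effect of the character set concrete, here is a small sketch that applies the question's pattern to a shortened sample sentence. The class name, method name, and sample text are hypothetical and not from the original post.

```java
// Hypothetical demo class; the name and sample text are illustrative only.
class CharClassDemo {
    // Applies the character-class pattern from the question unchanged.
    static String stripWithCharClass(String s) {
        // [...] is a set: any single '<', 'N', 'O', 'U', '>', ',' or '/' is removed.
        return s.replaceAll("[<NOUN>,</NOUN>]", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN>, a student of the University of Oxford";
        System.out.println(stripWithCharClass(s));
        // prints: Sam a student of the niversity of xford
    }
}
```

In this sample the comma and the capital letters 'U' and 'O' disappear along with the tags, since each of them is a member of the set.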


Perhaps the simplest way to do what you want is:


s.replaceAll("<NOUN>","").replaceAll("</NOUN>","");

which is explicit in what it's removing. More complex regular expressions are obviously possible.
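A short sketch of the chained approach follows. It also shows a variant that is not part of the answer above: since both tags are fixed text, String.replace(CharSequence, CharSequence) could be used instead, as it replaces every occurrence of a literal string without interpreting it as a regex. The class and method names are hypothetical.

```java
// Hypothetical demo class; names and sample text are illustrative only.
class LiteralReplaceDemo {
    // Chained regex replacements, as in the answer above.
    static String stripChained(String s) {
        return s.replaceAll("<NOUN>", "").replaceAll("</NOUN>", "");
    }

    // Swapped-in alternative (not from the answer): literal replacement.
    // String.replace treats its arguments as plain text, not as a regex.
    static String stripLiteral(String s) {
        return s.replace("<NOUN>", "").replace("</NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> won";
        System.out.println(stripChained(s));
        System.out.println(stripLiteral(s));
        // both print: Sam won
    }
}
```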


Answered by Timo Hahn

You can use one regular expression for this: "<[/]*NOUN>" so


s.replaceAll("<[/]*NOUN>","");

should do the trick. The "[/]*" matches zero or more "/" after the "<".
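One consequence of "zero or more" is that this pattern is slightly more permissive than one using "/?", which allows at most one slash. The sketch below illustrates the difference on a deliberately malformed tag; the class name, method names, and input are hypothetical.

```java
// Hypothetical demo class; names and sample text are illustrative only.
class SlashQuantifierDemo {
    // "[/]*" allows any number of slashes after '<', including none.
    static String stripZeroOrMore(String s) {
        return s.replaceAll("<[/]*NOUN>", "");
    }

    // "/?" allows at most one slash after '<'.
    static String stripZeroOrOne(String s) {
        return s.replaceAll("</?NOUN>", "");
    }

    public static void main(String[] args) {
        String malformed = "<NOUN>Sam<//NOUN>";
        System.out.println(stripZeroOrMore(malformed)); // prints: Sam
        System.out.println(stripZeroOrOne(malformed));  // prints: Sam<//NOUN>
    }
}
```

For well-formed input containing only <NOUN> and </NOUN>, the two patterns give the same result.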


Answered by abdelhadi

Try this:

String result = originValue.replaceAll("\\<.*?>", "");
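Note that this pattern is broader than the question asks for: it removes anything that starts with "<" and runs to the next ">", not only NOUN tags. The sketch below shows that behavior; the class name, method name, and the <VERB> tag in the sample are hypothetical.

```java
// Hypothetical demo class; names and sample text are illustrative only.
class AnyTagDemo {
    // "\\<" is an escaped '<'; ".*?" lazily matches up to the nearest '>'.
    static String stripAnyTag(String originValue) {
        return originValue.replaceAll("\\<.*?>", "");
    }

    public static void main(String[] args) {
        // Every tag is removed, not just <NOUN> and </NOUN>.
        System.out.println(stripAnyTag("<NOUN>Sam</NOUN> <VERB>won</VERB>"));
        // prints: Sam won
    }
}
```

Because the match only requires a "<" followed later by a ">", plain text such as "a < b and c > d" would also lose everything between the two symbols, so the narrower NOUN-specific patterns may be the safer choice when other angle brackets can appear.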
