Java: how to remove a specific special character pattern from a string
Notice: this page is adapted from a popular StackOverFlow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/11791784/
How to remove a specific special character pattern from a string
Asked by Roshanck
I have a string named s:
String s = "<NOUN>Sam</NOUN> , a student of the University of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue Olympiad Hotel";
I want to remove all <NOUN> and </NOUN> tags from the string. I used this to remove the tags:
s.replaceAll("[<NOUN>,</NOUN>]","");
Yes, it removes the tags, but it also removes the letters 'U' and 'O' from the string, which gives me the following output:
Sam , a student of the niversity of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue lympiad Hotel
Can anyone please tell me how to do this correctly?
Answered by Hubro
Try:
s.replaceAll("<NOUN>|</NOUN>", "");
In regex, the syntax [...] matches any single character listed inside the brackets, regardless of the order in which the characters appear. Therefore, in your example, every occurrence of "<", "N", "O", etc. is removed. Instead, use the pipe (|) to match either "<NOUN>" or "</NOUN>".
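The difference between the character class and the alternation can be sketched as follows. This is a minimal illustration, not the answerer's own code: the class name, method names, and the shortened sample string are assumptions made for the example.

```java
class CharClassVsAlternation {
    // Character class: removes every single '<', 'N', 'O', 'U', '>', ',' and '/' character.
    static String withCharacterClass(String s) {
        return s.replaceAll("[<NOUN>,</NOUN>]", "");
    }

    // Alternation: removes only the complete sequences "<NOUN>" and "</NOUN>".
    static String withAlternation(String s) {
        return s.replaceAll("<NOUN>|</NOUN>", "");
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the string from the question
        String s = "<NOUN>Sam</NOUN> studies at the University of Oxford";
        System.out.println(withCharacterClass(s)); // Sam studies at the niversity of xford
        System.out.println(withAlternation(s));    // Sam studies at the University of Oxford
    }
}
```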
The following should also work (and could be considered more DRY and elegant) since it will match the tag both with and without the forward slash:
s.replaceAll("</?NOUN>", "");
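A minimal sketch of the optional-slash pattern in use. The class name, the method name, and the precompiled Pattern constant are assumptions for illustration; calling s.replaceAll directly works equally well. Note that Java strings are immutable, so the cleaned text is the value returned by the call and has to be assigned or returned.

```java
import java.util.regex.Pattern;

class NounTagStripper {
    // "/?" makes the slash optional, so this matches both "<NOUN>" and "</NOUN>".
    private static final Pattern NOUN_TAG = Pattern.compile("</?NOUN>");

    static String strip(String s) {
        // The input string is left unchanged; the result is a new string.
        return NOUN_TAG.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> , a student of the University of oxford";
        String cleaned = strip(s);
        System.out.println(cleaned); // Sam , a student of the University of oxford
    }
}
```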
Answered by Brian Agnew
String.replaceAll() takes a regular expression as its first argument. The regexp:
"[<NOUN>,</NOUN>]"
defines within the brackets the set of characters to be identified and thus removed. Thus you're asking to remove the characters <, >, /, N, O, U and comma.
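One way to see which characters the bracket expression actually targets is to collect every match it finds. The sketch below is only an illustration; the class name, method name, and sample input are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassMembers {
    // Returns the distinct characters of the input matched by the character class,
    // in the order they are first seen.
    static Set<Character> matchedCharacters(String input) {
        Matcher m = Pattern.compile("[<NOUN>,</NOUN>]").matcher(input);
        Set<Character> seen = new LinkedHashSet<>();
        while (m.find()) {
            seen.add(m.group().charAt(0)); // each match is exactly one character
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical sample input containing all seven characters of the set
        System.out.println(matchedCharacters("<NOUN>Sam</NOUN> , University of Oxford"));
    }
}
```

Repeating a character inside the brackets (N appears four times) has no effect: the set still contains seven distinct characters.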
Perhaps the simplest way to do what you want is:
s.replaceAll("<NOUN>","").replaceAll("</NOUN>","");
which is explicit in what it's removing. More complex regular expressions are obviously possible.
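Because both tags are fixed strings, the same explicit approach also works without any regular expression. The sketch below swaps in String.replace(CharSequence, CharSequence), which replaces every occurrence of a literal target; this is an alternative to the answer's replaceAll chain, not the answerer's own code, and the class and method names are assumptions.

```java
class LiteralTagRemoval {
    static String removeNounTags(String s) {
        // replace(CharSequence, CharSequence) treats its arguments as plain text, not as a regex,
        // and replaces all occurrences, not just the first one.
        return s.replace("<NOUN>", "").replace("</NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> won the <NOUN>tournament</NOUN>";
        System.out.println(removeNounTags(s)); // Sam won the tournament
    }
}
```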
Answered by Timo Hahn
You can use one regular expression for this: "<[/]*NOUN>" so
s.replaceAll("<[/]*NOUN>","");
should do the trick. The "[/]*" matches zero or more "/" after the "<".
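Since "[/]*" allows any number of slashes, this pattern is slightly more permissive than one with a single optional slash. The following sketch contrasts the two on a deliberately malformed tag; the class name, method names, and the sample input are assumptions for illustration.

```java
class SlashQuantifiers {
    // "[/]*" accepts zero or more slashes after '<'.
    static String stripZeroOrMore(String s) {
        return s.replaceAll("<[/]*NOUN>", "");
    }

    // "/?" accepts at most one slash after '<'.
    static String stripOptional(String s) {
        return s.replaceAll("</?NOUN>", "");
    }

    public static void main(String[] args) {
        String odd = "<NOUN>Sam<//NOUN>"; // hypothetical malformed closing tag
        System.out.println(stripZeroOrMore(odd)); // Sam
        System.out.println(stripOptional(odd));   // Sam<//NOUN>
    }
}
```

For well-formed <NOUN> and </NOUN> tags, both patterns give the same result.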
Answered by abdelhadi
Try this: String result = originValue.replaceAll("\\<.*?>", "");
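Note that this pattern is broader than the question asks for: it removes anything from a "<" up to the next ">", so tags other than NOUN are stripped as well. A minimal sketch of the difference, where the class name, method names, and the <VERB> tag in the sample input are assumptions for illustration:

```java
class AnyTagRemoval {
    // Reluctant ".*?" stops at the first '>' after each '<', so any tag is removed.
    static String stripAllTags(String s) {
        return s.replaceAll("\\<.*?>", "");
    }

    // Removes only the NOUN tags and leaves other tags in place.
    static String stripNounTags(String s) {
        return s.replaceAll("</?NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> <VERB>won</VERB>"; // hypothetical input with a second tag type
        System.out.println(stripAllTags(s));  // Sam won
        System.out.println(stripNounTags(s)); // Sam <VERB>won</VERB>
    }
}
```

If the text can contain a literal "<" followed later by ">", for example in a comparison, the generic pattern would remove that span too, so the tag-specific pattern is the safer choice when only NOUN tags should go.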