Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/31774415/

Date: 2020-08-11 11:35:43  Source: igfitidea

java remove a pattern from string using regex

Tags: java, regex

Asked by D.Shefer

I need to remove the following substrings from my string:

  • \n
  • \uXXXX (X being a digit or a character)

e.g. "OR\n\nThe Central Site Engineering\u2019s \u201cfrontend\u201d, where developers turn to"

-> "OR The Central Site Engineering frontend , where developers turn to"

I tried using the String method replaceAll, but I don't know how to handle the \uXXXX sequences, and it didn't work for \n either:

String s = "\n";  
data=data.replaceAll(s," ");

What would this regex look like in Java?

Thanks for the help.

Answer by Roel Strolenberg

I guess it's best to do this in two parts:

String ex = "OR\\n\\nThe Central Site Engineering\\u2019s \\u201cfrontend\\u201d, where developers turn to";
// "\\\\n" is the regex \\n: a literal backslash followed by n.
String part1 = ex.replaceAll("\\\\n", " ");
// A literal backslash, u, and four hex digits (\d alone would miss the c in u201c).
String part2 = part1.replaceAll("\\\\u[0-9a-fA-F]{4}", "");
System.out.println(part2);

Try it =)

Answer by Pshemo

The problem with string.replaceAll("\\n", " "); is that replaceAll expects a regular expression, and \ in a regex is a special character. It is used, for instance, to create character classes like \d, which represents digits, or to escape regex special characters like +.
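Here is a small sketch of the problem described above. It assumes, as the question implies, that the data holds the two-character sequence backslash + n rather than a real line break. The Java literal "\\n" reaches the regex engine as \n, which matches a line-feed character, so the literal sequence is left alone. The class and method names are illustrative only:

```java
public class RegexNewlineDemo {
    // The Java literal "\\n" is the regex \n, which matches a line-feed character.
    static String replaceNewlineRegex(String input) {
        return input.replaceAll("\\n", " ");
    }

    public static void main(String[] args) {
        String realLineFeeds = "OR\n\nThe";    // two actual line-feed characters
        String literalEscapes = "OR\\n\\nThe"; // backslash, 'n', backslash, 'n'

        System.out.println(replaceNewlineRegex(realLineFeeds));  // OR  The
        System.out.println(replaceNewlineRegex(literalEscapes)); // OR\n\nThe (unchanged)
    }
}
```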

So if you want to match \ in a Java regex, you need to escape it twice:

  • once in regex \\
  • and once in String "\\\\".

like replaceAll("\\\\n"," ").
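To make the two escaping layers visible, this sketch prints the pattern string itself. The Java literal "\\\\n" holds three characters, \\n, which the regex engine reads as an escaped backslash followed by n. The class name and sample input are illustrative:

```java
public class DoubleEscapeDemo {
    // Java literal "\\\\n" -> pattern text \\n -> regex: a literal backslash, then 'n'.
    static final String PATTERN = "\\\\n";

    static String replaceLiteralBackslashN(String input) {
        return input.replaceAll(PATTERN, " ");
    }

    public static void main(String[] args) {
        System.out.println(PATTERN);          // \\n
        System.out.println(PATTERN.length()); // 3

        String data = "OR\\n\\nThe Central Site Engineering"; // literal backslash-n pairs
        System.out.println(replaceLiteralBackslashN(data));   // OR  The Central Site Engineering
    }
}
```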

You can also skip the regex escaping entirely and use the replace method, which treats its arguments as literal text, like

replace("\\n"," ")
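The sketch below compares the two calls on the same input; the names are illustrative. It also passes a lone + to both methods, since + was mentioned above as a regex special character. replace treats it as text, while replaceAll rejects it with a PatternSyntaxException because it is a quantifier with nothing to repeat.

```java
import java.util.regex.PatternSyntaxException;

public class ReplaceVsReplaceAll {
    // replace: the target "\\n" (backslash + n) is matched as plain text.
    static String viaReplace(String data) {
        return data.replace("\\n", " ");
    }

    // replaceAll: the backslash must additionally be escaped for the regex engine.
    static String viaReplaceAll(String data) {
        return data.replaceAll("\\\\n", " ");
    }

    public static void main(String[] args) {
        String data = "OR\\n\\nThe"; // literal backslash-n pairs
        System.out.println(viaReplace(data));    // OR  The
        System.out.println(viaReplaceAll(data)); // OR  The

        // A lone "+" is plain text for replace ...
        System.out.println("a+b".replace("+", " plus ")); // a plus b
        // ... but a dangling quantifier for replaceAll.
        try {
            "a+b".replaceAll("+", " plus ");
        } catch (PatternSyntaxException e) {
            System.out.println("replaceAll rejected the pattern: " + e.getDescription());
        }
    }
}
```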

Now, to remove \uXXXX we can use

replaceAll("\\\\u[0-9a-fA-F]{4}","")
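This sketch applies the pattern to part of the question's sample. It assumes each \uXXXX is literal text: a backslash, u, and four characters. The character class needs hex letters as well as digits, because a digit-only pattern such as \\u\d{4} would miss \u201c and \u201d, whose last character is a letter. Names are illustrative:

```java
public class RemoveUnicodeEscapes {
    // Regex: a literal backslash, 'u', then exactly four hexadecimal digits.
    static String removeHexEscapes(String input) {
        return input.replaceAll("\\\\u[0-9a-fA-F]{4}", "");
    }

    // Digit-only variant for comparison: misses escapes containing a-f.
    static String removeDigitEscapes(String input) {
        return input.replaceAll("\\\\u\\d{4}", "");
    }

    public static void main(String[] args) {
        // Doubled backslashes keep the escape sequences as literal text.
        String data = "Engineering\\u2019s \\u201cfrontend\\u201d";

        System.out.println(removeHexEscapes(data));   // Engineerings frontend
        System.out.println(removeDigitEscapes(data)); // the u201c and u201d escapes remain
    }
}
```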

Also remember that Strings are immutable, so each str.replace(..) call doesn't change the value of str; it creates a new String. So if you want to store that new string in str, you need to use

str = str.replace(..)
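Here is a short illustration of the point, using a hypothetical str value. The first call's result is thrown away, so str stays unchanged until the result is assigned back:

```java
public class ImmutableStringDemo {
    // Calls replace but ignores its return value, so the original String comes back.
    static String discardResult(String str) {
        str.replace("\\n", " ");
        return str;
    }

    // Assigns the new String back to the variable.
    static String keepResult(String str) {
        str = str.replace("\\n", " ");
        return str;
    }

    public static void main(String[] args) {
        String str = "OR\\n\\nThe"; // hypothetical value with literal backslash-n pairs
        System.out.println(discardResult(str)); // OR\n\nThe
        System.out.println(keepResult(str));    // OR  The
    }
}
```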

So your solution could look like:

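// Doubled backslashes keep the question's escape sequences as literal text.
String text = "\"OR\\n\\nThe Central Site Engineering\\u2019s \\u201cfrontend\\u201d, where developers turn to\"";

text = text.replaceAll("(\\\\n)+", " ")                // one or more literal \n pairs -> one space
           .replaceAll("\\\\u[0-9a-fA-F]{4}", "");     // remove backslash-u plus four hex digits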
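If the cleanup runs on many strings, one possible variant compiles both patterns once with java.util.regex.Pattern and reuses them. This is an assumption about the use case, not part of the original answer. String.replaceAll is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so it compiles the pattern on every call. TextCleaner and clean are hypothetical names:

```java
import java.util.regex.Pattern;

public class TextCleaner {
    // Compiled once and reused: one or more literal backslash-n pairs.
    private static final Pattern LITERAL_NEWLINES = Pattern.compile("(\\\\n)+");
    // A literal backslash, 'u', and four hex digits.
    private static final Pattern UNICODE_ESCAPES = Pattern.compile("\\\\u[0-9a-fA-F]{4}");

    static String clean(String text) {
        String withSpaces = LITERAL_NEWLINES.matcher(text).replaceAll(" ");
        return UNICODE_ESCAPES.matcher(withSpaces).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "OR\\n\\nThe Central Site Engineering\\u2019s \\u201cfrontend\\u201d, where developers turn to";
        System.out.println(clean(text)); // OR The Central Site Engineerings frontend, where developers turn to
    }
}
```

Pattern objects are immutable and safe for concurrent use, so static final fields are a common place for them. Matcher objects are not thread-safe, which is why clean creates a new one per call.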