Text cleaning and replacement: delete \n from a text in Java
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/542226/
Asked by Fernando Briano
I'm cleaning an incoming text in my Java code. The text includes a lot of "\n", but not as in a new line, but literally "\n". I was using replaceAll() from the String class, but haven't been able to delete the "\n". This doesn't seem to work:
String string;
string = string.replaceAll("\\n", "");
Neither does this:
String string;
string = string.replaceAll("\n", "");
I guess this last one is identified as an actual new line, so all the new lines from the text would be removed.
Also, what would be an effective way to remove different patterns of wrong text from a String? I'm using regular expressions to detect them (HTML reserved characters and the like) together with replaceAll, but every time I use replaceAll, the whole String is read, right?
UPDATE: Thanks for your great answers. I've extended this question here:
Text replacement efficiency
I'm asking specifically about efficiency :D
Accepted answer by MBCook
Hooknc is right. I'd just like to post a little explanation:
"\\n" translates to "\n" after the compiler is done (since you escape the backslash). So the regex engine sees "\n" and thinks new line, and would remove those (and not the literal "\n" you have).
"\n" is translated to a real new line by the compiler. So the new line character is sent to the regex engine.
"\\\\n" is ugly, but right. The compiler removes the escape sequences, so the regex engine sees "\\n". The regex engine sees the two backslashes and knows that the first one escapes it so that translates to checking for the literal characters '\' and 'n', giving you the desired result.
Java is nice (it's the language I work in) but having to think to basically double-escape regexes can be a real challenge. For extra fun, it seems StackOverflow likes to try to translate backslashes too.
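The three cases above can be checked with a small program. This is only a minimal sketch: the class name EscapeLevels, the helper method names, and the sample text are made up for illustration and are not from the original answer.

```java
public class EscapeLevels {
    // Removes the two-character sequence backslash + 'n' using a regex.
    static String removeLiteral(String text) {
        // Source "\\\\n" -> string \\n (3 chars) -> regex: escaped backslash, then 'n'.
        return text.replaceAll("\\\\n", "");
    }

    // Removes real line-feed characters.
    static String removeNewlines(String text) {
        // Source "\\n" -> string \n (2 chars) -> the regex engine reads it as a newline.
        return text.replaceAll("\\n", "");
    }

    public static void main(String[] args) {
        // Sample text with one literal backslash-n and one real newline.
        String text = "a\\nb\nc";

        System.out.println("\\\\n".length()); // 3: backslash, backslash, n
        System.out.println("\\n".length());   // 2: backslash, n
        System.out.println("\n".length());    // 1: the line-feed character

        // Only the literal sequence is removed; the real newline stays.
        System.out.println(removeLiteral(text).equals("ab\nc"));
        // Only the real newline is removed; the literal sequence stays.
        System.out.println(removeNewlines(text).equals("a\\nbc"));
    }
}
```

Printing the lengths makes it easier to see how many characters survive the compiler before the regex engine gets involved.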
Answered by hooknc
I think you need to add a couple more slashies...
String string;
string = string.replaceAll("\\\\n", "");
Explanation: The number of slashies has to do with the fact that "\n" by itself is a control character in Java.
So to get the real characters of "\n" somewhere, we need to use "\\n", which, if printed out, will give us: "\n"
You're looking to replace all "\n" in your file, but you're not looking to replace the control character "\n". So you tried "\\n", which will be converted into the characters "\n". Great, but maybe not so much. My guess is that the replaceAll method will actually create a regular expression using the "\n" characters, which will be read as the control character "\n".
Whew, almost done.
Using replaceAll("\\\\n", "") will first convert "\\\\n" -> "\\n", which is what the regular expression receives. The regex engine then reads "\\n" as an escaped backslash followed by 'n', which represents your text of "\n". That is what you're looking to replace.
Answered by Avi
Instead of String.replaceAll(), which uses regular expressions, you might be better off using String.replace(), which does simple string substitution (if you are using at least Java 1.5).
String replacement = string.replace("\\n", "");
should do what you want.
Answered by polygenelubricants
The other answers have sufficiently covered how to do this with replaceAll, and how you need to escape backslashes as necessary.
Since Java 1.5, there is also String.replace(CharSequence, CharSequence), which performs literal string replacement. This can greatly simplify many string replacement problems, because there is no need to escape any regular expression metacharacters like ., *, |, and yes, \ itself.
Thus, given a string that can contain the substring "\n" (not '\n'), we can delete them as follows:
String before = "Hi!\\n How are you?\\n I'm \n good!";
System.out.println(before);
// Hi!\n How are you?\n I'm
// good!
String after = before.replace("\\n", "");
System.out.println(after);
// Hi! How are you? I'm
// good!
Note that if you insist on using replaceAll, you can prevent the ugliness by using Pattern.quote:
System.out.println(
    before.replaceAll(Pattern.quote("\\n"), "")
);
// Hi! How are you? I'm
// good!
You should also use Pattern.quote when you're given an arbitrary string that must be matched literally instead of as a regular expression pattern.
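To illustrate the arbitrary-string case, here is a minimal sketch. The class LiteralRemover, its removeAll helper, and the sample text are hypothetical names chosen for this example; only Pattern.quote and String.replaceAll come from the Java API.

```java
import java.util.regex.Pattern;

public class LiteralRemover {
    // Removes every occurrence of `junk` from `text`, treating `junk` as plain
    // text even if it contains regex metacharacters such as '.', '(' or '\'.
    static String removeAll(String text, String junk) {
        return text.replaceAll(Pattern.quote(junk), "");
    }

    public static void main(String[] args) {
        // The sample contains literal backslash-n sequences, not real newlines.
        String text = "1.5 (beta)\\n costs $3.50\\n";

        System.out.println(removeAll(text, "\\n"));    // drops the literal backslash-n pairs
        System.out.println(removeAll(text, "(beta)")); // parentheses are not treated as a group
        System.out.println(removeAll(text, "."));      // only real dots, not "any character"
    }
}
```

Without Pattern.quote, the "." call would match every character and the "(beta)" call would match only the word beta, leaving the parentheses behind.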
Answered by Amit
Try this. Hope it helps.
raw = raw.replaceAll("\t", "");
raw = raw.replaceAll("\n", "");
raw = raw.replaceAll("\r", "");
Answered by jessica
Normally \n works fine. Otherwise you can opt for multiple replaceAll statements: first apply one replaceAll to the text, and then apply replaceAll again to the result. That should do what you are looking for.
Answered by Harz
I believe replaceAll() is an expensive operation. The below solution will probably perform better:
// Requires: import java.util.StringTokenizer;
String temp = "Hi \n Wssup??";
System.out.println(temp);
StringBuilder result = new StringBuilder();
// Split on real newline characters and join the trimmed pieces.
StringTokenizer t = new StringTokenizer(temp, "\n");
while (t.hasMoreTokens()) {
    result.append(t.nextToken().trim());
}
String result_of_temp = result.toString();
System.out.println(result_of_temp);
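If regular expressions are still wanted and the same cleanup runs on many strings, one option is to compile the pattern once and reuse it, since str.replaceAll(regex, repl) is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl). The sketch below also folds several unwanted sequences into a single alternation so the text is scanned once. TextCleaner, JUNK, and the chosen sequences are illustrative assumptions, and whether this is actually faster for a given workload would need measuring.

```java
import java.util.regex.Pattern;

public class TextCleaner {
    // Compiled once and reused. Matches either a literal backslash followed by
    // n, r, or t, or a real line-feed, carriage-return, or tab character.
    private static final Pattern JUNK = Pattern.compile("\\\\[nrt]|[\\n\\r\\t]");

    // Removes all matches in a single pass over the text.
    static String clean(String text) {
        return JUNK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Mixes literal backslash sequences with real control characters.
        String text = "a\\nb\nc\\td\te";
        System.out.println(clean(text));
    }
}
```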
Answered by maveonair
I used this solution to solve that problem:
String replacement = str.replaceAll("[\n\r]", "");
Answered by gattsbr
string = string.replaceAll(""+(char)10, " ");