Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/28295504/

Time: 2020-08-11 05:57:29  Source: igfitidea

How to trim no-break space in Java?

java, string

Asked by Abhishek

I have an input file which I need to process, discarding all white space, including the non-breaking space U+00A0 aka &nbsp; (you can produce it in Notepad by holding Alt while typing 0160 on the keyboard's numeric pad) or any other form of white space. I have tried String.trim() but it doesn't trim U+00A0.

Do I need to explicitly check for U+00A0 and then call trim(), or is there an easy way to trim all kinds of white space in Java?
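
The behaviour described in the question is easy to reproduce. A minimal sketch (the class name TrimProblem is just for illustration) showing that trim() leaves U+00A0 in place, because it only strips characters with code points up to U+0020, and that Character.isWhitespace() rejects U+00A0 while Character.isSpaceChar() accepts it:

```java
final class TrimProblem {
    // A string padded with one non-breaking space (U+00A0) on each side.
    static final String PADDED = "\u00A0hello\u00A0";

    public static void main(String[] args) {
        // trim() only removes characters <= U+0020, so both NBSPs survive.
        String trimmed = PADDED.trim();
        System.out.println(trimmed.length());                  // 7
        // NBSP is excluded from Java's notion of whitespace...
        System.out.println(Character.isWhitespace('\u00A0'));  // false
        // ...but it is a Unicode space separator (category Zs).
        System.out.println(Character.isSpaceChar('\u00A0'));   // true
    }
}
```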

Accepted answer by Cfx

While &nbsp; is a non-breaking space (a space that Character.isWhitespace() deliberately does not treat as whitespace), you can trim a string while preserving every &nbsp; inside it with a simple regex:

string = string.replaceAll("(^\\h*)|(\\h*$)", "");
  • \h is a horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]

If you are using a JDK older than 8, \h is not available, so you need to spell out the list of characters explicitly instead.
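
A runnable sketch of the accepted answer, assuming Java 8+ for \h. The class and method names (HorizontalTrim, trimH, trimExplicit) are illustrative; trimExplicit uses the explicit character list from the bullet above, which is the fallback for older JDKs. Keep in mind that \h only covers horizontal whitespace, so a leading or trailing \n or \r would not be removed.

```java
import java.util.regex.Pattern;

final class HorizontalTrim {
    // Java 8+: \h matches horizontal whitespace, including U+00A0.
    // The backslash is doubled because "\\h" in Java source is the regex \h.
    private static final Pattern EDGE_H = Pattern.compile("(^\\h*)|(\\h*$)");

    // Pre-JDK 8 fallback: the same character class spelled out explicitly.
    private static final String H_CLASS =
            "[ \\t\\xA0\\u1680\\u180e\\u2000-\\u200a\\u202f\\u205f\\u3000]";
    private static final Pattern EDGE_EXPLICIT =
            Pattern.compile("(^" + H_CLASS + "*)|(" + H_CLASS + "*$)");

    static String trimH(String s) {
        return EDGE_H.matcher(s).replaceAll("");
    }

    static String trimExplicit(String s) {
        return EDGE_EXPLICIT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "\u00A0\t two\u00A0words \u3000";
        String trimmed = trimH(input);
        // Brackets show the edges are gone; the inner U+00A0 is kept.
        System.out.println("[" + trimmed + "]");
        System.out.println(trimmed.equals(trimExplicit(input))); // true
    }
}
```

Compiling each Pattern once as a constant avoids recompiling the regex on every call, which String.replaceAll would otherwise do.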

Answer by RobAu

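U+00A0 (decimal 160, the character Alt+0160 produces) is not trimmed, because String.trim() only strips characters with code points up to U+0020. But you can simply replace() that character with a regular space and then call trim(), so the spaces 'inside' your string are kept (the inner non-breaking spaces become ordinary spaces).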

string = string.replace('\u00A0', ' ').trim();

There are three non-breaking whitespace characters that Character.isWhitespace() excludes: \u00A0, \u2007 and \u202F, so you probably want to replace those too.
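
A sketch of the replace-then-trim approach extended to all three no-break spaces listed above. The names NbspReplace, NO_BREAK_SPACES and replaceAndTrim are illustrative. Note that trim() still only strips code points up to U+0020, so other Unicode spaces such as U+3000 at the edges would survive.

```java
final class NbspReplace {
    // The three no-break spaces for which Character.isWhitespace() returns false:
    // U+00A0, U+2007 and U+202F.
    static final char[] NO_BREAK_SPACES = {'\u00A0', '\u2007', '\u202F'};

    // Turns every no-break space into U+0020, then lets trim() strip the edges.
    static String replaceAndTrim(String s) {
        for (char nbsp : NO_BREAK_SPACES) {
            s = s.replace(nbsp, ' ');
        }
        return s.trim();
    }

    public static void main(String[] args) {
        for (char nbsp : NO_BREAK_SPACES) {
            // Prints false three times, confirming the exclusion.
            System.out.println(Character.isWhitespace(nbsp));
        }
        // Prints [price: 42]; the inner NBSP became an ordinary space.
        System.out.println("[" + replaceAndTrim("\u2007 price:\u00A042 \u202F") + "]");
    }
}
```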

Answer by RobAu

You could do it with Guava's CharMatcher, for example:

CharMatcher.anyOf("\r\n\t \u00A0").trimFrom(input);
CharMatcher.whitespace().trimFrom(input);

See also this nice reference on the definition of whitespace.

Answer by ForguesR

If you happen to use Apache Commons Lang, then you can use StringUtils.strip() and pass all the characters you want to strip.

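import org.apache.commons.lang3.StringUtils;

// strip() takes a set of characters, not a regex, so U+2000..U+200A must be listed one by one.
final String STRIPPED_CHARS = " \t\u00A0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u202f\u205f\u3000";

String s = "\u3000 \tThis str contains a non-breaking\u00A0space and a\ttab. ";
s = StringUtils.strip(s, STRIPPED_CHARS);
System.out.println(s);  // Gives: "This str contains a non-breaking space and a<TAB>tab." (inner NBSP and tab are kept)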
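
If you would rather not add Commons Lang as a dependency, the same strip-a-character-set idea fits in a few lines of plain Java. StripChars.strip below is a hypothetical helper, not part of any library; it checks each edge character against the set with String.indexOf and leaves inner characters untouched.

```java
final class StripChars {
    // Same set as the answer: space, tab, U+00A0, U+1680, U+180E,
    // U+2000..U+200A (listed one by one), U+202F, U+205F, U+3000.
    static final String STRIPPED_CHARS =
            " \t\u00A0\u1680\u180e"
          + "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a"
          + "\u202f\u205f\u3000";

    // Removes leading and trailing characters that occur in chars.
    static String strip(String s, String chars) {
        int start = 0;
        int end = s.length();
        while (start < end && chars.indexOf(s.charAt(start)) >= 0) {
            start++;
        }
        while (end > start && chars.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String s = "\u3000 \tThis str contains a non-breaking\u00A0space and a\ttab. ";
        // Brackets show that only the edges were stripped.
        System.out.println("[" + strip(s, STRIPPED_CHARS) + "]");
    }
}
```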

Answer by logbasex

You can try this:

string = string.replaceAll("\\p{Z}", "");

From https://www.regular-expressions.info/unicode.html:

\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
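
Two things are worth noting about this answer: \p{Z} matches separators such as U+00A0, U+2007 and U+3000 but not control characters like \t or \n, and an unanchored replaceAll removes every separator, including those between words. The sketch below contrasts the two; the anchored pattern combining \p{Z} with \s, and the names SeparatorRegex, removeAllSeparators and trimSeparators, are illustrative variants rather than part of the answer.

```java
import java.util.regex.Pattern;

final class SeparatorRegex {
    // The answer's approach: delete every Unicode separator (categories Zs, Zl, Zp).
    static String removeAllSeparators(String s) {
        return s.replaceAll("\\p{Z}", "");
    }

    // Anchored variant: strip separators and \s characters only at the edges.
    private static final Pattern EDGES = Pattern.compile("^[\\p{Z}\\s]+|[\\p{Z}\\s]+$");

    static String trimSeparators(String s) {
        return EDGES.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "\u00A0a b\u00A0c\t";
        // Inner separators are gone, but the trailing tab survives.
        System.out.println("[" + removeAllSeparators(input) + "]");
        // Only the edges are stripped; the inner space and NBSP are kept.
        System.out.println("[" + trimSeparators(input) + "]");
    }
}
```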