Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1437933/

Date: 2020-08-12 12:15:44  Source: igfitidea

How to properly trim whitespaces from a string in Java?

Tags: java, string, unicode

Question by itsadok

The JDK's String.trim() method is pretty naive: it only removes characters up to U+0020, i.e. the space and the ASCII control characters.

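To confirm that claim, here is a quick check. String.trim() strips every leading and trailing character at or below U+0020 and nothing else, so a trailing no-break space (U+00A0) survives. The class and helper names below are illustrative only.

```java
public class TrimDemo {
    // Thin wrapper so the behavior of String.trim() can be inspected directly
    static String jdkTrim(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        // TAB (U+0009) and SPACE (U+0020) are <= U+0020, so trim() removes them;
        // NO-BREAK SPACE (U+00A0) is above U+0020, so trim() keeps it
        String input = "\t hello \u00A0";
        String result = jdkTrim(input);
        System.out.println(result.length());           // 7: "hello", a space and the U+00A0
        System.out.println(result.endsWith("\u00A0")); // true
    }
}
```

The space before the U+00A0 also stays, because trim() stops scanning from the end at the first character above U+0020.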

Apache Commons' StringUtils.strip() is slightly better, but it uses the JDK's Character.isWhitespace(), which doesn't recognize the non-breaking space as whitespace.

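The gap described here can be seen by comparing the two JDK predicates directly (the class name is illustrative). Character.isWhitespace explicitly excludes the non-breaking spaces U+00A0, U+2007 and U+202F. Character.isSpaceChar accepts them, because it tests only the Unicode space, line and paragraph separator categories, but it in turn rejects control characters such as TAB.

```java
public class WhitespaceCheck {
    // NO-BREAK SPACE: a space separator (Zs) that Character.isWhitespace deliberately excludes
    static final char NBSP = '\u00A0';

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace(NBSP)); // false
        System.out.println(Character.isSpaceChar(NBSP));  // true
        // The reverse gap: TAB is a control character (Cc), not a space separator
        System.out.println(Character.isWhitespace('\t')); // true
        System.out.println(Character.isSpaceChar('\t'));  // false
    }
}
```

Neither predicate alone covers both cases, which is why combining them, or using the Unicode White_Space property, comes up in the answers.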

So what would be the most complete, Unicode-compatible, safe and proper way to trim a string in Java?

And incidentally, is there a better library than commons-lang that I should be using for this sort of stuff?

Accepted answer by CrazyCoder

Google has made guava-libraries available recently. It may have what you are looking for:

CharMatcher.inRange('\0', ' ').trimFrom(str)

is equivalent to String.trim(), but you can customize what to trim; see the Javadoc.

For instance, it has its own definition of WHITESPACE, which differs from the JDK's and is defined according to the latest Unicode standard, so what you need can be written as:

CharMatcher.WHITESPACE.trimFrom(str)

Answer by João Silva

I've always found trim to work pretty well for almost every scenario.

However, if you really want to include more characters, you can edit the strip method from commons-lang so that it tests not only Character.isWhitespace but also Character.isSpaceChar, which seems to be what's missing. Namely, the following lines in stripStart and stripEnd, respectively:

while ((start != strLen) && Character.isWhitespace(str.charAt(start)))
while ((end != 0) && Character.isWhitespace(str.charAt(end - 1)))

Answer by itsadok

I swear I only saw this after I posted the question: Google just released Guava, a library of core Java utilities.

I haven't tried this yet, but from what I can tell, this is fully Unicode compliant:

String s = "  \t testing \u00a0";
// newer Guava releases expose this matcher as CharMatcher.whitespace()
s = CharMatcher.WHITESPACE.trimFrom(s);

Answer by ZZ Coder

It's really hard to define what constitutes whitespace. Sometimes I use non-breaking spaces precisely so that they don't get stripped. So it will be hard to find a library that does exactly what you want.

I use my own trim() if I want to trim every whitespace character. Here is the function I use to check for whitespace:

  public static boolean isWhitespace (int ch)
  {
    if (ch == ' ' || (ch >= 0x9 && ch <= 0xD))
      return true;
    if (ch < 0x85) // short-circuit optimization.
      return false;
    if (ch == 0x85 || ch == 0xA0 || ch == 0x1680 || ch == 0x180E) // U+180E lost White_Space in Unicode 6.3
      return true;
    if (ch < 0x2000 || ch > 0x3000)
      return false;
    return ch <= 0x200A || ch == 0x2028 || ch == 0x2029
      || ch == 0x202F || ch == 0x205F || ch == 0x3000;
  }

Answer by Ertuğrul Çetin

I made small changes to Java's trim() method so that it also strips a non-ASCII character, the no-break space (U+00A0). This method runs faster than most of the implementations.

import java.util.Objects;

public static String trimAdvanced(String value) {
    Objects.requireNonNull(value);

    int strLength = value.length();
    int len = strLength;
    int st = 0;
    char[] val = value.toCharArray();

    // Skip leading characters String.trim() would drop (<= U+0020) plus U+00A0 NO-BREAK SPACE
    while ((st < len) && ((val[st] <= ' ') || (val[st] == '\u00A0'))) {
        st++;
    }
    // Skip trailing characters of the same kind; st < len keeps both indices valid
    while ((st < len) && ((val[len - 1] <= ' ') || (val[len - 1] == '\u00A0'))) {
        len--;
    }

    return ((st > 0) || (len < strLength)) ? value.substring(st, len) : value;
}

Answer by Aleksandr Dubinsky

This handles Unicode characters and doesn't require extra libraries:

String trimmed = original.replaceAll ("^\\p{IsWhite_Space}+|\\p{IsWhite_Space}+$", "");

A slight snag is that there are some related whitespace characters without the Unicode character property "WSpace=Y", which are listed in Wikipedia. These probably won't cause a problem, but you can easily add them to the character class too.

Using almson-regex, the regex will look like:

String trimmed = original.replaceAll (either (START_BOUNDARY + oneOrMore (WHITESPACE), oneOrMore (WHITESPACE) + END_BOUNDARY), "");

and can be extended to include the more relevant of the non-Unicode whitespace characters.

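Extending the character class as suggested can be sketched with plain java.util.regex, without almson-regex. Which extra characters to add is an assumption here. This sketch picks U+180E, U+200B, U+2060 and U+FEFF, format characters that often behave like invisible whitespace but lack WSpace=Y, so adjust the list to your data. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class UnicodeTrim {
    // Unicode White_Space property plus assumed extras without WSpace=Y:
    // U+180E MONGOLIAN VOWEL SEPARATOR, U+200B ZERO WIDTH SPACE,
    // U+2060 WORD JOINER, U+FEFF ZERO WIDTH NO-BREAK SPACE
    private static final String WS = "[\\p{IsWhite_Space}\\u180E\\u200B\\u2060\\uFEFF]";

    // Leading run OR trailing run; compiled once instead of on every call
    private static final Pattern EDGES = Pattern.compile("^" + WS + "+|" + WS + "+$");

    static String trim(String s) {
        return EDGES.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "\uFEFF\u2003 text \u00A0\u200B";
        System.out.println("[" + trim(input) + "]"); // prints [text]
    }
}
```

Precompiling the Pattern is the main design choice. String.replaceAll compiles its regex on every call, which matters if trimming runs in a hot loop. Interior whitespace such as the U+00A0 in "a\u00A0b" is left untouched, because both alternatives are anchored to the start or end of the input.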