Java: Remove "empty" character from String

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3396525/

Time: 2020-08-13 22:55:50  Source: igfitidea

Remove "empty" character from String

java, character

Asked by black666

I'm using a framework which returns malformed Strings with "empty" characters from time to time.

"foobar" for example is represented by: [,f,o,o,b,a,r]

The first character is NOT a whitespace (' '), so a System.out.println() would print "foobar" and not " foobar". Yet, the length of the String is 7 instead of 6. Obviously this makes most String methods (equals, split, substring, ...) useless. Is there a way to remove empty characters from a String?

I tried to build a new String like this:

StringBuilder sb = new StringBuilder();
for (final char character : malformedString.toCharArray()) {
  if (Character.isDefined(character)) {
    sb.append(character);
  }
}
sb.toString();

Unfortunately this doesn't work. Same with the following code:

StringBuilder sb = new StringBuilder();
for (final Character character : malformedString.toCharArray()) {
  if (character != null) {
    sb.append(character);
  }
}
sb.toString();

I also can't check for an empty character like this:

   if (character == ''){
     //
   }

Obviously there is something wrong with the String, but I can't change the framework I'm using or wait for them to fix it (if it is a bug within their framework). I need to handle this String and sanitize it.

Any ideas?

Accepted answer by BalusC

It's probably the NULL character, which is represented by \0. You can get rid of it with String#trim().
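
As a rough illustration of that suggestion: String#trim() strips leading and trailing characters whose code is at or below U+0020, so it does remove a \0 but would leave a higher codepoint such as U+FEFF in place. The class and method names below (TrimDemo, trimmedLength) are made up for this sketch.

```java
public class TrimDemo {

    // Hypothetical helper: length of the string after String#trim().
    static int trimmedLength(String s) {
        return s.trim().length();
    }

    public static void main(String[] args) {
        String withNul = "\0foobar";     // leading NUL (U+0000)
        String withBom = "\uFEFFfoobar"; // leading U+FEFF, which is above U+0020

        System.out.println(withNul.length());       // 7
        System.out.println(trimmedLength(withNul)); // 6: trim() removed the NUL
        System.out.println(trimmedLength(withBom)); // 7: trim() left U+FEFF alone
    }
}
```

So if trim() does not shorten the string, the invisible character is probably not \0, and printing the codepoints is the next step.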

To nail down the exact codepoint, do this:

for (char c : string.toCharArray()) {
    System.out.printf("U+%04x ", (int) c);
}

Then you can find the exact character here.

Update: as per the update:

Anyone know of a way to just include a range of valid characters instead of excluding 95% of the UTF8 range?

You can do that with help of regex. See the answer of @polygenelubricants here and this answer.

On the other hand, you can also fix the problem at its root instead of working around it. Either update the files to get rid of the BOM, which is a legacy way to distinguish UTF-8 files from others and is nowadays worthless, or use a Reader which recognizes and skips the BOM. Also see this question.
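
A minimal sketch of the second option, assuming the input is UTF-8 and that only the UTF-8 BOM bytes (EF BB BF) need to be skipped. The class BomSkippingReader and its methods open and readAll are hypothetical names for illustration, not an existing library API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomSkippingReader {

    // Hypothetical helper: wraps the stream in a UTF-8 Reader and drops
    // a leading EF BB BF sequence if (and only if) it is present.
    public static Reader open(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) { // read up to 3 bytes, tolerating short reads
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return new InputStreamReader(pushback, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: drains a Reader into a String.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "\uFEFFfoobar".getBytes(StandardCharsets.UTF_8);
        String text = readAll(open(new ByteArrayInputStream(data)));
        System.out.println(text.length()); // 6
        System.out.println(text);          // foobar
    }
}
```

Skipping the BOM at the byte level keeps it from ever reaching the String, so no later clean-up pass is needed.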

Answer by ESP

Trimming left or right removes whitespace. Does it have a colon before the space?

Even more: casting the first character to a number, e.g. long a = (long) string.charAt(0);, will show you the char code, and you can then use replace() or substring().

Answer by BalusC

You could check for the whitespace like this:

if (character.equals(' ')){ // }

Answer by black666

Thank you Johannes Rössel. It actually was '\uFEFF'.

The following code works:

final StringBuilder sb = new StringBuilder();
for (final char character : body.toCharArray()) {
    // Copy every character except the BOM (U+FEFF).
    if (character != '\uFEFF') {
        sb.append(character);
    }
}
final String sanitizedString = sb.toString();

Anyone know of a way to just include a range of valid characters instead of excluding 95% of the UTF8 range?

Answer by polygenelubricants

Regex would be an appropriate way to strip unwanted Unicode characters from the string in this case.

String sanitized = dirty.replaceAll("[\uFEFF-\uFFFF]", ""); 

This will replace every char in the \uFEFF-\uFFFF range with the empty string.

The [...] construct is called a character class, e.g. [aeiou] matches any one of the lowercase vowels, and [^aeiou] matches any character except those.

You can take one of these two approaches:

  • replaceAll("[blacklist]", "")
  • replaceAll("[^whitelist]", "")
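
The whitelist form could look like the sketch below. Treating only printable ASCII (U+0020 to U+007E) as valid is an assumption made for illustration; a real whitelist depends on what the application accepts. The names WhitelistSanitizer and keepPrintableAscii are hypothetical.

```java
public class WhitelistSanitizer {

    // Assumption: only printable ASCII (U+0020..U+007E) is considered valid.
    // The negated character class removes everything outside that range.
    static String keepPrintableAscii(String dirty) {
        return dirty.replaceAll("[^\\x20-\\x7E]", "");
    }

    public static void main(String[] args) {
        String dirty = "\uFEFFfoobar"; // 7 chars, leading U+FEFF
        String clean = keepPrintableAscii(dirty);

        System.out.println(clean.length());         // 6
        System.out.println(clean.equals("foobar")); // true
    }
}
```

A whitelist like this also drops legitimate non-ASCII text such as accented letters, so the character class should be widened if such input is expected.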

Answer by Ilia Altshuler

for (int i = 0; i < s.length(); i++)
    if (s.charAt(i) == ' ') {
        your code....
    }

Answer by RightHandedMonkey

A very simple way to remove the UTF-8 BOM from a string, using substring as Denis Tulskiy suggested. No looping is needed: it just checks whether the first character is the mark and skips it if so.

public static String removeUTF8BOM(String s) {
    if (s.startsWith("\uFEFF")) {
        s = s.substring(1);
    }
    return s;
}

I needed to add this to my code when using the Apache HTTPClient EntityUtil to read from a webserver. The webserver was not sending the blank mark but it was getting pulled in while reading the input stream. Original article can be found here.

Answer by Steve Smith

This is what worked for me:

    StringBuilder sb = new StringBuilder();
    for (char character : myString.toCharArray()) {
        int i = (int) character;
        if (i > 0 && i <= 256) {
            sb.append(character);
        }
    }  
    return sb.toString();

The int value of my NULL characters was in the region of 8103 or something.

Answer by Lalji Gajera

Simply calling malformedString.trim() will solve the issue.
