Source: Stack Overflow, http://stackoverflow.com/questions/17989698/. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors.

How to remove special characters from a string in a file using Java

Tags: java, regex, special-characters

Asked by user2609542

I have a text file that contains the following text, and my task is to remove the special symbols from it. My input file contains:

This is sample CCNA program. it contains CCNP?.

My required output string:

This is sample CCNA program. it contains CCNP.

How can I do this? Please advise.

Thanks.

Answered by anubhava

This should work if you are looking to retain only ASCII (0-127) characters in your string:

String str = "This is sample CCNA program. it contains CCNP?";
// Remove every run of characters outside the ASCII range 0x00-0x7F.
str = str.replaceAll("[^\\x00-\\x7f]+", "");
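
As a minimal, self-contained sketch of the ASCII-only approach: the special character in the sample shows up as `?` in this copy, so the example assumes a trademark sign (U+2122) as a stand-in. The class and method names are illustrative and not part of the answer.

```java
class AsciiOnly {

    // Removes every run of characters outside the ASCII range 0x00-0x7F.
    static String keepAscii(String input) {
        return input.replaceAll("[^\\x00-\\x7F]+", "");
    }

    public static void main(String[] args) {
        // Assumption: U+2122 stands in for the special character in the question.
        String sample = "This is sample CCNA program. it contains CCNP\u2122.";
        System.out.println(keepAscii(sample));
    }
}
```

Note that the regex backslashes are doubled because the pattern is written inside a Java string literal.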

Answered by Stephen Lake

Do you want to remove all special characters from your strings? If so:

String alphaOnly = input.replaceAll("[^a-zA-Z]+","");
String alphaAndDigits = input.replaceAll("[^a-zA-Z0-9]+","");

Please see Sean Patrick Floyd's answer to a possible duplicate question.
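
One thing to keep in mind is that the two patterns above also strip spaces and punctuation, which is probably not what the question's expected output needs. A hedged sketch of a slightly wider whitelist follows; the set of punctuation marks kept in `keepBasicText` is an assumption chosen for illustration, and U+2122 again stands in for the special character.

```java
class Whitelist {

    // Letters only: spaces, digits and punctuation are removed as well.
    static String alphaOnly(String input) {
        return input.replaceAll("[^a-zA-Z]+", "");
    }

    // Hypothetical variant: also keep digits, whitespace and a few punctuation marks.
    static String keepBasicText(String input) {
        return input.replaceAll("[^a-zA-Z0-9\\s.,!?'-]+", "");
    }

    public static void main(String[] args) {
        String sample = "it contains CCNP\u2122.";
        System.out.println(alphaOnly(sample));
        System.out.println(keepBasicText(sample));
    }
}
```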

Answered by stema

You can do it from a Unicode point of view:

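String s = "This is sample CCNA program. it contains CCNP?. And it contains digits 123456789.";
// Keep letters, combining marks, punctuation, decimal digits and whitespace; remove everything else.
String res = s.replaceAll("[^\\p{L}\\p{M}\\p{P}\\p{Nd}\\s]+", "");
System.out.println(res);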

will print out:

This is sample CCNA program. it contains CCNP. And it contains digits 123456789.

\\p{...} is a Unicode property.

\\p{L} matches all letters from all languages.

\\p{M} matches a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes).

\\p{P} matches any kind of punctuation character.

\\p{Nd} matches a digit zero through nine in any script except ideographic scripts.

So this regex removes every character that is not a letter (including combining marks), a punctuation character, a digit, or a whitespace character (\\s).
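
To make the effect of these categories concrete, here is a small sketch built around the same pattern. The sample strings are assumptions made up for illustration. Accented letters survive because they are letters, while symbols such as the trademark sign, the euro sign, or even ASCII `+` and `=` are removed because they belong to the symbol categories rather than to \\p{P}.

```java
class UnicodeFilter {

    // Keeps letters, combining marks, punctuation, decimal digits and whitespace.
    static String keepTextCharacters(String input) {
        return input.replaceAll("[^\\p{L}\\p{M}\\p{P}\\p{Nd}\\s]+", "");
    }

    public static void main(String[] args) {
        // U+00E9 is an accented letter, U+2122 the trademark sign (both assumed samples).
        System.out.println(keepTextCharacters("Caf\u00e9 CCNP\u2122."));
        System.out.println(keepTextCharacters("a+b=c"));
    }
}
```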

Answered by Deckard27

 ^[\u0000-\u007F]*$

This pattern allows only ASCII characters, but you need to tell us what counts as a special character for you.
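
Unlike the replace-based answers, this pattern validates a whole string rather than cleaning it. A minimal sketch of how it might be used as a check follows; the class and method names are hypothetical, and the range is written with \\x escapes, which is equivalent to the \u form above.

```java
import java.util.regex.Pattern;

class AsciiCheck {

    // Matches only strings made up entirely of ASCII characters (0x00-0x7F).
    private static final Pattern ASCII_ONLY = Pattern.compile("^[\\x00-\\x7F]*$");

    static boolean isAscii(String input) {
        return ASCII_ONLY.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAscii("it contains CCNP."));
        // Assumption: U+2122 stands in for the special character in the question.
        System.out.println(isAscii("it contains CCNP\u2122."));
    }
}
```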

Answered by Prabhakaran Ramaswamy

String yourString = "This is sample CCNA program. it contains CCNP?";
// Remove every literal question mark.
String result = yourString.replaceAll("[\\?]", "");
System.out.println(yourString);
System.out.println(result);

Answered by agad

You can also try something like:

Normalizer.decompose(str, false, 0).replaceAll("\\p{InSuperscriptsAndSubscripts}+", "");

but you need to find the proper Unicode block or blocks for the characters you want to remove.
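
The `Normalizer.decompose` call above is not a method of the JDK's `java.text.Normalizer`, so it presumably comes from another library. As a hedged sketch of the same decompose-then-remove-a-block idea using only the JDK, the example below swaps in `java.text.Normalizer.normalize` and a different block, Combining Diacritical Marks, to strip accents. The class name and sample text are assumptions.

```java
import java.text.Normalizer;

class AccentStripper {

    // Decomposes accented letters into base letter + combining mark (NFD),
    // then removes everything in the Combining Diacritical Marks block.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("Caf\u00e9 cr\u00e8me"));
    }
}
```

This changes accented letters into their base letters instead of deleting them, which may or may not be the desired behavior.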

Answered by STM

You first have to define what counts as a special character in your case.

If you are not a fan of regex, you could consider using some of the methods of the Character class. See the sample below:

public class Test {

    public static void main(String[] args) {

        String test = "This is sample CCNA program. it contains CCNP?";

        System.out.println("Character\tLetter or Digit\tWhitespace");

        for (char c : test.toCharArray()) {
            System.out.println(
                    c + "\t\t"
                    + Character.isLetterOrDigit(c) + "\t\t" 
                    + Character.isWhitespace(c));
        }
    }
}

There are other methods that you could use in addition to the ones above. Look at the Character class API.
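
The sample above only prints the classification of each character. A minimal sketch of turning those checks into an actual filter could look like the following; the `filter` method and its `extraAllowed` parameter are hypothetical names introduced for illustration, and U+2122 is again an assumed stand-in for the special character.

```java
class CharacterFilter {

    // Keeps letters, digits, whitespace, and any character listed in extraAllowed.
    static String filter(String input, String extraAllowed) {
        StringBuilder kept = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (Character.isLetterOrDigit(cp)
                    || Character.isWhitespace(cp)
                    || extraAllowed.indexOf(cp) >= 0) {
                kept.appendCodePoint(cp);
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("it contains CCNP\u2122.", "."));
    }
}
```

Without the extra allowed characters the trailing period would be dropped too, since it is neither a letter, a digit, nor whitespace.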

Answered by art1go

An alternative to regex that excludes non-ASCII chars (code above 127).

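    String s = "This is sample CCNA program. it contains CCNP?";

    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) > 127) {
            // Cut out the character at i, then step back so the character
            // that slid into position i is checked on the next iteration.
            s = s.substring(0, i) + s.substring(i + 1);
            i--;
        }
    }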

Answered by ishan

import java.util.Scanner;

public class replacespecialchar {

    /**
     * @param args
     */
    public static void main(String[] args) {

        String before="";

        String after="";
        Scanner in =new Scanner(System.in);
        System.out.println("enter string with special char");
        before=in.nextLine();

        // Keep only ASCII letters (A-Z, a-z); everything else is dropped.
        for (int i = 0; i < before.length(); i++) {
            char c = before.charAt(i);
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                after += c;
            }
        }

        System.out.println("String with special char "+before);
        System.out.println("String without special char "+after);
    }
}

Answered by JBAIRD

The answer above about removing non-ASCII characters was very helpful. Thank you.

However, it did not cover some situations, such as two bad characters in a row or a bad character at the end of the string. Here are my modifications, which remove all special characters except tab and newline.

  // Remove all special characters except tab and linefeed
  public static String cleanTextBoxData(String value) {
    if (value != null) {
      int beforeLen = value.length();
      for (int i = 0; i < value.length(); i++) {
        char c = value.charAt(i);
        // "Bad" means outside printable ASCII (32-126), except tab (9) and linefeed (10).
        if ((c < 32 || c > 126) && c != 9 && c != 10) {
          // Cut the character out, then step back so the character that
          // moved into position i is checked on the next iteration.
          value = value.substring(0, i) + value.substring(i + 1);
          i--;
        }
      }
      int dif = beforeLen - value.length();
      if (dif > 0) {
        // logger is assumed to be a logger field defined in the enclosing class.
        logger.warn("Found and removed {} bad characters from text box.", dif);
      }
    }
    return value;
  }
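
Repeated `substring` calls copy the string on every removal. A single pass with a `StringBuilder` is one possible alternative, sketched below with the same keep rule (printable ASCII plus tab and linefeed). Since the question mentions a file, the sketch also includes a hypothetical `cleanFile` helper; reading the file as UTF-8 is an assumption, and none of these names come from the answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TextCleaner {

    // Keeps printable ASCII (32-126) plus tab and linefeed; drops everything else.
    static String clean(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder cleaned = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if ((c >= 32 && c <= 126) || c == '\t' || c == '\n') {
                cleaned.append(c);
            }
        }
        return cleaned.toString();
    }

    // Reads the whole input file as UTF-8 (an assumption), cleans it, and writes the result.
    static void cleanFile(Path input, Path output) throws IOException {
        String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
        Files.write(output, clean(text).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(clean("it contains CCNP\u2122\u2122."));
        // Optional: pass an input path and an output path as arguments.
        if (args.length == 2) {
            cleanFile(Paths.get(args[0]), Paths.get(args[1]));
        }
    }
}
```

Note that a carriage return (13) is also dropped by this rule, so Windows line endings become plain linefeeds.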