Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23332146/

Time: 2020-08-13 21:58:59  Source: igfitidea

Remove punctuation, preserve letters and white space - Java Regex

Tags: java, regex, string, replaceall

Asked by alldavidsluck

Tonight I'm attempting to parse words from a file, and I'd like to remove all punctuation while preserving lowercase and uppercase letters as well as white space.

String alpha = word.replaceAll("[^a-zA-Z]", "");

This removes every character that is not a letter, including white space.

Operating on a text file containing "Testing, testing, 1, one, 2, two, 3, three.", the output becomes TESTINGTESTINGONETWOTHREE. However, when I change it to

String alpha = word.replaceAll("[^a-zA-Z\\s]", "");

The output does not change.

Here's this code snippet in its entirety:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class UpperCaseScanner {

    public static void main(String[] args) throws FileNotFoundException {

        //First, define the filepath the program will look for. 
        String filename = "file.txt";   //Filename
        String targetFile = "";         
        String workingDir = System.getProperty("user.dir");

        targetFile = workingDir + File.separator + filename;   //Full filepath.

        //System.out.println(targetFile); //Debug code, prints the filepath. 

        Scanner fileScan = new Scanner(new File(targetFile)); 

        while(fileScan.hasNext()){
            String word = fileScan.next();
            //Replace every character that is not a letter or whitespace with the empty string.
            String alpha = word.replaceAll("[^a-zA-Z\\s]", "");
            System.out.print(alpha.toUpperCase());
        }

        fileScan.close();

    }
}

file.txt has one line, reading "Testing, testing, 1, one, 2, two, 3, three." My goal is for the output to read "Testing Testing One Two Three". Am I just doing something wrong in the regular expression, or is there something else I need to do? If it's relevant, I'm working in 32-bit Eclipse 2.0.2.2.

Answered by Jeff Ward

I was able to get the output you were looking for using the code below. I wasn't sure whether you wanted runs of multiple spaces collapsed into a single space, which is why I added the second replaceAll call.

public class RemovePunctuation {
    public static void main(String[] args) {
        String input = "Testing, testing, 1, one, 2, two, 3, three.";
        String alpha = input.replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ");
        System.out.println(alpha);
    }
}

This method outputs:

Testing testing one two three
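
A note on applying this to the file-reading loop from the question: Scanner.next() returns whitespace-delimited tokens, so no whitespace ever reaches replaceAll there, whatever the character class says. The sketch below reads whole lines with nextLine() instead. The class name LineCleaner and the helper clean are hypothetical, and a Scanner over a String stands in for the file; treat it as one possible approach, not the only one.

```java
import java.util.Scanner;

class LineCleaner {

    // Keeps letters and whitespace, then collapses each whitespace run to one space.
    static String clean(String line) {
        return line.replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Assumption for the demo: a String source instead of new File(...).
        Scanner scan = new Scanner("Testing, testing, 1, one, 2, two, 3, three.");
        while (scan.hasNextLine()) {
            // nextLine() returns the whole line, spaces included;
            // next() would return one whitespace-delimited token at a time.
            System.out.println(clean(scan.nextLine()));
        }
        scan.close();
    }
}
```

Swapping the String source for new Scanner(new File(targetFile)) should give the same behavior on the asker's file.txt.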

If you wanted the first character of each word capitalized (like you showed in your question) then you could do this:

public class Foo {
    public static void main(String[] args) {
        String input = "Testing, testing, 1, one, 2, two, 3, three.";
        String alpha = input.replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ");
        System.out.println(alpha);

        StringBuilder upperCaseWords = new StringBuilder();
        String[] words = alpha.split("\\s");

        for(String word : words) {
            String upperCase = Character.toUpperCase(word.charAt(0)) + word.substring(1) + " ";
            upperCaseWords.append(upperCase);
        }
        System.out.println(upperCaseWords.toString());
    }
}

Which outputs:

Testing testing one two three
Testing Testing One Two Three
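
One caveat with the loop above: word.charAt(0) throws a StringIndexOutOfBoundsException for an empty word, which split can produce when the cleaned text starts with a space, and every word is followed by a trailing space. A slightly more defensive variant might look like the following; the class name CapitalizeWords and the method capitalize are hypothetical names chosen for this sketch.

```java
class CapitalizeWords {

    // Upper-cases the first letter of each whitespace-separated word.
    static String capitalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the whole input is blank
            }
            if (out.length() > 0) {
                out.append(' '); // separator between words, none at the end
            }
            out.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("Testing testing one two three"));
    }
}
```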

Answered by Sirius_Black

I think Java supports

\p{Punct}

which matches punctuation characters, so passing it to replaceAll with an empty replacement removes them
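
As a minimal sketch of that suggestion: in a Java string literal the backslash has to be doubled, so the pattern is written "\\p{Punct}". The class name PunctDemo and the helper stripPunct are hypothetical. Note that digits are left in place, so on the asker's input this alone would not produce the desired output.

```java
class PunctDemo {

    // Removes ASCII punctuation; letters, digits and whitespace are kept.
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        // prints: Testing testing 1 one 2 two 3 three
        System.out.println(stripPunct("Testing, testing, 1, one, 2, two, 3, three."));
    }
}
```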

Answered by Milan Das

System.out.println(str.replaceAll("\\p{P}", ""));          //Removes Unicode punctuation only (symbols such as $ and + are kept)
System.out.println(str.replaceAll("[^a-zA-Z]", ""));       //Removes spaces, special characters and digits
System.out.println(str.replaceAll("[^a-zA-Z\\s]", ""));    //Removes special characters and digits
System.out.println(str.replaceAll("\\s+", ""));            //Removes whitespace only
System.out.println(str.replaceAll("\\p{Punct}", ""));      //Removes ASCII punctuation and symbols only
System.out.println(str.replaceAll("\\W", ""));             //Removes whitespace and special characters, but not digits or underscores
System.out.println(str.replaceAll("\\p{Punct}+", ""));     //Same result as the \p{Punct} line, but matches whole runs at once
System.out.println(str.replaceAll("\\p{Punct}|\\d", ""));  //Removes ASCII punctuation, symbols and digits
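
To make the difference between some of these patterns concrete, here is a small runnable comparison. The sample string, the class name PunctuationClasses and the helper strip are assumptions made for this sketch; the expected results in the comments rely on Java's default behavior, where \p{Punct} is the ASCII punctuation class and \p{P} is the Unicode punctuation category.

```java
class PunctuationClasses {

    // Removes every match of the given regex from the input.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String str = "a+b=c, $5!";
        System.out.println(strip(str, "\\p{P}"));      // a+b=c $5  (+, = and $ are symbols, not punctuation)
        System.out.println(strip(str, "\\p{Punct}"));  // abc 5
        System.out.println(strip(str, "\\W"));         // abc5
    }
}
```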