Remove punctuation, preserve letters and white space - Java Regex

Question

Asked by alldavidsluck

Tonight I'm attempting to parse words from a file, and I'd like to remove all punctuation while preserving lowercase and uppercase words as well as white space.

String alpha = word.replaceAll("[^a-zA-Z]", "");

This removes everything that is not a letter, including white space.

Operating on a text file containing "Testing, testing, 1, one, 2, two, 3, three.", the output becomes TESTINGTESTINGONETWOTHREE. However, when I change it to

String alpha = word.replaceAll("[^a-zA-Z\\s]", "");

The output does not change.

Here's the code snippet in its entirety:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class UpperCaseScanner {

    public static void main(String[] args) throws FileNotFoundException {

        //First, define the filepath the program will look for. 
        String filename = "file.txt";   //Filename
        String targetFile = "";         
        String workingDir = System.getProperty("user.dir");

        targetFile = workingDir + File.separator + filename;   //Full filepath.

        //System.out.println(targetFile); //Debug code, prints the filepath. 

        Scanner fileScan = new Scanner(new File(targetFile)); 

        while(fileScan.hasNext()){
            String word = fileScan.next();
            //Replace non-alphabet characters with empty char. 
            String alpha = word.replaceAll("[^a-zA-Z\\s]", "");
            System.out.print(alpha.toUpperCase());
        }

        fileScan.close();

    }
}

file.txt has one line, reading "Testing, testing, 1, one, 2, two, 3, three." My goal is for the output to read "Testing Testing One Two Three". Am I just doing something wrong in the regular expression, or is there something else I need to do? If it's relevant, I'm working in 32-bit Eclipse 2.0.2.2.

Answer 1

Answered by Jeff Ward

I was able to get the output you were looking for with the code below. I wasn't sure whether you wanted runs of multiple spaces collapsed into a single space, so I added a second replaceAll call that does exactly that.

public class RemovePunctuation {
    public static void main(String[] args) {
        String input = "Testing, testing, 1, one, 2, two, 3, three.";
        String alpha = input.replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ");
        System.out.println(alpha);
    }
}

This outputs:

Testing testing one two three

If you wanted the first character of each word capitalized (like you showed in your question) then you could do this:

public class Foo {
    public static void main(String[] args) {
        String input = "Testing, testing, 1, one, 2, two, 3, three.";
        String alpha = input.replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ");
        System.out.println(alpha);

        StringBuilder upperCaseWords = new StringBuilder();
        String[] words = alpha.split("\\s");

        for(String word : words) {
            String upperCase = Character.toUpperCase(word.charAt(0)) + word.substring(1) + " ";
            upperCaseWords.append(upperCase);
        }
        System.out.println(upperCaseWords.toString());
    }
}

Which outputs:

Testing testing one two three
Testing Testing One Two Three
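
Neither snippet above reads from a file. Here is a minimal sketch of how the same two replaceAll calls might be applied to the asker's Scanner loop. It assumes the unchanged output comes from Scanner.next(), which splits on whitespace by default and so never hands the regex any spaces to preserve. The class name LineCleaner, the helper clean, and the String-backed Scanner (standing in for file.txt) are illustrative choices, not part of the original answer.

```java
import java.util.Scanner;

public class LineCleaner {
    // Keeps ASCII letters and whitespace, then collapses whitespace runs into single spaces.
    public static String clean(String line) {
        return line.replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Assumption: a String-backed Scanner stands in for new Scanner(new File(...)).
        Scanner scan = new Scanner("Testing, testing, 1, one, 2, two, 3, three.");
        // nextLine() returns the whole line, spaces included, unlike next().
        while (scan.hasNextLine()) {
            System.out.println(clean(scan.nextLine()));
        }
        scan.close();
    }
}
```

Reading whole lines keeps the spaces in the string, so the character class has something to preserve. Capitalizing each word can then be done as in the second snippet above.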

Answer 2

Answered by Sirius_Black

I think that Java supports

\p{Punct}

which matches all ASCII punctuation characters, so passing it to replaceAll removes them. Inside a Java string literal it is written "\\p{Punct}".
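
As a rough illustration of this suggestion, the sketch below applies \p{Punct} to the sample line from the question. The class name PunctDemo and the helper stripPunctuation are made up for this example.

```java
public class PunctDemo {
    // Removes ASCII punctuation; letters, digits, and whitespace are left untouched.
    public static String stripPunctuation(String text) {
        return text.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        // Prints: Testing testing 1 one 2 two 3 three
        System.out.println(stripPunctuation("Testing, testing, 1, one, 2, two, 3, three."));
    }
}
```

Note that the digits survive, so this pattern alone does not produce the output the asker wanted. Dropping the digits as well would need an extra step such as "\\p{Punct}|\\d".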

Answer 3

Answered by Milan Das

System.out.println(str.replaceAll("\\p{P}", ""));          // Removes Unicode punctuation only
System.out.println(str.replaceAll("[^a-zA-Z]", ""));       // Removes spaces, special characters and digits
System.out.println(str.replaceAll("[^a-zA-Z\\s]", ""));    // Removes special characters and digits
System.out.println(str.replaceAll("\\s+", ""));            // Removes whitespace only
System.out.println(str.replaceAll("\\p{Punct}", ""));      // Removes ASCII punctuation only
System.out.println(str.replaceAll("\\W", ""));             // Removes spaces and special characters, but not digits or underscores
System.out.println(str.replaceAll("\\p{Punct}+", ""));     // Removes ASCII punctuation only
System.out.println(str.replaceAll("\\p{Punct}|\\d", ""));  // Removes ASCII punctuation and digits
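
To see how these patterns differ in practice, here is a small sketch that runs them against one sample string. The input "a, b_c 1! d", the class name PatternComparison, and the helper strip are hypothetical and were chosen only so that punctuation, an underscore, a digit, and spaces all appear.

```java
public class PatternComparison {
    // Hypothetical sample input: punctuation, an underscore, a digit, and spaces.
    static final String SAMPLE = "a, b_c 1! d";

    // Removes every match of the given regex from the sample.
    public static String strip(String pattern) {
        return SAMPLE.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        String[] patterns = {
            "\\p{P}", "[^a-zA-Z]", "[^a-zA-Z\\s]", "\\s+",
            "\\p{Punct}", "\\W", "\\p{Punct}|\\d"
        };
        for (String pattern : patterns) {
            // Brackets make leading, trailing, and doubled spaces visible.
            System.out.println(pattern + " -> [" + strip(pattern) + "]");
        }
    }
}
```

With this input, "\\W" leaves ab_c1d, because the underscore and the digit count as word characters, while "[^a-zA-Z\\s]" leaves "a bc  d" with a doubled space where the digit and exclamation mark were. Also note that "\\p{P}" and "\\p{Punct}" are not identical: symbols such as $ and + belong to \p{Punct} but not to the Unicode punctuation category \p{P}.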
