Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18830813/

Time: 2020-08-12 11:27:47  Source: igfitidea

How can I remove punctuation from input text in Java?

java regex string formatting

Asked by TheDoctor

I am trying to get a sentence from user input in Java, and I need to make it lowercase and remove all punctuation. Here is my code:

    String[] words = instring.split("\\s+");
    for (int i = 0; i < words.length; i++) {
        words[i] = words[i].toLowerCase();
    }
    String[] wordsout = new String[50];
    Arrays.fill(wordsout,"");
    int e = 0;
    for (int i = 0; i < words.length; i++) {
        if (words[i] != "") {
            wordsout[e] = words[e];
            wordsout[e] = wordsout[e].replaceAll(" ", "");
            e++;
        }
    }
    return wordsout;

I can't seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.

Accepted answer by Bohemian

This first removes all non-letter characters, folds to lowercase, then splits the input, doing all the work in a single line:

String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");

Spaces are initially left in the input so the split will still work.

By removing the rubbish characters before splitting, you avoid having to loop through the elements.
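As a minimal sketch of the one-liner above in runnable form: the class name WordSplitter and the method name toWords are made up for illustration, and the trim() call and the empty-input check are additions that are not part of the original answer. They are there because String.split keeps a leading empty element when the cleaned text starts with a space.

```java
import java.util.Arrays;

// Hypothetical wrapper class, used only for this illustration.
class WordSplitter {
    // Keeps ASCII letters and spaces, lowercases the result, then splits on whitespace runs.
    static String[] toWords(String input) {
        String cleaned = input.replaceAll("[^a-zA-Z ]", "").toLowerCase().trim();
        // Splitting an empty string would yield one empty element, so return an empty array instead.
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWords("Hello, World! It's me.")));
        // prints [hello, world, its, me]
    }
}
```

Note that this pattern also drops digits and non-ASCII letters, which may or may not be what you want.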

Answer by Rahul Tripathi

You may try this:

Scanner scan = new Scanner(System.in);
System.out.println("Type a sentence and press enter.");
String input = scan.nextLine();
String strippedInput = input.replaceAll("\\W", "");
System.out.println("Your string: " + strippedInput);

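\W (equivalent to [^\w]) matches a non-word character, so the above regular expression will match and remove all non-word characters. Note that this also removes the spaces between words, while digits and underscores are kept, since they count as word characters.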
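To make the effect of \W concrete, here is a small sketch that compares it with a variant that keeps whitespace. The class and method names are hypothetical, and the second pattern is a suggested alternative rather than something from the answer.

```java
// Hypothetical demo class, used only for this illustration.
class NonWordDemo {
    // Removes every character outside [a-zA-Z_0-9], including spaces.
    static String stripNonWord(String input) {
        return input.replaceAll("\\W", "");
    }

    // Assumed variant: removes non-word characters but keeps whitespace,
    // so the words stay separated.
    static String stripNonWordKeepSpaces(String input) {
        return input.replaceAll("[^\\w\\s]", "");
    }

    public static void main(String[] args) {
        String input = "Hi there, user_42!";
        System.out.println(stripNonWord(input));           // prints Hithereuser_42
        System.out.println(stripNonWordKeepSpaces(input)); // prints Hi there user_42
    }
}
```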

Answer by Josh M

If you don't want to use RegEx (which seems highly unnecessary given your problem), perhaps you should try something like this:

public String modified(final String input){
    final StringBuilder builder = new StringBuilder();
    for(final char c : input.toCharArray())
        if(Character.isLetterOrDigit(c))
            builder.append(Character.isLowerCase(c) ? c : Character.toLowerCase(c));
    return builder.toString();
}

It loops through the underlying char[] in the String and appends the lowercase version of each char only if it is a letter or digit (filtering out all symbols, which I am assuming is what you are trying to accomplish).
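If the input may contain characters outside the Basic Multilingual Plane, the same filter can be written over code points instead of chars. The following is only a sketch of that alternative, assuming Java 8 or later; the class name CodePointFilter and the method name lettersAndDigitsLower are made up for illustration.

```java
// Hypothetical class, used only for this illustration.
class CodePointFilter {
    // Keeps letters and digits, lowercased, iterating over code points rather than chars.
    static String lettersAndDigitsLower(String input) {
        return input.codePoints()
                .filter(Character::isLetterOrDigit)
                .map(Character::toLowerCase)
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersAndDigitsLower("Hello, World 42!")); // prints helloworld42
    }
}
```

Like the loop above, this drops spaces as well, so the result is a single run of letters and digits.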

Answer by Nerzid

I don't like to use regex, so here is another simple solution.

public String removePunctuations(String s) {
    String res = "";
    for (Character c : s.toCharArray()) {
        if(Character.isLetterOrDigit(c))
            res += c;
    }
    return res;
}

Note: this keeps both letters and digits.

Answer by ravthiru

You can use the following regular expression construct:

\p{Punct} matches punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

inputString.replaceAll("\\p{Punct}", "");
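
A possible way to combine this with the lowercasing and splitting asked for in the question is sketched below. The class name PunctDemo and the method name toWords are hypothetical, and the trim() call and empty-input check are additions for robustness. By default \p{Punct} covers only the ASCII characters listed above, so typographic quotes and similar characters are left in place.

```java
import java.util.Arrays;

// Hypothetical class, used only for this illustration.
class PunctDemo {
    // Strips ASCII punctuation, lowercases, then splits on whitespace runs.
    // Unlike a letters-only pattern, digits are kept.
    static String[] toWords(String input) {
        String cleaned = input.replaceAll("\\p{Punct}", "").toLowerCase().trim();
        // Avoid returning a single empty element for empty input.
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWords("Wait... what?! 3 o'clock, really.")));
        // prints [wait, what, 3, oclock, really]
    }
}
```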