Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/4989365/

Time: 2020-10-30 08:57:58  Source: igfitidea

Regexp for a string to contain only letters, numbers and space in Java

Tags: java, regex, string

Asked by sat

Requirement: the string should contain only letters, numbers, and spaces.
I have to pass a clean name to another API.

Implementation: Java

I came up with this for my requirement:

public static String getCleanFilename(String filename) {
    if (filename == null) {
        return null;
    }
    return filename.replaceAll("[^A-Za-z0-9 ]","");
}

This works well for a few of my test cases, but I want to know whether I am missing any boundary conditions, or whether there is a better way (in terms of performance) to do it.

Accepted answer by Stefan Kendall

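To answer your direct question: \t fails your method. A tab passes as "space" to a user, but your character class only keeps the literal space character, so the tab is stripped along with everything else. Switch to \s ([...\s]) and you're good.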
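
A minimal sketch of the difference, not taken from the original answer: the class name WhitespaceCheck and both method names are hypothetical, and the second pattern is one reading of the suggested [...\s] form. Note that \s has to be written as \\s inside a Java string literal.

```java
class WhitespaceCheck {
    // The question's pattern: keeps only ASCII letters, digits, and the literal space.
    static String cleanSpaceOnly(String s) {
        return s.replaceAll("[^A-Za-z0-9 ]", "");
    }

    // Assumed variant using \s: keeps any whitespace character (space, tab, newline, ...).
    static String cleanAnyWhitespace(String s) {
        return s.replaceAll("[^A-Za-z0-9\\s]", "");
    }

    public static void main(String[] args) {
        String input = "my\tfile name!";
        // The tab and the '!' are both removed.
        System.out.println(cleanSpaceOnly(input).equals("myfile name"));
        // Only the '!' is removed; the tab survives.
        System.out.println(cleanAnyWhitespace(input).equals("my\tfile name"));
    }
}
```

Whether tabs and newlines should be kept at all depends on what the receiving API accepts; \s keeps all of them.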

At any rate, your design is probably flawed. Instead of arbitrarily altering user input, let the user know what you don't allow and make the correction a manual step.
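
One way to read this advice is to validate the name and reject it instead of rewriting it. The sketch below assumes that reading; NameValidator and isValidName are hypothetical names, and how the rejection is reported to the user is left open.

```java
import java.util.regex.Pattern;

class NameValidator {
    // Whole-string match: one or more ASCII letters, digits, or spaces.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9 ]+");

    // Returns true only if every character is allowed; null and "" are rejected.
    static boolean isValidName(String name) {
        return name != null && ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("report 2020")); // accepted
        System.out.println(isValidName("report_2020")); // rejected: underscore is not allowed
    }
}
```

A caller would show the user which characters are permitted when isValidName returns false, rather than silently changing the name.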

EDIT:
If the filename doesn't matter, take the SHA-2 hash of the file name and use that. Guaranteed to meet your requirements.
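
A rough sketch of the hashing idea, assuming SHA-256 as the SHA-2 variant and lowercase hex as the output encoding (the answer specifies neither). HashedName and sha256Hex are hypothetical names. Hex output consists only of the characters 0-9 and a-f, so it satisfies the letters-and-digits requirement.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashedName {
    // Returns the SHA-256 digest of the file name as 64 lowercase hex characters.
    static String sha256Hex(String filename) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(filename.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platform implementations are required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("my file (final).txt"));
    }
}
```

The trade-off is that the resulting name is no longer human-readable, which is why the answer limits this to cases where the filename doesn't matter.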

Answer by Howard

In addition to the comments: I don't think performance is an issue in a scenario where user input is taken (and a filename shouldn't be that long...).

But concerning your question: you can reduce the number of replacements by adding a + quantifier to your regex, so that each run of consecutive disallowed characters is removed in a single replacement:

[^A-Za-z0-9 ]+
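
Applied to the method from the question, this could look like the sketch below. Precompiling the pattern into a constant is an extra assumption on top of the answer (String.replaceAll compiles the regex on every call); the class name CleanFilename and the constant DISALLOWED are hypothetical.

```java
import java.util.regex.Pattern;

class CleanFilename {
    // One or more consecutive characters that are not ASCII letters, digits, or space.
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9 ]+");

    static String getCleanFilename(String filename) {
        if (filename == null) {
            return null;
        }
        // Each run of disallowed characters is replaced in one step.
        return DISALLOWED.matcher(filename).replaceAll("");
    }

    public static void main(String[] args) {
        // "$$$" and "!!." are each removed as a single match.
        System.out.println(getCleanFilename("re$$$port 1!!.txt").equals("report 1txt"));
    }
}
```

The output is identical to the version without +; only the number of matches the regex engine has to replace changes.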
