Original question: http://stackoverflow.com/questions/754307/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Regex to replace characters that Windows doesn't accept in a filename
Asked by KdgDev
I'm trying to build a regular expression that will detect any character that Windows does not accept as part of a file name (are these the same for other OS? I don't know, to be honest).
These symbols are:
\ / : * ? " < > |
Anyway, this is what I have: [\\/:*?\"<>|]
The tester at http://gskinner.com/RegExr/ shows this to be working. For the string Allo*ha, the * symbol lights up, signalling it's been found. However, if I enter Allo**ha, only the first * lights up. So I think I need to modify this regex to find all occurrences of these characters, but I'm not sure.
You see, in Java, I'm lucky enough to have the function String.replaceAll(String regex, String replacement). The description says:
Replaces each substring of this string that matches the given regular expression with the given replacement.
So in other words, even if the regex only finds the first and then stops searching, this function will still find them all.
For instance: fileName.replaceAll("[\\\\/:*?\"<>|]", "")
However, I don't feel like I can take that risk. So does anybody know how I can extend this?
Accepted answer by bobince
Windows filename rules are tricky. You're only scratching the surface.
For example, here are some things that are not valid filenames, in addition to the characters you listed:
"" (yes, that's an empty string)
.
.a
a.
" a" (that's a leading space)
"a " (or a trailing space)
com1
prn.txt
[anything over 240 characters]
[any control characters]
[any non-ASCII characters that don't fit in the system codepage,
if the filesystem is FAT32]
Removing special characters with a single regex substitution like String.replaceAll() isn't enough; you can easily end up with something invalid like an empty string or a trailing '.' or ' '. Replacing something like "[^A-Za-z0-9_.]+" with '_' would be a better first step. But you will still need higher-level processing on whatever platform you're using.
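A minimal sketch of that "first step plus higher-level processing", assuming the goal is a single safe name component on Windows. The class name WindowsFileNames, the fallback parameter and the 240-character cap (taken from the list above) are illustrative choices, not a complete implementation. It uses + rather than * in the class, so each run of bad characters becomes a single '_' (with *, Java's replaceAll would also insert '_' at every empty match). Set.of needs Java 9 or later.

```java
import java.util.Locale;
import java.util.Set;

class WindowsFileNames {
    // Classic reserved DOS device names; Windows rejects them even with an extension ("prn.txt")
    private static final Set<String> RESERVED = Set.of(
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9");
    private static final int MAX_LENGTH = 240; // assumption: limit taken from the list above

    static String sanitize(String name, String fallback) {
        // 1. Whitelist: each run of characters outside [A-Za-z0-9_.] becomes one '_'
        String s = name.replaceAll("[^A-Za-z0-9_.]+", "_");
        // 2. No leading dots (".", ".a")
        s = s.replaceAll("^\\.+", "");
        // 3. Reserved device names are checked on the part before the first dot
        int dot = s.indexOf('.');
        String base = dot >= 0 ? s.substring(0, dot) : s;
        if (RESERVED.contains(base.toUpperCase(Locale.ROOT))) {
            s = "_" + s;
        }
        // 4. Cap the length, then drop trailing dots (truncation can expose new ones)
        if (s.length() > MAX_LENGTH) {
            s = s.substring(0, MAX_LENGTH);
        }
        s = s.replaceAll("\\.+$", "");
        // 5. Never return an empty name
        return s.isEmpty() ? fallback : s;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("prn.txt", "unnamed"));      // _prn.txt
        System.out.println(sanitize("my file?.txt", "unnamed")); // my_file_.txt
        System.out.println(sanitize("...", "unnamed"));          // unnamed
    }
}
```

Trailing spaces cannot survive step 1, because spaces are already outside the whitelist.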
Answer by Kredns
You might try allowing only the stuff you want the user to be able to enter, for example A-Z, a-z, and 0-9.
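A sketch of this allow-list idea as a check rather than a replacement (the class and method names are hypothetical). Because matches() must consume the whole string, any character outside A-Z, a-z and 0-9 makes the name fail, and so does the empty string; you would widen the class if you also want to allow '.' for extensions.

```java
import java.util.regex.Pattern;

class AllowListCheck {
    // Assumed policy: ASCII letters and digits only, at least one character
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9]+");

    static boolean isAllowed(String name) {
        // matches() succeeds only if the entire input fits the pattern
        return ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Report2024")); // true
        System.out.println(isAllowed("Allo*ha"));    // false
        System.out.println(isAllowed(""));           // false
    }
}
```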
Answer by Artelius
For the record, POSIX-compliant systems (including UNIX and Linux) support all characters in filenames except the null character ('\0') and the forward slash ('/'). Special characters such as space and asterisk must be escaped on the command line so that they do not take on their usual roles.
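If the target is a POSIX system, the sanitizing step shrinks to those two characters. A minimal sketch (the class name is hypothetical) that uses plain character replacement instead of a regex; it deliberately does not deal with the special names "." and "..", which you would still have to reject separately.

```java
class PosixFileNames {
    // Only '/' and NUL are forbidden inside a POSIX filename component
    static String sanitize(String name) {
        return name.replace('/', '_').replace('\0', '_');
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a/b"));     // a_b
        System.out.println(sanitize("Allo*ha")); // Allo*ha (allowed on POSIX)
    }
}
```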
Answer by jpalecek
You cannot do this with a single regex match, because a match always covers one contiguous substring of the input. Consider the word Alo*h*a: there is no substring that contains all the *s and no other character. So if you can use the replaceAll function, just stick with it.
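A quick way to check this in Java, using the question's character class written with \\\\ so that the backslash itself is matched (the class and method names are illustrative): repeated Matcher.find() calls step through every occurrence, each call resuming after the previous match, and replaceAll removes them all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllDemo {
    // "\\\\" in the Java literal becomes "\\" in the regex, i.e. one literal backslash
    static final Pattern BAD = Pattern.compile("[\\\\/:*?\"<>|]");

    // Counts matches by calling find() until it returns false
    static int countMatches(String s) {
        Matcher m = BAD.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Removes every match, not just the first
    static String strip(String s) {
        return BAD.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(countMatches("Alo*h*a")); // 2
        System.out.println(strip("Alo*h*a"));        // Aloha
        System.out.println(strip("Allo**ha"));       // Alloha
    }
}
```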
BTW, the set of forbidden characters is different in other OSes.
Answer by Pesto
Java has a replaceAll function, but every programming language has a way to do something similar. Perl, for example, uses the g modifier to signify a global replacement. Python's sub function lets you specify the number of replacements to make. If, for some reason, your language didn't have an equivalent, you could always do something like this:
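// keep removing the first bad character until none are left
while (Pattern.compile(bad_characters).matcher(filename).find())
    filename = filename.replaceFirst(bad_characters, "");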
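A runnable Java version of that loop, purely to illustrate the idea (in real Java code you would just call replaceAll; the class name is hypothetical). It relies on the replacement being empty, so removing one match can never create a new one, and it rescans the string on every pass, so it is quadratic in the worst case.

```java
import java.util.regex.Pattern;

class ReplaceLoopDemo {
    static final String BAD = "[\\\\/:*?\"<>|]";

    // Manual "global" replacement: remove one match per iteration until none remain
    static String stripWithLoop(String filename) {
        Pattern p = Pattern.compile(BAD);
        while (p.matcher(filename).find()) {
            filename = p.matcher(filename).replaceFirst("");
        }
        return filename;
    }

    public static void main(String[] args) {
        String input = "a<b>c|d";
        System.out.println(stripWithLoop(input));                                  // abcd
        System.out.println(stripWithLoop(input).equals(input.replaceAll(BAD, ""))); // true
    }
}
```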
Answer by Alex_M
Since no answer was good enough, I did it myself. Hope this helps ;)
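public static boolean validateFileName(String fileName) {
    // first character must not be '.', and no character may be \ / : * ? " < > |
    // ("\\\\" in a Java literal is one literal backslash in the regex)
    return fileName.matches("[^.\\\\/:*?\"<>|][^\\\\/:*?\"<>|]*")
            && getValidFileName(fileName).length() > 0;
}
public static String getValidFileName(String fileName) {
    // strip leading dots, then remove every forbidden character
    String newFileName = fileName.replaceAll("^\\.+", "")
            .replaceAll("[\\\\/:*?\"<>|]", "");
    if (newFileName.length() == 0)
        throw new IllegalStateException(
                "File Name " + fileName + " results in an empty fileName!");
    return newFileName;
}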
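Two details in these patterns are easy to miss, and this small check makes them visible (the class name is hypothetical; ORIGINAL holds the validation pattern as first posted in this answer, FIXED a corrected one). The ? after the first class lets it match nothing, so a leading dot slips through; and "\\/" in a Java literal reaches the regex as \/, which is just '/', so a backslash is never rejected.

```java
class LeadingDotCheck {
    // Pattern as first posted: optional first class, backslash not actually excluded
    static final String ORIGINAL = "^[^.\\/:*?\"<>|]?[^\\/:*?\"<>|]*";
    // Mandatory first character, and "\\\\" so the regex excludes a literal backslash
    static final String FIXED = "[^.\\\\/:*?\"<>|][^\\\\/:*?\"<>|]*";

    public static void main(String[] args) {
        System.out.println(".hidden".matches(ORIGINAL)); // true: "?" lets the first class match nothing
        System.out.println(".hidden".matches(FIXED));    // false
        System.out.println("a\\b".matches(ORIGINAL));    // true: backslash is allowed
        System.out.println("a\\b".matches(FIXED));       // false
    }
}
```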
Answer by Vysakh Prem
I extract all word characters and whitespace characters from the original string, and I also make sure that no whitespace is left at the end of the string. Here is my code snippet in Java.
// keep only word characters and whitespace
String temp_string = original.replaceAll("[^\\w\\s]", "");
// drop all trailing whitespace
String final_string = temp_string.replaceAll("\\s+$", "");
I think I helped someone.
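Why the pipe in the first version of this snippet matters: inside [...] the | is not alternation but a literal character, so [^\w|\s] keeps '|', which Windows rejects. A small comparison (class and method names are illustrative); asPosted is that first version with its backslashes doubled so that it compiles, and its \s$ removes only one trailing whitespace character.

```java
class WordCharFilter {
    // As posted, escapes doubled: '|' survives, and only one trailing space is removed
    static String asPosted(String s) {
        return s.replaceAll("[^\\w|\\s]", "").replaceAll("\\s$", "");
    }

    // No pipe in the class, and every trailing whitespace character is removed
    static String fixed(String s) {
        return s.replaceAll("[^\\w\\s]", "").replaceAll("\\s+$", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + asPosted("a|b?c  ") + "]"); // [a|bc ]
        System.out.println("[" + fixed("a|b?c  ") + "]");    // [abc]
    }
}
```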
Answer by Adam111p
I use a pure and simple regular expression. I list the characters that are allowed, and by negating the class with "^" I replace every other character with "_".
String fileName = someString.replaceAll("[^a-zA-Z0-9\\.\\-]", "_");
For example, if you do not want "." to appear in the result, remove the "\\." from the class:
String fileName = someString.replaceAll("[^a-zA-Z0-9\\-]", "_");
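For illustration, here is what the two variants produce on a sample name (class and method names are hypothetical). There is no + after the class, so every rejected character becomes its own '_'.

```java
class WhitelistReplace {
    // Letters, digits, '.' and '-' are kept
    static String keepDots(String s) {
        return s.replaceAll("[^a-zA-Z0-9\\.\\-]", "_");
    }

    // Same, but '.' is replaced as well
    static String noDots(String s) {
        return s.replaceAll("[^a-zA-Z0-9\\-]", "_");
    }

    public static void main(String[] args) {
        System.out.println(keepDots("my file:v2*.txt")); // my_file_v2_.txt
        System.out.println(noDots("my file:v2*.txt"));   // my_file_v2__txt
    }
}
```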
Answer by Balaco
Windows also does not accept "%" as a file name.
If you are building a general expression for files that may eventually be moved to other operating systems, I suggest you also exclude other characters that can cause problems there.
For example, in Linux (in the many distributions I know), some users may have problems with files containing & ! ] [ - ( ). These symbols are allowed in file names, but users may need to treat them specially, and some programs have bugs caused by their presence.
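One way to keep such a growing list maintainable is to build the character class from a plain string. A sketch, assuming your own list of risky characters (the one below just combines the question's Windows characters with the ones mentioned in this answer; class and method names are hypothetical). Escaping every character with a backslash is safe in Java regex as long as none of them is a letter, because a backslash before a non-alphabetic character always means that literal character.

```java
class BlacklistBuilder {
    // Assumed list: Windows-forbidden characters plus % & ! ] [ ( ) -
    static final String RISKY = "\\/:*?\"<>|%&!][()-";

    // Builds "[\c1\c2...]"; only valid for non-alphabetic characters
    static String characterClass(String chars) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : chars.toCharArray()) {
            sb.append('\\').append(c);
        }
        return sb.append(']').toString();
    }

    static String sanitize(String name) {
        return name.replaceAll(characterClass(RISKY), "_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a&b (1).txt")); // a_b _1_.txt
    }
}
```

Spaces are not in the assumed list, so they are kept.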
Answer by Ivan Aracki
I made one very simple method that works for me in most common cases:
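// replace special characters that Windows doesn't accept
private String replaceSpecialCharacters(String string) {
    // "\\\\" is one literal backslash in the regex; the other characters need no escaping inside [...]
    return string.replaceAll("[*/\\\\!|:?<>]", "_")
            .replaceAll("(%22)", "_");
}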
The %22 replacement is there because a double quote (") in your file names may arrive URL-encoded as %22.
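A self-contained version of this method, with the backslash escaped so that it compiles (the class name is hypothetical, and replace("%22", "_") is used as a plain-string equivalent of replaceAll("(%22)", "_")). Note that an unencoded double quote is not covered by either call; you would add \" to the character class if raw quotes can reach it.

```java
class QuoteEncodedDemo {
    static String replaceSpecialCharacters(String s) {
        // character class first, then the URL-encoded quote
        return s.replaceAll("[*/\\\\!|:?<>]", "_")
                .replace("%22", "_");
    }

    public static void main(String[] args) {
        System.out.println(replaceSpecialCharacters("my%22file%22*.txt")); // my_file__.txt
        System.out.println(replaceSpecialCharacters("a\\b"));              // a_b
    }
}
```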