C# Sanitize File Name
Original question: http://stackoverflow.com/questions/309485/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Jason Sundram
I recently have been moving a bunch of MP3s from various locations into a repository. I had been constructing the new file names using the ID3 tags (thanks, TagLib-Sharp!), and I noticed that I was getting a System.NotSupportedException:
"The given path's format is not supported."
This was generated by either File.Copy() or Directory.CreateDirectory().
It didn't take long to realize that my file names needed to be sanitized. So I did the obvious thing:
public static string SanitizePath_(string path, char replaceChar)
{
    string dir = Path.GetDirectoryName(path);
    foreach (char c in Path.GetInvalidPathChars())
        dir = dir.Replace(c, replaceChar);
    string name = Path.GetFileName(path);
    foreach (char c in Path.GetInvalidFileNameChars())
        name = name.Replace(c, replaceChar);
    return dir + name;
}
To my surprise, I continued to get exceptions. It turned out that ':' is not in the set of Path.GetInvalidPathChars(), because it is valid in a path root. I suppose that makes sense, but this has to be a pretty common problem. Does anyone have some short code that sanitizes a path? The most thorough I've come up with is this, but it feels like it is probably overkill.
// replaces invalid characters with replaceChar
public static string SanitizePath(string path, char replaceChar)
{
    // construct a list of characters that can't show up in filenames.
    // need to do this because ":" is not in InvalidPathChars
    if (_BadChars == null)
    {
        _BadChars = new List<char>(Path.GetInvalidFileNameChars());
        _BadChars.AddRange(Path.GetInvalidPathChars());
        _BadChars = Utility.GetUnique<char>(_BadChars);
    }
    // remove root
    string root = Path.GetPathRoot(path);
    path = path.Remove(0, root.Length);
    // split on the directory separator character. Need to do this
    // because the separator is not valid in a filename.
    List<string> parts = new List<string>(path.Split(new char[]{Path.DirectorySeparatorChar}));
    // check each part to make sure it is valid.
    for (int i = 0; i < parts.Count; i++)
    {
        string part = parts[i];
        foreach (char c in _BadChars)
        {
            part = part.Replace(c, replaceChar);
        }
        parts[i] = part;
    }
    return root + Utility.Join(parts, Path.DirectorySeparatorChar.ToString());
}
Any improvements to make this function faster and less baroque would be much appreciated.
Answer by Brian
Your code would be cleaner if you appended the directory and filename together and sanitized that rather than sanitizing them independently. As for sanitizing away the :, just take the 2nd character in the string. If it is equal to "replacechar", replace it with a colon. Since this app is for your own use, such a solution should be perfectly sufficient.
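A minimal sketch of that idea, with two assumptions that are not in the answer above. First, the set of bad characters is hard-coded (Windows-style, with the directory separators left out so the path structure survives) rather than taken from Path.GetInvalidFileNameChars(). Second, the colon is only restored when the original string really had a drive-style colon in second position, so that a name like "a?b" is not turned into "a:b". The names WholePathSanitizer and SanitizeWholePath are made up for illustration.

```csharp
using System;
using System.Text;

public static class WholePathSanitizer
{
    // Assumption: a fixed, Windows-style set of characters to replace.
    // '\' and '/' are deliberately absent so directory separators are kept.
    private const string BadChars = "<>:\"|?*";

    public static string SanitizeWholePath(string path, char replaceChar)
    {
        if (string.IsNullOrEmpty(path)) return path;

        var sb = new StringBuilder(path.Length);
        foreach (char c in path)
        {
            // Control characters (below U+0020) are treated as bad too.
            bool bad = c < ' ' || BadChars.IndexOf(c) >= 0;
            sb.Append(bad ? replaceChar : c);
        }

        // Put the drive colon back ("C:") if the input started with one.
        if (path.Length >= 2 && path[1] == ':' && IsAsciiLetter(path[0]))
        {
            sb[1] = ':';
        }
        return sb.ToString();
    }

    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }
}
```

Called as WholePathSanitizer.SanitizeWholePath(@"C:\Music\AC:DC\What?.mp3", '_'), this keeps the drive prefix and the separators and replaces only the colon and question mark inside the path segments.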
Answer by Dour High Arch
I think the problem is that you first call Path.GetDirectoryName on the bad string. If this has non-filename characters in it, .NET can't tell which parts of the string are directories and throws. You have to do string comparisons.
Assuming it's only the filename that is bad, not the entire path, try this:
public static string SanitizePath(string path, char replaceChar)
{
    int filenamePos = path.LastIndexOf(Path.DirectorySeparatorChar) + 1;
    var sb = new System.Text.StringBuilder();
    sb.Append(path.Substring(0, filenamePos));
    for (int i = filenamePos; i < path.Length; i++)
    {
        char filenameChar = path[i];
        foreach (char c in Path.GetInvalidFileNameChars())
            if (filenameChar.Equals(c))
            {
                filenameChar = replaceChar;
                break;
            }
        sb.Append(filenameChar);
    }
    return sb.ToString();
}
Answer by Andre
To clean up a file name you could do this:
private static string MakeValidFileName( string name )
{
    string invalidChars = System.Text.RegularExpressions.Regex.Escape( new string( System.IO.Path.GetInvalidFileNameChars() ) );
    string invalidRegStr = string.Format( @"([{0}]*\.+$)|([{0}]+)", invalidChars );
    return System.Text.RegularExpressions.Regex.Replace( name, invalidRegStr, "_" );
}
Answer by Ralf
using System;
using System.IO;
using System.Linq;
using System.Text;
public class Program
{
    public static void Main()
    {
        try
        {
            // The backslash must be escaped as \\ inside a regular string literal.
            var badString = "ABC\\DEF/GHI<JKL>MNO:PQR\"STU\tVWX|YZA*BCD?EFG";
            Console.WriteLine(badString);
            Console.WriteLine(SanitizeFileName(badString, '.'));
            Console.WriteLine(SanitizeFileName(badString));
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.ToString());
        }
    }
    // Replaces each invalid character with 'replacement', or drops it when no replacement is given.
    private static string SanitizeFileName(string fileName, char? replacement = null)
    {
        if (fileName == null) { return null; }
        if (fileName.Length == 0) { return ""; }
        var sb = new StringBuilder();
        var badChars = Path.GetInvalidFileNameChars().ToList();
        foreach (var @char in fileName)
        {
            if (badChars.Contains(@char))
            {
                if (replacement.HasValue)
                {
                    sb.Append(replacement.Value);
                }
                continue;
            }
            sb.Append(@char);
        }
        return sb.ToString();
    }
}
Answer by Helix 88
I have had success with this in the past.
Nice, short and static :-)
public static string returnSafeString(string s)
{
    foreach (char character in Path.GetInvalidFileNameChars())
    {
        s = s.Replace(character.ToString(),string.Empty);
    }
    foreach (char character in Path.GetInvalidPathChars())
    {
        s = s.Replace(character.ToString(), string.Empty);
    }
    return (s);
}
Answer by fiat
Based on Andre's excellent answer but taking into account Spud's comment on reserved words, I made this version:
/// <summary>
/// Strip illegal chars and reserved words from a candidate filename (should not include the directory path)
/// </summary>
/// <remarks>
/// http://stackoverflow.com/questions/309485/c-sharp-sanitize-file-name
/// </remarks>
public static string CoerceValidFileName(string filename)
{
    var invalidChars = Regex.Escape(new string(Path.GetInvalidFileNameChars()));
    var invalidReStr = string.Format(@"[{0}]+", invalidChars);
    var reservedWords = new []
    {
        "CON", "PRN", "AUX", "CLOCK$", "NUL", "COM0", "COM1", "COM2", "COM3", "COM4",
        "COM5", "COM6", "COM7", "COM8", "COM9", "LPT0", "LPT1", "LPT2", "LPT3", "LPT4",
        "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    };
    var sanitisedNamePart = Regex.Replace(filename, invalidReStr, "_");
    foreach (var reservedWord in reservedWords)
    {
        // Verbatim string so that \. reaches the regex engine; Regex.Escape keeps
        // the '$' in "CLOCK$" from being read as an end-of-line anchor.
        var reservedWordPattern = string.Format(@"^{0}\.", Regex.Escape(reservedWord));
        sanitisedNamePart = Regex.Replace(sanitisedNamePart, reservedWordPattern, "_reservedWord_.", RegexOptions.IgnoreCase);
    }
    return sanitisedNamePart;
}
And these are my unit tests:
[Test]
public void CoerceValidFileName_SimpleValid()
{
    var filename = @"thisIsValid.txt";
    var result = PathHelper.CoerceValidFileName(filename);
    Assert.AreEqual(filename, result);
}
[Test]
public void CoerceValidFileName_SimpleInvalid()
{
    // A run of adjacent invalid characters collapses into a single underscore.
    var filename = @"thisIsNotValid\3\\_3.txt";
    var result = PathHelper.CoerceValidFileName(filename);
    Assert.AreEqual("thisIsNotValid_3__3.txt", result);
}
[Test]
public void CoerceValidFileName_InvalidExtension()
{
    var filename = @"thisIsNotValid.t\xt";
    var result = PathHelper.CoerceValidFileName(filename);
    Assert.AreEqual("thisIsNotValid.t_xt", result);
}
[Test]
public void CoerceValidFileName_KeywordInvalid()
{
    var filename = "aUx.txt";
    var result = PathHelper.CoerceValidFileName(filename);
    Assert.AreEqual("_reservedWord_.txt", result);
}
[Test]
public void CoerceValidFileName_KeywordValid()
{
    var filename = "auxillary.txt";
    var result = PathHelper.CoerceValidFileName(filename);
    Assert.AreEqual("auxillary.txt", result);
}
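One thing to be aware of: the reserved-word pattern in this answer only matches when the word is followed by a dot, so a bare name such as "CON" with no extension passes through unchanged. Below is a rough sketch of a check that looks at everything before the first dot instead. It reuses the reserved-word list from this answer as-is; ReservedNames, IsReserved and AvoidReserved are hypothetical names, and the underscore prefix is just one possible way to defuse the name.

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNames
{
    // Same list as in the answer above, compared case-insensitively.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "CLOCK$", "NUL",
            "COM0", "COM1", "COM2", "COM3", "COM4",
            "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT0", "LPT1", "LPT2", "LPT3", "LPT4",
            "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };

    // True when the part before the first dot (or the whole name,
    // if there is no dot) is one of the reserved words.
    public static bool IsReserved(string fileName)
    {
        if (string.IsNullOrEmpty(fileName)) return false;
        int dot = fileName.IndexOf('.');
        string stem = dot < 0 ? fileName : fileName.Substring(0, dot);
        return Reserved.Contains(stem);
    }

    // Assumption: prefixing an underscore is an acceptable way to rename.
    public static string AvoidReserved(string fileName)
    {
        return IsReserved(fileName) ? "_" + fileName : fileName;
    }
}
```

This would be applied after the invalid characters have been replaced, for example ReservedNames.AvoidReserved(CoerceValidFileName(name)).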
Answer by DenNukem
A shorter solution:
var invalids = System.IO.Path.GetInvalidFileNameChars();
var newName = String.Join("_", origFileName.Split(invalids, StringSplitOptions.RemoveEmptyEntries) ).TrimEnd('.');
Answer by data
string clean = String.Concat(dirty.Split(Path.GetInvalidFileNameChars()));
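To make the difference between the Split-based one-liners easier to see, here is a small sketch with the three variants side by side. It assumes a fixed, Windows-style character set (control characters not included) instead of Path.GetInvalidFileNameChars(), so the behaviour is the same on every platform; SplitSanitizer and its method names are invented for this example.

```csharp
using System;

public static class SplitSanitizer
{
    // Assumption: hard-coded set of characters to strip or replace.
    private static readonly char[] Invalid = "\\/:*?\"<>|".ToCharArray();

    // Drops every invalid character (Split + Concat).
    public static string Remove(string name)
    {
        return string.Concat(name.Split(Invalid));
    }

    // Replaces each invalid character one-for-one (Split + Join).
    public static string Replace(string name, string replacement)
    {
        return string.Join(replacement, name.Split(Invalid));
    }

    // With RemoveEmptyEntries, runs of invalid characters collapse into a
    // single replacement and leading/trailing ones disappear entirely.
    public static string ReplaceCollapsed(string name, string replacement)
    {
        return string.Join(replacement,
            name.Split(Invalid, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

Which variant is preferable probably depends on whether the cleaned names need to stay distinguishable: dropping or collapsing characters makes it more likely that two different inputs end up with the same file name.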
Answer by André Leal
I'm using the System.IO.Path.GetInvalidFileNameChars() method to check for invalid characters and I've had no problems.
I'm using the following code:
foreach( char invalidchar in System.IO.Path.GetInvalidFileNameChars())
{
    filename = filename.Replace(invalidchar, '_');
}
Answer by Valamas
I wanted to retain the characters in some way, not simply replace each one with an underscore.
One way I thought of was to replace the characters with similar-looking characters that (in my situation) are unlikely to be used as regular characters. So I took the list of invalid characters and found look-alikes.
The following are functions to encode and decode with the look-alikes.
This code does not include a complete listing for all System.IO.Path.GetInvalidFileNameChars() characters. So it is up to you to extend it or use the underscore replacement for any remaining characters.
private static Dictionary<string, string> EncodeMapping()
{
    //-- Following characters are invalid for windows file and folder names.
    //-- \/:*?"<>|
    Dictionary<string, string> dic = new Dictionary<string, string>();
    dic.Add(@"\", "Ì"); // U+00CC
    dic.Add("/", "Í"); // U+00CD
    dic.Add(":", "¦"); // U+00A6
    dic.Add("*", "¤"); // U+00A4
    dic.Add("?", "¿"); // U+00BF
    dic.Add(@"""", "ˮ"); // U+02EE
    dic.Add("<", "«"); // U+00AB
    dic.Add(">", "»"); // U+00BB
    dic.Add("|", "│"); // U+2502
    return dic;
}
public static string Escape(string name)
{
    foreach (KeyValuePair<string, string> replace in EncodeMapping())
    {
        name = name.Replace(replace.Key, replace.Value);
    }
    //-- handle dot at the end (Substring drops the last character)
    if (name.EndsWith(".")) name = name.Substring(0, name.Length - 1) + "°";
    return name;
}
public static string UnEscape(string name)
{
    foreach (KeyValuePair<string, string> replace in EncodeMapping())
    {
        name = name.Replace(replace.Value, replace.Key);
    }
    //-- handle dot at the end (Substring drops the last character)
    if (name.EndsWith("°")) name = name.Substring(0, name.Length - 1) + ".";
    return name;
}
You can select your own look-alikes. I used the Character Map app in Windows to select mine: %windir%\system32\charmap.exe
As I make adjustments through discovery, I will update this code.