C#: How to remove illegal characters from paths and file names?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/146134/

Date: 2020-08-03 15:33:14  Source: igfitidea

How to remove illegal characters from path and filenames?

Asked by Gary Willoughby

I need a robust and simple way to remove illegal path and file characters from a simple string. I've used the code below, but it doesn't seem to do anything. What am I missing?

using System;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";

            illegal = illegal.Trim(Path.GetInvalidFileNameChars());
            illegal = illegal.Trim(Path.GetInvalidPathChars());

            Console.WriteLine(illegal);
            Console.ReadLine();
        }
    }
}

Accepted answer by Matthew Scharley

Try something like this instead:

// Requires: using System.IO;
string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());

// Remove every occurrence of each invalid character, not just those at the ends.
foreach (char c in invalid)
{
    illegal = illegal.Replace(c.ToString(), "");
}

But I have to agree with the comments, I'd probably try to deal with the source of the illegal paths, rather than try to mangle an illegal path into a legitimate but probably unintended one.

Edit: Or a potentially 'better' solution, using regular expressions.

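// Requires: using System.IO; using System.Text.RegularExpressions;
string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
// Escape the characters so they are treated literally inside the character class.
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
illegal = r.Replace(illegal, "");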

Still, the question remains: why are you doing this in the first place?

Answer by user7116

For starters, Trim only removes characters from the beginning or end of the string. Second, you should evaluate whether you really want to remove the offending characters, or fail fast and let the user know their filename is invalid. My choice is the latter, but my answer should at least show you how to do things the right AND wrong way:

StackOverflow question showing how to check if a given string is a valid file name. Note you can use the regex from this question to remove characters with a regular expression replacement (if you really need to do this).
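
To illustrate the point about Trim, here is a minimal sketch contrasting it with a filter over every character. The class TrimDemo and the helper RemoveInvalidFileNameChars are hypothetical names used only for this illustration; the sample string relies on '/', which is reported as invalid on both Windows and Unix-like systems.

```csharp
using System;
using System.IO;
using System.Linq;

public static class TrimDemo
{
    // Hypothetical helper: drops every invalid file-name character, wherever it occurs.
    public static string RemoveInvalidFileNameChars(string input)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        return new string(input.Where(c => Array.IndexOf(invalid, c) < 0).ToArray());
    }

    public static void Main()
    {
        string name = "/re/port/";

        // Trim strips matching characters from the two ends only.
        Console.WriteLine(name.Trim('/'));                    // re/port

        // Filtering each character also removes the one in the middle.
        Console.WriteLine(RemoveInvalidFileNameChars(name));  // report
    }
}
```

This is why the code in the question appears to do nothing: its sample string starts and ends with characters that are mostly legal, so Trim has almost nothing to strip.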

Answer by Sandor Davidhazi

I think it is much easier to validate using a regex and specifying which characters are allowed, instead of trying to check for all bad characters. See these links: http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx and http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html

Also, search for "regular expression editor" tools; they help a lot. Some of them will even generate the C# code for you.
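
The allow-list idea could be sketched roughly as follows. The set of permitted characters (ASCII letters, digits, space, dot, underscore, hyphen) is purely an assumption for this example, and the class and member names are hypothetical; pick whatever set suits your application.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AllowListValidation
{
    // Assumed allow-list: ASCII letters, digits, space, dot, underscore and hyphen.
    // \z anchors at the very end of the input, so a trailing newline is rejected too.
    private static readonly Regex AllowedName =
        new Regex(@"^[A-Za-z0-9 ._-]+\z", RegexOptions.CultureInvariant);

    // Negated form of the same class, used to strip everything that is not allowed.
    private static readonly Regex Disallowed =
        new Regex(@"[^A-Za-z0-9 ._-]", RegexOptions.CultureInvariant);

    public static bool IsValid(string fileName) => AllowedName.IsMatch(fileName);

    public static string Clean(string fileName) => Disallowed.Replace(fileName, "");

    public static void Main()
    {
        Console.WriteLine(IsValid("report_2020.txt"));      // True
        Console.WriteLine(IsValid("la\"mb.?"));             // False
        Console.WriteLine(Clean("li*tt|le la\"mb.?"));      // little lamb.
    }
}
```

Because the pattern does not depend on Path.GetInvalidFileNameChars, it behaves the same on every platform, at the cost of rejecting some names the file system would accept.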

Answer by Jeff Yates

I use regular expressions to achieve this. First, I dynamically build the regex.

string regex = string.Format(
                   "[{0}]",
                   Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

Then I just call removeInvalidChars.Replace to do the find and replace. This can obviously be extended to cover path chars as well.
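
One way the extension to path characters might look is sketched below. The class InvalidCharRemover and its members are hypothetical names, and merging the two arrays with Union is my own choice for the sketch rather than something taken from the answer.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class InvalidCharRemover
{
    // Built once and reused for every call.
    private static readonly Regex InvalidChars = BuildRegex();

    private static Regex BuildRegex()
    {
        // Union merges both arrays and drops the characters they have in common.
        char[] invalid = Path.GetInvalidFileNameChars()
            .Union(Path.GetInvalidPathChars())
            .ToArray();

        // Regex.Escape makes characters such as \ | * ? literal inside the class.
        string pattern = "[" + Regex.Escape(new string(invalid)) + "]";
        return new Regex(pattern, RegexOptions.CultureInvariant);
    }

    public static string Remove(string input) => InvalidChars.Replace(input, "");

    public static void Main()
    {
        // The exact output depends on the platform's invalid-character sets.
        Console.WriteLine(Remove("\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?"));
    }
}
```

Note that applying this to a full path also removes the directory separators, so it is only suitable for cleaning a single name component.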

Answer by mirezus

Throw an exception.

if (fileName.IndexOfAny(Path.GetInvalidFileNameChars()) > -1)
{
    throw new ArgumentException();
}
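
If you go the fail-fast route, it may help to tell the caller which character was rejected. The following is only a sketch under that assumption; FileNameGuard and EnsureValid are hypothetical names, and the message wording is arbitrary.

```csharp
using System;
using System.IO;

public static class FileNameGuard
{
    // Hypothetical helper: throws if the name contains any character reported as invalid.
    public static void EnsureValid(string fileName)
    {
        if (fileName == null)
        {
            throw new ArgumentNullException(nameof(fileName));
        }

        int index = fileName.IndexOfAny(Path.GetInvalidFileNameChars());
        if (index >= 0)
        {
            // Report the code point instead of the raw character, which may be unprintable.
            throw new ArgumentException(
                $"Invalid character U+{(int)fileName[index]:X4} at index {index}.",
                nameof(fileName));
        }
    }

    public static void Main()
    {
        EnsureValid("report.txt"); // passes silently

        try
        {
            EnsureValid("re/port.txt");
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```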

Answer by James

Here's a code snippet that should help for .NET 3 and higher.

using System;
using System.IO;
using System.Text.RegularExpressions;

public static class PathValidation
{
    private static string pathValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex pathValidator = new Regex(pathValidatorExpression, RegexOptions.Compiled);

    private static string fileNameValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex fileNameValidator = new Regex(fileNameValidatorExpression, RegexOptions.Compiled);

    private static string pathCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex pathCleaner = new Regex(pathCleanerExpression, RegexOptions.Compiled);

    private static string fileNameCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex fileNameCleaner = new Regex(fileNameCleanerExpression, RegexOptions.Compiled);

    public static bool ValidatePath(string path)
    {
        return pathValidator.IsMatch(path);
    }

    public static bool ValidateFileName(string fileName)
    {
        return fileNameValidator.IsMatch(fileName);
    }

    public static string CleanPath(string path)
    {
        return pathCleaner.Replace(path, "");
    }

    public static string CleanFileName(string fileName)
    {
        return fileNameCleaner.Replace(fileName, "");
    }
}

Answer by Gregor Slavec

You can remove illegal chars using LINQ like this:

var invalidChars = Path.GetInvalidFileNameChars();

var invalidCharsRemoved = stringWithInvalidChars
.Where(x => !invalidChars.Contains(x))
.ToArray();

EDIT
This is how it looks with the required edit mentioned in the comments:

var invalidChars = Path.GetInvalidFileNameChars();

string invalidCharsRemoved = new string(stringWithInvalidChars
  .Where(x => !invalidChars.Contains(x))
  .ToArray());

Answer by Jan

I absolutely prefer the idea of Jeff Yates. It will work perfectly, if you slightly modify it:

string regex = String.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

The improvement is just to escape the automatically generated regex.

Answer by Michael Minton

I use Linq to clean up filenames. You can easily extend this to check for valid paths as well.

private static string CleanFileName(string fileName)
{
    return Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty));
}

Update

Some comments indicate this method is not working for them so I've included a link to a DotNetFiddle snippet so you may validate the method.

https://dotnetfiddle.net/nw1SWY

Answer by René

These are all great solutions, but they all rely on Path.GetInvalidFileNameChars, which may not be as reliable as you'd think. Notice the following remark in the MSDN documentation on Path.GetInvalidFileNameChars:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).

It's no better with the Path.GetInvalidPathChars method; its documentation contains the exact same remark.
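
If the platform-dependent arrays are a concern, one possible workaround is to filter against an explicit set of your own. The sketch below assumes a Windows-oriented set (control characters below U+0020 plus the punctuation Windows rejects in file names); FixedSetCleaner and its members are hypothetical names, and the set is not claimed to be complete for every file system.

```csharp
using System;
using System.Linq;

public static class FixedSetCleaner
{
    // Assumed, hard-coded set: " < > | : * ? \ /
    private const string Punctuation = "\"<>|:*?\\/";

    // Control characters (code points 0 through 31) are rejected as well.
    public static bool IsInvalid(char c) => c < ' ' || Punctuation.IndexOf(c) >= 0;

    public static string Clean(string fileName) =>
        new string(fileName.Where(c => !IsInvalid(c)).ToArray());

    public static void Main()
    {
        Console.WriteLine(Clean("li*tt|le la\"mb\t.?")); // little lamb.
    }
}
```

Since the set is fixed in code, the result is the same regardless of which operating system the program runs on.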
