.NET: How do I check for illegal characters in a path?

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2435894/

Date: 2020-09-03 14:04:09, source: igfitidea

Tags: .net, path, illegal-characters

Question by Mike Pateras

Is there a way in .NET to check whether a string meant for a path contains invalid characters? I know I could iterate over each character in Path.InvalidPathChars to see if my string contains one, but I'd prefer a simpler, perhaps more formal, solution.

Is there one?

I've found I still get an exception if I only check against Get

Update:

I've found GetInvalidPathChars does not cover every invalid path character. GetInvalidFileNameChars has 5 more, including '?', which I've come across. I'm going to switch to that, and I'll report back if it, too, proves to be inadequate.
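
Which characters GetInvalidFileNameChars() adds on top of GetInvalidPathChars() depends on the runtime and operating system, so a quick diagnostic can show the actual difference on the machine you care about. A minimal sketch; the class and method names are hypothetical:

```csharp
using System.IO;
using System.Linq;

public static class InvalidCharDiff
{
    // Characters the current runtime/OS rejects in file names but not in paths.
    public static char[] FileNameOnlyChars() =>
        Path.GetInvalidFileNameChars()
            .Except(Path.GetInvalidPathChars())
            .OrderBy(c => c)
            .ToArray();
}
```

Printing each entry as (int)c makes control characters visible; the exact list can differ between .NET Framework, .NET Core on Windows, and Unix-like systems.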

Update 2:

GetInvalidFileNameChars is definitely not what I want. It contains ':', which any absolute path is going to contain ("C:\whatever"). I think I'm just going to have to use GetInvalidPathChars after all, and add in '?' and any other characters that cause me problems as they come up. Better solutions welcome.
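
The approach described in Update 2 (start from GetInvalidPathChars() and add '?' plus whatever else causes trouble) could look roughly like this. It is a sketch, not a tested solution: PathCharCheck and ExtraInvalidChars are hypothetical names, and '*' is included only as an example of another character you might choose to add.

```csharp
using System.Collections.Generic;
using System.IO;

public static class PathCharCheck
{
    // Hypothetical extras: '?' from the question, '*' as an example of another wildcard.
    // Extend this list as new problem characters turn up.
    private static readonly char[] ExtraInvalidChars = { '?', '*' };

    // Built once (static initializers run in textual order, so ExtraInvalidChars is ready),
    // because GetInvalidPathChars() allocates a new array on every call.
    private static readonly HashSet<char> InvalidChars = BuildSet();

    private static HashSet<char> BuildSet()
    {
        var set = new HashSet<char>(Path.GetInvalidPathChars());
        set.UnionWith(ExtraInvalidChars);
        return set;
    }

    public static bool HasInvalidPathChars(string path)
    {
        if (string.IsNullOrEmpty(path))
            return false;

        foreach (char c in path)
        {
            if (InvalidChars.Contains(c))
                return true;
        }
        return false;
    }
}
```

Because the set is built once, each check is a single pass over the string with constant-time lookups.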

Answer by Jeremy Bell

InvalidPathChars is deprecated. Use GetInvalidPathChars() instead:

    public static bool FilePathHasInvalidChars(string path)
    {

        return (!string.IsNullOrEmpty(path) && path.IndexOfAny(System.IO.Path.GetInvalidPathChars()) >= 0);
    }

Edit: Slightly longer, but handles path vs file invalid chars in one function:

    // WARNING: Not tested
    public static bool FilePathHasInvalidChars(string path)
    {
        bool ret = false;
        if (!string.IsNullOrEmpty(path))
        {
            try
            {
                // Careful!
                //    Path.GetDirectoryName("C:\Directory\SubDirectory")
                //    returns "C:\Directory", which may not be what you want in
                //    this case. You may need to explicitly add a trailing \
                //    if path is a directory and not a file path. As written,
                //    this function just assumes path is a file path.
                string fileName = System.IO.Path.GetFileName(path);
                string fileDirectory = System.IO.Path.GetDirectoryName(path);

                // We don't need to do anything else: if we got here without
                // throwing an exception, then the path does not contain
                // invalid characters.
            }
            catch (System.ArgumentException)
            {
                // Path functions throw this if path contains invalid chars.
                // Note: only .NET Framework and .NET Core versions older than 2.1
                // validate characters here; newer runtimes do not throw.
                ret = true;
            }
        }
        return ret;
    }

Answer by René

Be careful when relying on Path.GetInvalidFileNameChars, which may not be as reliable as you'd think. Notice the following remark in the MSDN documentation on Path.GetInvalidFileNameChars:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).

It's no better with the Path.GetInvalidPathChars method, whose documentation contains the exact same remark.
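
Given that remark, one option is to encode the Windows rules it lists directly instead of trusting the framework arrays, so the result does not vary with runtime version or OS. A minimal sketch, assuming Windows semantics and covering only the characters named in the quoted remark; the class name WindowsPathChars is hypothetical:

```csharp
public static class WindowsPathChars
{
    // Characters listed in the quoted remark: code points 1-31 (which include
    // \b and \t), null (\0), quote, less than, greater than and pipe.
    public static bool IsInvalid(char c) =>
        c < 32 || c == '"' || c == '<' || c == '>' || c == '|';

    public static bool ContainsInvalid(string path)
    {
        if (string.IsNullOrEmpty(path))
            return false;

        foreach (char c in path)
        {
            if (IsInvalid(c))
                return true;
        }
        return false;
    }
}
```

Wildcards such as '?' and '*' are not in that list, so add them if, like the asker, you need to reject them as well.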

Answer by Glenn Slayden

As of .NET 4.7.2, Path.GetInvalidFileNameChars() reports the following 41 'bad' characters.

0x0000   0  '\0'      | 0x000d  13  '\r'      | 0x001b  27  '\u001b'
0x0001   1  '\u0001'  | 0x000e  14  '\u000e'  | 0x001c  28  '\u001c'
0x0002   2  '\u0002'  | 0x000f  15  '\u000f'  | 0x001d  29  '\u001d'
0x0003   3  '\u0003'  | 0x0010  16  '\u0010'  | 0x001e  30  '\u001e'
0x0004   4  '\u0004'  | 0x0011  17  '\u0011'  | 0x001f  31  '\u001f'
0x0005   5  '\u0005'  | 0x0012  18  '\u0012'  | 0x0022  34  '"'
0x0006   6  '\u0006'  | 0x0013  19  '\u0013'  | 0x002a  42  '*'
0x0007   7  '\a'      | 0x0014  20  '\u0014'  | 0x002f  47  '/'
0x0008   8  '\b'      | 0x0015  21  '\u0015'  | 0x003a  58  ':'
0x0009   9  '\t'      | 0x0016  22  '\u0016'  | 0x003c  60  '<'
0x000a  10  '\n'      | 0x0017  23  '\u0017'  | 0x003e  62  '>'
0x000b  11  '\v'      | 0x0018  24  '\u0018'  | 0x003f  63  '?'
0x000c  12  '\f'      | 0x0019  25  '\u0019'  | 0x005c  92  '\\'
                      | 0x001a  26  '\u001a'  | 0x007c 124  '|'

As noted by another poster, this is a proper superset of the set of characters returned by Path.GetInvalidPathChars().

The following function detects the exact set of 41 characters shown above:

public static bool IsInvalidFileNameChar(Char c) => c < 64U ?
        (1UL << c & 0xD4008404FFFFFFFFUL) != 0 :
        c == '\\' || c == '|';

Answer by Andrew

It's probably too late for you, but it may help somebody else. I faced the same issue and needed to find a reliable way to sanitize a path.

Here is what I ended up using, in 3 steps:

Step 1: Custom cleaning.

public static string RemoveSpecialCharactersUsingCustomMethod(this string expression, bool removeSpecialLettersHavingASign = true)
{
    var newCharacterWithSpace = " ";
    var newCharacter = "";

    // Carriage-return handling
    // ASCII LINE-FEED character (LF)
    expression = expression.Replace("\n", newCharacterWithSpace);
    // ASCII CARRIAGE-RETURN character (CR)
    expression = expression.Replace("\r", newCharacterWithSpace);

    // less than: used to redirect input, allowed in Unix filenames, see Note 1
    expression = expression.Replace(@"<", newCharacter);
    // greater than: used to redirect output, allowed in Unix filenames, see Note 1
    expression = expression.Replace(@">", newCharacter);
    // colon: used to determine the mount point / drive on Windows;
    // used to determine the virtual device or physical device such as a drive on AmigaOS, RT-11 and VMS;
    // used as a pathname separator in classic Mac OS. Doubled after a name on VMS,
    // indicates the DECnet nodename (equivalent to a NetBIOS (Windows networking) hostname preceded by "\".).
    // Colon is also used in Windows to separate an alternative data stream from the main file.
    expression = expression.Replace(@":", newCharacter);
    // quote: used to mark beginning and end of filenames containing spaces in Windows, see Note 1
    expression = expression.Replace(@"""", newCharacter);
    // slash: used as a path name component separator in Unix-like, Windows, and Amiga systems.
    // (The MS-DOS command.com shell would consume it as a switch character, but Windows itself always accepts it as a separator.)
    expression = expression.Replace(@"/", newCharacter);
    // backslash: also used as a path name component separator in MS-DOS, OS/2 and Windows (where there are few differences between slash and backslash); allowed in Unix filenames, see Note 1
    expression = expression.Replace(@"\", newCharacter);
    // vertical bar or pipe: designates software pipelining in Unix and Windows; allowed in Unix filenames, see Note 1
    expression = expression.Replace(@"|", newCharacter);
    // question mark: used as a wildcard in Unix, Windows and AmigaOS; marks a single character. Allowed in Unix filenames, see Note 1
    expression = expression.Replace(@"?", newCharacter);
    // exclamation mark
    expression = expression.Replace(@"!", newCharacter);
    // asterisk or star: used as a wildcard in Unix, MS-DOS, RT-11, VMS and Windows. Marks any sequence of characters
    // (Unix, Windows, later versions of MS-DOS) or any sequence of characters in either the basename or extension
    // (thus "*.*" in early versions of MS-DOS means "all files"). Allowed in Unix filenames, see Note 1
    expression = expression.Replace(@"*", newCharacter);
    // percent: used as a wildcard in RT-11; marks a single character.
    expression = expression.Replace(@"%", newCharacter);
    // period or dot: allowed but the last occurrence will be interpreted to be the extension separator in VMS, MS-DOS and Windows.
    // In other OSes, usually considered as part of the filename, and more than one period (full stop) may be allowed.
    // In Unix, a leading period means the file or folder is normally hidden.
    expression = expression.Replace(@".", newCharacter);
    // space: allowed (apart from MS-DOS) but the space is also used as a parameter separator in command line applications.
    // This can be solved by quoting, but typing quotes around the name every time is inconvenient.
    //expression = expression.Replace(@"%", " ");
    expression = expression.Replace(@"  ", newCharacter);

    if (removeSpecialLettersHavingASign)
    {
        // Because these letters cause issues when zipping
        // More at: http://www.thesauruslex.com/typo/eng/enghtml.htm
        expression = expression.Replace(@"ê", "e");
        // NOTE: the accented letters in the next three lines were lost in encoding and now
        // show as "?"; as written they are no-ops, since "?" was already removed above.
        expression = expression.Replace(@"?", "e");
        expression = expression.Replace(@"?", "i");
        expression = expression.Replace(@"?", "oe");
    }

    return expression;
}

Step 2: Check for any invalid characters not yet removed.

As an extra verification step, I use the Path.GetInvalidPathChars() method posted above to detect any potential invalid characters not yet removed.

public static bool ContainsAnyInvalidCharacters(this string path)
{
    return (!string.IsNullOrEmpty(path) && path.IndexOfAny(Path.GetInvalidPathChars()) >= 0);
}

Step 3: Clean any special characters detected in Step 2.

And finally, I use this method as the final step to clean anything left (from "How to remove illegal characters from path and filenames?"):

public static string RemoveSpecialCharactersUsingFrameworkMethod(this string path)
{
    return Path.GetInvalidFileNameChars().Aggregate(path, (current, c) => current.Replace(c.ToString(), string.Empty));
}

I log any invalid character not cleaned in the first step. I chose to go that way so I can improve my custom method as soon as a 'leak' is detected. I can't rely on Path.GetInvalidFileNameChars() because of the following statement, reported above (from MSDN):

"The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names."

It may not be the ideal solution, but given the context of my application and the level of reliability required, this is the best solution I found.

Answer by Randy Burden

I ended up borrowing and combining a few internal .NET implementations to come up with a performant method:

/// <summary>Determines if the path contains invalid characters.</summary>
/// <remarks>This method is intended to prevent ArgumentExceptions from being thrown when creating a new FileInfo on a file path with invalid characters.</remarks>
/// <param name="filePath">File path.</param>
/// <returns>True if file path contains invalid characters.</returns>
private static bool ContainsInvalidPathCharacters(string filePath)
{
    for (var i = 0; i < filePath.Length; i++)
    {
        int c = filePath[i];

        if (c == '\"' || c == '<' || c == '>' || c == '|' || c == '*' || c == '?' || c < 32)
            return true;
    }

    return false;
}

I then used it like so, but also wrapped it in a try/catch block for safety:

if (!string.IsNullOrWhiteSpace(path) && !ContainsInvalidPathCharacters(path))
{
    FileInfo fileInfo = null;

    try
    {
        fileInfo = new FileInfo(path);
    }
    catch (ArgumentException)
    {
        // The path is still invalid; fileInfo stays null.
    }

    ...
}

Answer by John

I recommend using a HashSet for this to increase efficiency:

private static HashSet<char> _invalidCharacters = new HashSet<char>(Path.GetInvalidPathChars());

Then you can simply check that the string isn't null/empty and that there aren't any invalid characters:

public static bool IsPathValid(string filePath)
{
    return !string.IsNullOrEmpty(filePath) && !filePath.Any(pc => _invalidCharacters.Contains(pc));
}

Try it online

Answer by rattler

I'm also too late. But if the task is to validate whether the user entered a valid path, there is a combined solution for paths.

Path.GetInvalidFileNameChars() returns the list of characters that are illegal in a file name, but a directory follows the same rules as a file, except for the separators (which we can get from the system) and the root specifier (C:, which we can simply exclude from the search). Yes, Path.GetInvalidFileNameChars() does not return the complete set, but it is better than trying to find all of them manually.

So:

private static bool CheckInvalidPath(string targetDir)
{
  // null, empty or whitespace-only input is not a valid path
  // (GetPathRoot may return null for it, which would break root.Length below)
  if (string.IsNullOrWhiteSpace(targetDir))
    return false;

  string root;
  try
  {
    root = Path.GetPathRoot(targetDir);
  }
  catch
  {
    // the path is definitely invalid if it has crashed
    return false;
  }

  // it's better to cache this, since it creates
  // a new array on each call
  char[] chars = Path.GetInvalidFileNameChars();

  // ignore root
  for (int i = root.Length; i < targetDir.Length; i++)
  {
    char c = targetDir[i];

    // separators are allowed
    if (c == Path.DirectorySeparatorChar || c == Path.AltDirectorySeparatorChar)
      continue;

    // check for illegal chars
    for (int j = 0; j < chars.Length; j++)
      if (c == chars[j])
        return false;
  }

  return true;
}

I've found that methods like Path.GetFileName will not crash for paths like C:\* (which is completely invalid), so even an exception-based check is not enough. The only thing that will crash Path.GetPathRoot is an invalid root (like CC:\someDir). So everything else has to be checked manually.
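
The same idea can be made independent of the runtime by skipping the root yourself and checking the rest against a fixed character set. A sketch under stated assumptions: Windows-style paths, a drive letter ("C:") as the only kind of root (UNC paths such as \\server\share are not handled), and the 41-character set from Glenn Slayden's answer; the class name WindowsPathValidator is hypothetical:

```csharp
public static class WindowsPathValidator
{
    // The 41 characters Path.GetInvalidFileNameChars() reports on Windows (.NET 4.7.2):
    // code points 0-31 and " * / : < > ? as a bitmask over 0-63, plus \ and |.
    private static bool IsInvalidFileNameChar(char c) => c < 64U
        ? ((1UL << c) & 0xD4008404FFFFFFFFUL) != 0
        : c == '\\' || c == '|';

    // Assumes a Windows-style path whose only kind of root is a drive letter ("C:").
    public static bool IsValid(string path)
    {
        if (string.IsNullOrEmpty(path))
            return false;

        int start = 0;
        bool hasDrive = path.Length >= 2 && path[1] == ':' &&
            ((path[0] >= 'A' && path[0] <= 'Z') || (path[0] >= 'a' && path[0] <= 'z'));
        if (hasDrive)
            start = 2; // skip the "C:" root; a ':' anywhere else is rejected below

        for (int i = start; i < path.Length; i++)
        {
            char c = path[i];

            // Both separators are allowed between segments.
            if (c == '\\' || c == '/')
                continue;

            if (IsInvalidFileNameChar(c))
                return false;
        }
        return true;
    }
}
```

With these rules, C:\* and CC:\someDir are rejected (the '*' and the stray ':' are in the set), while C:\dir\file.txt and dir/sub/file.txt pass.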
