What encoding are filenames in NTFS stored as?
Original URL: http://stackoverflow.com/questions/2050973/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by vroooom
I'm just getting started on some programming to handle filenames with non-English names on a WinXP system. I've done some recommended reading on Unicode and I think I get the basic idea, but some parts are still not very clear to me.
Specifically, what encoding (UTF-8, UTF-16LE/BE) are file names (not the content, but the actual name of the file) stored in on NTFS? Is it possible to open any file using fopen(), which takes a char*, or do I have no choice but to use _wfopen(), which takes a wchar_t* and presumably expects a UTF-16 string?
I tried manually feeding a UTF-8-encoded string to fopen(), e.g.:
unsigned char filename[] = {0xEA, 0xB0, 0x80, 0x2E, 0x74, 0x78, 0x74, 0x0}; // 가.txt
FILE* f = fopen((char*)filename, "wb+");
but this came out as 'ê°.txt'.
I was under the impression (which may be wrong) that a UTF-8-encoded string would suffice to open any file name under Windows, because I vaguely remember some Windows applications passing around char* rather than wchar_t* without any problems.
Can anyone shed some light on this?
Accepted answer by villintehaspam
NTFS stores filenames in UTF-16; however, fopen uses the ANSI code page (not UTF-8).
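To make "UTF-16" concrete: each character of a name is stored as one 16-bit code unit, or as two (a surrogate pair) for characters outside the Basic Multilingual Plane. The following is a minimal sketch of that mapping for a single code point; the function name utf16_encode_one is hypothetical and not part of any Windows API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-16 code units.
   Writes 1 or 2 units to out and returns how many were written, or 0 for
   values that cannot be encoded (surrogate range, or above U+10FFFF). */
size_t utf16_encode_one(uint32_t cp, uint16_t out[2]) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* reserved for surrogates */
    if (cp > 0x10FFFF) return 0;                 /* outside Unicode */
    if (cp < 0x10000) {                          /* BMP: a single unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For the question's file name, 가 (U+AC00) becomes the single unit 0xAC00, followed by one unit each for ".txt".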
In order to use a UTF-16-encoded file name, you will need to use the Unicode versions of the file-open calls. Do this by defining UNICODE and _UNICODE in your project, then use the CreateFile call (which then resolves to CreateFileW) or the _wfopen call.
Answer by Chris Becke
fopen() in MSVC on Windows does not (by default) take a UTF-8-encoded char*.
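The question's result can be traced by decoding the same bytes two ways. Read as UTF-8, the three bytes 0xEA 0xB0 0x80 form the single character U+AC00 (가). Read one byte per character in Windows-1252 (assumed here to be the ANSI code page of the asker's machine), 0xEA and 0xB0 are 'ê' and '°'. A minimal UTF-8 decoder sketch; utf8_decode_one is a hypothetical name, and it does not reject overlong forms or encoded surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes available).
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 if the sequence is malformed or truncated. */
size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp) {
    if (n == 0) return 0;
    unsigned char b0 = s[0];
    size_t len;
    uint32_t c;
    if (b0 < 0x80) { *cp = b0; return 1; }              /* ASCII */
    else if ((b0 & 0xE0) == 0xC0) { len = 2; c = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; c = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; c = b0 & 0x07; }
    else return 0;            /* stray continuation byte or invalid lead */
    if (n < len) return 0;    /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* expected 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);         /* append 6 payload bits */
    }
    *cp = c;
    return len;
}
```

Applied to the question's buffer, the first call consumes 3 bytes and yields 0xAC00; an ANSI fopen instead treats each of those bytes as its own character.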
Unfortunately, in the grand scheme of things, UTF-8 was invented rather recently. Windows APIs are divided into Unicode and ANSI versions: every Windows API that takes or deals with strings is actually available with a W or an A suffix, W for "wide" character/Unicode and A for ANSI. Macro magic hides all this from the developer, so you just call CreateFile with either a char* or a wchar_t* depending on your build configuration, without needing to know the difference.
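A minimal model of the macro pattern described above, in standard C: the undecorated name is a macro that resolves to the A or W variant depending on whether UNICODE is defined. open_thingA, open_thingW and TEXT_ are hypothetical stand-ins for pairs like CreateFileA/CreateFileW and the TEXT() macro from the Windows headers.

```c
#include <wchar.h>

/* Hypothetical stand-ins for an API pair such as CreateFileA / CreateFileW.
   Each returns a tag showing which variant was actually called. */
int open_thingA(const char *name)    { (void)name; return 'A'; }
int open_thingW(const wchar_t *name) { (void)name; return 'W'; }

/* The same pattern the Windows headers use: the undecorated name is a macro
   whose target depends on UNICODE. TEXT_ mimics TEXT(), which adds the L
   prefix to string literals in Unicode builds. */
#ifdef UNICODE
#define open_thing open_thingW
#define TEXT_(s) L##s
#else
#define open_thing open_thingA
#define TEXT_(s) s
#endif
```

In an ANSI build, open_thing(TEXT_("a.txt")) calls open_thingA with a char*; compiled with -DUNICODE, the same line calls open_thingW with L"a.txt".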
The 'ANSI' encoding is not actually a specific encoding: it means that the encoding used for char strings depends on the locale setting of the PC (for example, Windows-1252 on Western European systems).
Now, because C runtime functions such as fopen need to work by default without any special knowledge on the developer's part, on Windows systems they expect to receive their strings in the local Windows (ANSI) encoding. MSDN indicates that the Microsoft C runtime function setlocale can change the locale of the current thread, but specifically says that it will fail for any locale that needs more than 2 bytes per character, such as UTF-8.
So, on Windows there is no shortcut. You need to use _wfopen or the native API CreateFileW (or build your project with the Unicode build settings and just call CreateFile) with wchar_t* strings.
Answer by user4815162342
As answered by others, the best way to handle UTF-8-encoded strings is to convert them to UTF-16 and use native Unicode APIs such as _wfopen or CreateFileW.
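A sketch of that approach as a drop-in helper, assuming the caller's strings are UTF-8. On Windows it converts path and mode with MultiByteToWideChar(CP_UTF8, ...) and calls _wfopen; elsewhere it passes the bytes straight to fopen, which works on typical POSIX systems because file names there are byte strings. The names fopen_utf8 and utf8_to_wide are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>

/* Convert NUL-terminated UTF-8 to a newly allocated UTF-16 string.
   Returns NULL on invalid UTF-8 or allocation failure. Caller frees. */
static wchar_t *utf8_to_wide(const char *utf8) {
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, NULL, 0);
    if (n == 0) return NULL;                    /* n includes the NUL */
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (!w) return NULL;
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, w, n) == 0) {
        free(w);
        return NULL;
    }
    return w;
}
#endif

/* Open a file whose path and mode are given in UTF-8. */
FILE *fopen_utf8(const char *path, const char *mode) {
#ifdef _WIN32
    wchar_t *wpath = utf8_to_wide(path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *f = NULL;
    if (wpath && wmode) f = _wfopen(wpath, wmode);  /* Unicode CRT entry point */
    free(wpath);
    free(wmode);
    return f;
#else
    return fopen(path, mode);                       /* bytes passed through */
#endif
}
```

With this helper, the question's buffer can be passed as fopen_utf8((const char *)filename, "wb+").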
However, this approach won't help when calling into libraries that use fopen() unconditionally, either because they do not support Unicode or because they are written in portable C. In that case it is still possible to make use of the legacy "short paths" to convert a UTF-8-encoded string into an ASCII form usable with fopen, but it requires some legwork:
1. Convert the UTF-8 representation to UTF-16 using MultiByteToWideChar.
2. Use GetShortPathNameW to obtain a "short path", which is ASCII-only. GetShortPathNameW will return it as a wide string with all-ASCII content, which you will need to convert to a narrow string by a trivial lossless copy, casting each wchar_t to char.
3. Pass the short path to fopen() or to the code that will eventually use fopen(). Be aware that error messages printed by that code, if any, will refer to the unsightly "short path" (e.g. KINTO~1 instead of kinto-un-筋斗雲).
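The steps above might be sketched as follows; utf8_to_short_path and wide_ascii_to_narrow are hypothetical names. Two caveats: GetShortPathNameW only works for paths that already exist, so a file you intend to create must first be created by other means; and if no 8.3 name exists (for example, generation is disabled on the volume), the call returns the long name, so the ASCII check below yields NULL. Only wide_ascii_to_narrow is portable; the rest compiles on Windows only.

```c
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Step 2b: copy an all-ASCII wide string into a new narrow string,
   one wchar_t per char. Returns NULL if any unit is outside ASCII. */
char *wide_ascii_to_narrow(const wchar_t *w) {
    size_t n = wcslen(w);
    char *s = malloc(n + 1);
    if (!s) return NULL;
    for (size_t i = 0; i < n; i++) {
        if ((unsigned long)w[i] > 0x7F) { free(s); return NULL; }
        s[i] = (char)w[i];                     /* lossless for ASCII */
    }
    s[n] = '\0';
    return s;
}

#ifdef _WIN32
/* Steps 1-2: UTF-8 -> UTF-16 -> short path -> narrow ASCII string.
   Returns a malloc'd path suitable for fopen(), or NULL. */
char *utf8_to_short_path(const char *utf8) {
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, NULL, 0);
    if (n == 0) return NULL;
    wchar_t *wide = malloc((size_t)n * sizeof *wide);
    if (!wide) return NULL;
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, wide, n);

    DWORD len = GetShortPathNameW(wide, NULL, 0);  /* size incl. NUL, 0 on error */
    wchar_t *shortw = len ? malloc(len * sizeof *shortw) : NULL;
    DWORD r = shortw ? GetShortPathNameW(wide, shortw, len) : 0;
    char *result = NULL;
    if (r > 0 && r < len)                          /* success: r excludes NUL */
        result = wide_ascii_to_narrow(shortw);
    free(shortw);
    free(wide);
    return result;
}
#endif
```

The caller would then pass the returned string to fopen() (step 3) and free it afterwards.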
While this is not exactly a recommended long-term strategy, as Windows short paths are a legacy feature that can be turned off per volume, it is likely the only way to pass file names to code that uses fopen() and other file-related API calls (stat, access, the ANSI versions of CreateFile, and similar).