UTF-8 in Windows

Question

Asked by Michael Platings

How do I set the code page to UTF-8 in a C Windows program?

I have a third party library that uses fopen to open files. I can use wcstombs to convert my Unicode filenames to the current code page, however if the user has a filename with a character outside the code page then this breaks.

Ideally I would just call _setmbcp(65001) to set the code page to UTF-8, however the MSDN documentation for _setmbcp states that UTF-8 is not supported.

How can I get around this?

Answer 1

Accepted answer by efotinis

Unfortunately, there is no way to make Unicode the current codepage in Windows. The CP_UTF7 and CP_UTF8 constants are pseudo-codepages, used only in the MultiByteToWideChar and WideCharToMultiByte conversion functions, as Ben mentioned.

Your problem is similar to that of the fstream C++ classes. The fstream constructors accept only char* names, making it impossible to open a file with a true Unicode name. The only solution offered by VC was a hack: open the file separately and then set the handle to the stream object. I'm afraid this isn't an option for you, of course, since the third party library probably doesn't accept handles.

The only solution I can think of is to create a temporary file with a non-Unicode name, which is hard-linked to the original, and use that as a parameter.
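
The hard-link workaround could be sketched as follows, assuming a C++17 compiler and a file system that supports hard links (NTFS does). The link is created next to the original because hard links cannot cross volumes. The helper name make_ascii_alias and the alias file name are hypothetical, not part of any library.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: creates a hard link with a plain ASCII name in the
// same directory as `original` and returns the path of the link.
// Throws fs::filesystem_error if the link cannot be created, for example
// when a file with that name already exists.
fs::path make_ascii_alias(const fs::path& original, const std::string& alias_name)
{
    assert(!alias_name.empty());
    fs::path alias = original.parent_path() / alias_name;
    fs::create_hard_link(original, alias);
    return alias;
}
```

The narrow path from alias.string() can then be handed to the third party library, and the alias removed with fs::remove afterwards. Removing the alias leaves the original file untouched.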

Answer 2

Answer by Ben Straub

All Windows APIs think in UTF-16, so you're better off writing a wrapper around your library that converts at the boundaries.

Oddly enough, Windows thinks UTF-8 is a codepage for the purposes of conversion, so you use the same APIs as you would to convert between codepages:

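#include <windows.h>
#include <string>

// Converts a null-terminated UTF-8 string to UTF-16.
std::wstring Utf8ToUtf16(const char* u8string)
{
    // First call: ask for the required size in wchar_t units, terminator included.
    int wcharcount = MultiByteToWideChar(CP_UTF8, 0, u8string, -1, NULL, 0);
    if (wcharcount <= 0)
        return std::wstring();  // conversion failed
    std::wstring w(wcharcount, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, u8string, -1, &w[0], wcharcount);
    w.resize(wcharcount - 1);  // drop the terminator that was written into the buffer
    return w;
}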

And something of similar form to convert back.
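
One possible shape for the reverse conversion, mirroring the function above but using WideCharToMultiByte. The name Utf16ToUtf8 is an assumption, and the code is guarded by _WIN32 because it relies on the Windows API.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Converts a null-terminated UTF-16 string to UTF-8.
std::string Utf16ToUtf8(const wchar_t* u16string)
{
    // First call: ask for the required size in bytes, terminator included.
    // For CP_UTF8 the last two arguments must be null.
    int bytecount = WideCharToMultiByte(CP_UTF8, 0, u16string, -1,
                                        nullptr, 0, nullptr, nullptr);
    if (bytecount <= 0)
        return std::string();  // conversion failed
    std::string s(bytecount, '\0');
    WideCharToMultiByte(CP_UTF8, 0, u16string, -1,
                        &s[0], bytecount, nullptr, nullptr);
    s.resize(bytecount - 1);  // drop the terminator that was written into the buffer
    return s;
}
#endif  // _WIN32
```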

Answer 3

Answer by Arthur2e5

2018 update: Windows 10 has made the "65001" code page less "pseudo" in two steps:

conhost changes: Windows Subsystem for Linux uses code page 65001 for its consoles. Since WSL, it is also possible to run chcp 65001 in cmd.exe. (It has caused some pretty dumb Python bugs.)
full-featured locale: Windows since build 17035 allows setting UTF-8 as the locale codepage. This is available from the April 2018 update.
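
Whether a process is actually running with a UTF-8 code page can be checked at run time. A minimal sketch, assuming the third party library's fopen interprets narrow file names in the process's ANSI code page (this depends on which C runtime the library links against). The helper name narrow_paths_are_utf8 is hypothetical, and the non-Windows branch simply assumes a UTF-8 environment.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: reports whether narrow (char*) file names are
// expected to be interpreted as UTF-8 in this process.
bool narrow_paths_are_utf8()
{
#ifdef _WIN32
    // GetACP returns the process's ANSI code page; CP_UTF8 is 65001.
    return GetACP() == CP_UTF8;
#else
    // Assumption: on other platforms file names are treated as UTF-8 bytes.
    return true;
#endif
}
```

When this returns true, UTF-8 file names can be passed to the library as they are. Otherwise one of the workarounds from the other answers is still needed.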

Answer 4

Answer by R.. GitHub STOP HELPING ICE

Use cygwin (which provides a UTF-8 locale by default), or write your own libc hack for Windows that does the necessary UTF-8 to UTF-16 translations and wraps nonstandard functions such as _wfopen.
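
A wrapper of the kind described might look like the sketch below. fopen_utf8 is a hypothetical name. On Windows it converts both arguments with MultiByteToWideChar and forwards to _wfopen, and elsewhere it forwards to fopen unchanged. It only helps if the library's fopen calls can be redirected to it.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: opens a file whose name is given as UTF-8.
// Returns a null pointer on failure, like fopen.
std::FILE* fopen_utf8(const char* utf8_path, const char* mode)
{
    assert(utf8_path != nullptr && mode != nullptr);
#ifdef _WIN32
    // Converts a null-terminated UTF-8 string to UTF-16; the result keeps
    // its terminator inside the buffer, which is harmless for c_str().
    auto widen = [](const char* s) -> std::wstring {
        int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, nullptr, 0);
        std::wstring w(n > 0 ? n : 1, L'\0');
        if (n > 0)
            MultiByteToWideChar(CP_UTF8, 0, s, -1, &w[0], n);
        return w;
    };
    return _wfopen(widen(utf8_path).c_str(), widen(mode).c_str());
#else
    return std::fopen(utf8_path, mode);
#endif
}
```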
