How to use utf8 character arrays in C++?

Question

Asked by sekmet64

Is it possible to make char* strings work with UTF-8 encoding in C++ (VC2010)?

For example if my source file is saved in utf8 and I write something like this:

const char* c = "a?áé??";

Is it possible to make this UTF-8 encoded? And if so, how can I use

char* c2 = new char[strlen("a?áé??")];

for dynamic allocation if characters can be variable length?

Answer 1

Accepted answer by James Kanze

The encoding for narrow character string literals is implementation defined, so you'd really have to read the documentation (if you can find it). A quick experiment shows that both VC++ (VC8, anyway) and g++ (4.4.2, anyway) actually just copy the bytes from the source file; the string literal will be in whatever encoding your editor saved it in. (This is clearly in violation of the standard, but it seems to be common practice.)

C++11 has UTF-8 string literals, which allow you to write u8"text" and be assured that "text" is encoded in UTF-8. But I don't really expect it to work reliably: the problem is that in order to do this, the compiler has to know what encoding your source file has. In all probability, compiler writers will continue to ignore the issue, just copying the bytes from the source file, and achieve conformance simply by documenting that the source file must be in UTF-8 for these features to work.
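
On the dynamic-allocation part of the question: UTF-8 characters occupy one to four bytes, but strlen counts bytes rather than characters, so it already gives the right buffer size once one extra byte is added for the terminator. The following is only a sketch under that reasoning. The names utf8_code_points, duplicate_bytes and kSample are made up for illustration, the sample text is spelled as explicit byte escapes so the result does not depend on how the source file is saved, and the counting helper assumes its input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Counts Unicode code points in a NUL-terminated UTF-8 string by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Copies a NUL-terminated string into a freshly allocated buffer.
// strlen returns the number of bytes, which is what the allocation needs;
// the extra byte holds the terminating '\0'.
std::unique_ptr<char[]> duplicate_bytes(const char* s) {
    assert(s != nullptr);
    const std::size_t bytes = std::strlen(s);
    std::unique_ptr<char[]> copy(new char[bytes + 1]);
    std::memcpy(copy.get(), s, bytes + 1);  // copies the terminator too
    return copy;
}

// "a", U+00E1 and U+00E9 written as explicit UTF-8 bytes (hypothetical sample).
const char kSample[] = "a\xC3\xA1\xC3\xA9";
```

Here strlen(kSample) is 5 while utf8_code_points(kSample) is 3, which is why the buffer has to be sized by bytes and not by the number of visible characters.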

Answer 2

Answered by Klaim

If the text you want to put in the string is in your source code, make sure your source code file is in UTF-8.

If that doesn't work, try using \u1234, with 1234 being the code point value in hexadecimal.
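
As a rough illustration of the \u escape, the sketch below combines it with a u8 literal, which needs a C++11 compiler and so is not available in VC2010 itself. Without the u8 prefix, the bytes produced for \u00E1 depend on the compiler's execution character set. The names kEncodedBytes and encoded_byte are invented for this example, and the code sticks to sizeof and casts to unsigned char because the literal's element type is char through C++17 and char8_t from C++20 on.

```cpp
#include <cassert>
#include <cstddef>

// U+00E1 written as a universal character name inside a u8 literal.
// Number of bytes in its UTF-8 encoding, not counting the terminator.
constexpr std::size_t kEncodedBytes = sizeof(u8"\u00E1") - 1;

// Returns the i-th byte of the encoded literal as an unsigned value.
// Index kEncodedBytes is allowed and yields the terminating zero.
inline unsigned char encoded_byte(std::size_t i) {
    assert(i <= kEncodedBytes);
    return static_cast<unsigned char>(u8"\u00E1"[i]);
}
```

U+00E1 encodes to the two bytes 0xC3 0xA1 in UTF-8, so kEncodedBytes is 2 regardless of the encoding the source file was saved in.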

You can also try the UTF8-CPP library.

Take a look at this answer: Using Unicode in C++ source code

Answer 3

Answered by vladasimovic

It is possible: save the file as UTF-8 without a BOM signature.

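// Save as UTF-8 without a BOM signature
#include <stdio.h>
#include <string.h>
#include <windows.h>
int main() {
    SetConsoleOutputCP(65001);  // 65001 is CP_UTF8: the console treats output as UTF-8
    const char *c1 = "a?áé??";
    char *c2 = new char[strlen(c1) + 1];  // strlen counts bytes; +1 for the terminating '\0'
    strcpy(c2, c1);
    printf("%s\n", c1);
    printf("%s\n", c2);
    delete[] c2;
    return 0;
}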

Result:

 D:\Debug>program
a?áé??
a?áé??

Redirecting the program's output to a file produces a genuinely UTF-8 encoded file.
This answer is compiler-independent (when compiling on Windows).
(A similar question.)

Answer 4

Answered by yasouser

See this MSDN article about converting between string types (it should give you examples of how to use them). The string types covered include char *, wchar_t*, _bstr_t, CComBSTR, CString, basic_string, and System.String:

How to: Convert Between Various String Types

Answer 5

Answered by Zoner

There is a hotfix for Visual Studio 2010 SP1 which can help: http://support.microsoft.com/kb/980263.

The hotfix adds a pragma that overrides the character encoding Visual Studio uses for the char type:

#pragma execution_character_set("utf-8")

Without the pragma, char*-based literals are interpreted in the default code page (typically 1252).

This should all be superseded eventually by the new string literal prefixes specified by C++0x (u8, u, and U for UTF-8, UTF-16, and UTF-32 respectively), which ideally will be supported in the next major version of Visual Studio after 2010.
