Unicode Processing in C++

Question

Asked by Fortepianissimo

What is the best practice of Unicode processing in C++?

Answer 1

Accepted answer by hazzen

Use ICU (or a similar library) for dealing with your data.
In your own data store, make sure everything is stored in the same encoding.
Make sure you always use your Unicode library for mundane tasks like string length, capitalization status, etc. Never use standard library builtins like isalpha unless that is the definition you want.
I can't say it enough: never iterate over the indices of a string if you care about correctness; always use your Unicode library for this.
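
To illustrate why indexing a string is risky, here is a minimal sketch, assuming the string holds well-formed UTF-8. The helper name count_code_points is hypothetical and is not part of ICU or the standard library. It shows that the number of bytes reported by size() differs from the number of code points as soon as non-ASCII text is involved.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string that is
// assumed to contain well-formed UTF-8 (no validation is performed).
// UTF-8 continuation bytes have the bit pattern 10xxxxxx, so every byte
// that does not match that pattern starts a new code point.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 bytes of "café" (written as "caf\xC3\xA9"), size() reports 5 while the helper reports 4. Even code points are not the same as user-perceived characters (combining marks, for example), which is one reason a library such as ICU is the safer choice.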

Answer 2

Answer by eestrada

If you don't care about backwards compatibility with previous C++ standards, the current C++11 standard has built-in Unicode support: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2011/n3242.pdf

So the truly best practice for Unicode processing in C++ would be to use the built-in facilities for it. That isn't always possible with older code bases, though, since the standard is so new at present.

EDIT: To clarify, C++11 is Unicode-aware in that it now has support for Unicode literals and Unicode strings. However, the standard library has only limited support for Unicode processing and conversion. For your current needs this may be enough. However, if you need to do a large amount of heavy lifting right now, you may still need to use something like ICU for more in-depth processing. There are some proposals currently in the works to include more robust support for text conversion between different encodings. My guess (and hope) is that this will be part of the next technical report.
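
As a small illustration of the C++11 literals and string types mentioned above, here is a sketch using the u"" and U"" prefixes with std::u16string and std::u32string. The function names are made up for this example. It only shows storage; the standard library does not provide case mapping, normalization, or collation for these types.

```cpp
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane.

// In UTF-16 it is stored as a surrogate pair, so the string holds
// two char16_t code units.
std::u16string emoji_utf16() {
    return u"\U0001F600";
}

// In UTF-32 every code point fits in a single char32_t element.
std::u32string emoji_utf32() {
    return U"\U0001F600";
}
```

Calling size() on the two results gives 2 and 1 respectively, which is a reminder that element count depends on the encoding and is not a character count.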

Answer 3

Answer by jschroedl

Our company (and others) use the open source International Components for Unicode (ICU) library originally developed by Taligent.

It handles strings, locales, conversions, date/times, collation, transformations, etc.

Start with the ICU User Guide.

Answer 4

Answer by Adam Pierce

Here is a checklist for Windows programming:

Enclose all strings in _T("my string").
Replace strlen() and similar functions with _tcslen() and its counterparts.
Use LPTSTR and LPCTSTR instead of char * and const char *.
When starting new projects in Dev Studio, religiously make sure the Unicode option is selected in your project properties.
For C++ strings, use std::wstring instead of std::string.
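
The last item of the checklist can be sketched portably, without the Windows-only <tchar.h> header. The function names below are invented for illustration. In a Unicode build, _T("...") expands to an L"..." wide literal and _tcslen maps to wcslen, so this is roughly what the checklist produces after macro expansion.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Length of a C-style wide string in wchar_t elements
// (the wide counterpart of strlen).
std::size_t wide_length(const wchar_t* text) {
    return std::wcslen(text);
}

// Builds a std::wstring from a wide literal, as the checklist suggests
// for C++ strings.
std::wstring make_greeting() {
    return std::wstring(L"my string");
}
```

Note that wchar_t is 16 bits wide on Windows and typically 32 bits on other platforms, so the element count of a std::wstring is not portable for text outside the Basic Multilingual Plane.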

Answer 5

Answer by ine

Look at the question "Case insensitive string comparison in C++".

That question has a link to the Microsoft documentation on Unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx

If you look at the left-hand navigation pane on MSDN next to that article, you should find a lot of information pertaining to Unicode functions. It is part of a chapter on "Encoding Characters" (http://msdn.microsoft.com/en-us/library/cc194786.aspx).

It has the following subsections:

The Code-Page Model
Double-Byte Character Sets in Windows
Unicode
Compatibility Issues in Mixed Environments
Unicode Data Conversion
Migrating Windows-Based Programs to Unicode
Summary

Answer 6

Answer by Willow Schlanger

Although this may not be best practice for everyone, you can write your own C++ Unicode routines if you want!

I just finished doing it over a weekend and learned a lot. I don't guarantee it's 100% bug-free, but I did a lot of testing and it seems to work correctly.

My code is under the New BSD license and can be found here:

http://code.google.com/p/netwidecc/downloads/list

It is called WSUCONV and comes with a sample main() program that converts between UTF-8, UTF-16, and standard ASCII. If you throw away the main code, you've got a nice library for reading and writing Unicode.
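
To give a sense of what such hand-written routines involve, here is a minimal sketch of encoding a single code point as UTF-8. This is not taken from WSUCONV; the function name append_utf8 and its interface are assumptions made for this example.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point to out.
// Returns false (and appends nothing) for values that are not valid Unicode
// scalar values: surrogates (U+D800..U+DFFF) and anything above U+10FFFF.
bool append_utf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return false;
    }
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}
```

Encoding is the easy direction. Decoding untrusted input also has to reject overlong forms, truncated sequences, and stray continuation bytes, which is where most of the testing effort in a home-grown library tends to go.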

Answer 7

Answer by Paul Hutchinson

As has been said above, a library is the best bet when working on a large system. However, sometimes you do want to handle things yourself (maybe because the library would use too many resources, as on a microcontroller). In this case you want a simple library that you can copy parts out of for the things you actually need.

Willow Schlanger's example code seems like a good one (see his answer for more details).

I also found another one with smaller code. It lacks full error checking and only handles UTF-8, but it was simpler to take parts out of.

Here's a list of the embedded libraries that seem decent.

Embedded libraries

http://code.google.com/p/netwidecc/downloads/list (UTF8, UTF16LE, UTF16BE, UTF32)
http://www.cprogramming.com/tutorial/unicode.html (UTF8)
http://utfcpp.sourceforge.net/ (Simple UTF8 library)

Answer 8

Answer by Jan Rüegg

Have a look at the recommendations of UTF-8 Everywhere.

Answer 9

Answer by Joe Schneider

Use IBM's International Components for Unicode (ICU).
