windows Convert UTF-8 characters to uppercase/lowercase in C++

Question

Asked by NSA

I have a string containing UTF-8 characters and a method that is supposed to convert every character to upper or lower case. This is easy for characters that overlap with ASCII, and some characters obviously cannot be converted at all, e.g. any Chinese character. But is there a good way to detect and convert the other characters that do have upper/lower case forms, e.g. all the Greek characters? Note that I need this to work on both Windows and Linux.

Thank you,

Answer 1

Answered by Alexandre C.

Have a look at ICU.

Note that lowercase-to-uppercase functions are locale-dependent. Think of Turkish, where the (ASCII) letter I lowercases to "dotless lowercase ı" and the (ASCII) letter i uppercases to "uppercase İ with a dot".

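To make the locale dependence concrete, here is a minimal sketch of the Turkish special case. The function names and the `lang` parameter are hypothetical, and the code covers only ASCII letters plus the dotted/dotless i. It is an illustration, not a replacement for a real Unicode library such as ICU.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true for languages that use the dotted/dotless i pair.
bool is_turkic(const std::string& lang) {
    return lang == "tr" || lang == "az";
}

// Uppercase a single code point. Only ASCII and the Turkic i are handled.
char32_t upper_i_aware(char32_t c, const std::string& lang) {
    if (is_turkic(lang) && c == U'i') return char32_t{0x0130};  // i -> İ
    if (c == char32_t{0x0131}) return U'I';                     // ı -> I
    if (c >= U'a' && c <= U'z') return static_cast<char32_t>(c - (U'a' - U'A'));
    return c;
}

// Lowercase a single code point. Only ASCII and the Turkic i are handled.
char32_t lower_i_aware(char32_t c, const std::string& lang) {
    if (is_turkic(lang) && c == U'I') return char32_t{0x0131};  // I -> ı
    if (is_turkic(lang) && c == char32_t{0x0130}) return U'i';  // İ -> i
    if (c >= U'A' && c <= U'Z') return static_cast<char32_t>(c + (U'a' - U'A'));
    return c;
}
```

The same input code point gives a different result depending on the language, which is why a locale-free `toupper` cannot be correct for every text.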

Answer 2

Answered by tidwall

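Assuming that you have access to wctype.h, convert your text to a wide-character (wchar_t) string and call towupper() on each character, then convert it back to UTF-8. Note that wchar_t is 2 bytes (UTF-16) on Windows but typically 4 bytes on Linux, and that towupper() follows the current C locale.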

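A rough sketch of that round trip, using only the standard library, might look as follows. The helper names (`decode_utf8`, `encode_utf8`, `to_upper_utf8`) are made up for this example. The sketch assumes the input is already valid UTF-8 and that wchar_t values are Unicode code points, which holds on Windows and glibc but is not guaranteed everywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

// Decode UTF-8 into code points. Malformed input is NOT diagnosed
// (assumption: the caller passes valid UTF-8).
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Encode code points back into UTF-8.
std::string encode_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Uppercase each code point with towupper(); the result depends on the
// current C locale. Code points that do not fit in wchar_t are left alone.
std::string to_upper_utf8(const std::string& in) {
    std::u32string cps = decode_utf8(in);
    for (char32_t& cp : cps) {
        if (sizeof(wchar_t) >= 4 || cp <= 0xFFFF)
            cp = static_cast<char32_t>(std::towupper(static_cast<std::wint_t>(cp)));
    }
    return encode_utf8(cps);
}
```

Whether non-ASCII letters such as Greek are actually converted depends on the active locale, so a caller would typically run `std::setlocale(LC_ALL, "")` (from `<clocale>`) once at startup. Because the mapping is strictly one code point to one code point, it cannot produce expansions such as ß to SS.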

Answer 3

Answered by Davislor

On Linux, or with a standard library that supports it, you would obtain a std::locale object for the appropriate locale, as uppercase conversion is locale-specific. Convert each UTF-8 character to a wchar_t, call std::toupper() on it, then convert back to UTF-8. Note that the correctly uppercased string might be longer or shorter than the original, so a character-by-character conversion will not handle some ligatures properly: ß to SS in German is the example everyone keeps bringing up.

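As a hedged illustration of the std::locale step only (the UTF-8 conversion is left out), the sketch below uppercases a wide string with the two-argument `std::toupper` from `<locale>`. The helper names are hypothetical, and locale names such as "de_DE.UTF-8" are platform-specific and may not be installed, hence the fallback to the classic "C" locale.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Try to construct a named locale; fall back to the classic "C" locale
// when the system does not provide it.
std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Uppercase a wide string one wchar_t at a time using the given locale.
// The mapping is one-to-one, so the length never changes.
std::wstring to_upper_wide(std::wstring s, const std::locale& loc) {
    for (wchar_t& ch : s) ch = std::toupper(ch, loc);
    return s;
}
```

Since each wchar_t maps to exactly one wchar_t, `to_upper_wide(L"stra\u00DFe", loc)` always returns six characters, which is exactly why the ß to SS expansion cannot come out of this approach.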

On Windows, this approach will work even less often, because wide characters are UTF-16, which is not a fixed-width encoding (this violates the C++ language standard, but then maybe the standards committee shouldn't have tried to bluff Microsoft into breaking the Windows API). There is a ToUpper method in the CLR.

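To see why UTF-16 is not fixed-width, consider this small sketch (the function name `to_utf16` is made up for illustration). Code points above U+FFFF are split into a surrogate pair, so a per-wchar_t call such as towupper() on Windows only ever sees half of such a character.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-16 code units.
std::u16string to_utf16(char32_t cp) {
    // Precondition: a valid scalar value (not a surrogate, not above U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));  // one unit is enough
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
    return out;
}
```

For example, Greek α (U+03B1) fits in a single unit, while the Deseret small letter U+10428, which does have an uppercase form at U+10400, becomes the two units 0xD801 0xDC28.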

It is probably easier to use a portable library such as ICU.

Also decide whether what you want is uppercase (capitalizing every letter) or titlecase (capitalizing the first letter of a string, or the first part of a ligature).