C++: Convert wchar_t to int

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6068801/

Time: 2020-08-28 19:26:14  Source: igfitidea

Convert wchar_t to int

c++ wchar-t

Asked by Lasse Espeholt

How can I convert a wchar_t ('9') to a digit in the form of an int (9)?

I have the following code, where I check whether or not peek is a digit:

if (iswdigit(peek)) {
    // store peek as numeric
}

Can I just subtract '0', or are there some Unicode specifics I should worry about?

Accepted answer by James Kanze

If the question concerns just '9' (or another of the ASCII digits '0' through '9'), simply subtracting '0' is the correct solution. If you're concerned with anything for which iswdigit returns non-zero, however, the issue may be far more complex. The standard says that iswdigit returns a non-zero value if its argument is "a decimal digit wide-character code [in the current locale]". That is vague, and leaves it up to the locale to define exactly what is meant. In the "C" or "Posix" locale, the Posix standard, at least, guarantees that only the digits zero through nine are considered decimal digits (if I understand it correctly), so if you're in the "C" or "Posix" locale, just subtracting '0' should work.
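
For the simple case described above, a minimal sketch might look like the following. The helper name asciiDigitValue is hypothetical (it does not appear in the original answer), and the sketch assumes that the wide-character digits L'0' through L'9' have consecutive values, which holds for any Unicode-based wchar_t encoding.

```cpp
// Hypothetical helper, not part of the original answer.
// Returns 0..9 for L'0'..L'9', or -1 for any other character.
// Assumes the wide digits L'0'..L'9' have consecutive values.
int asciiDigitValue(wchar_t wch)
{
    if (wch >= L'0' && wch <= L'9') {
        return wch - L'0';   // distance from L'0' is the digit's value
    }
    return -1;               // not an ASCII digit
}
```

Testing the range explicitly, rather than relying on iswdigit, keeps the result independent of the current locale.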

Presumably, in a Unicode locale, this would be any character which has the general category Nd. There are a number of these. The safest solution would be simply to create something like (variables here with static lifetime):

#include <algorithm>  // std::find
#include <iterator>   // std::begin, std::end

// One string of ten digits per script, ordered from zero to nine.
wchar_t const* const digitTables[] =
{
    L"0123456789",
    L"\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669",
    // ...
};

//!     \return
//!         wch as a numeric digit, or -1 if it is not a digit
int asNumeric( wchar_t wch )
{
    int result = -1;
    for ( wchar_t const* const* p = std::begin( digitTables );
            p != std::end( digitTables ) && result == -1;
            ++ p ) {
        // Search the ten digits of this table for wch.
        wchar_t const* q = std::find( *p, *p + 10, wch );
        if ( q != *p + 10 ) {
            result = static_cast<int>( q - *p );
        }
    }
    return result;
}

If you go this way:

  1. you'll definitely want to download the UnicodeData.txt file from the Unicode Consortium (the "Unicode Character Database" page links to both the Unicode data file and an explanation of the encodings used in it), and
  2. possibly write a simple parser for this file to extract the information automatically (e.g. when there is a new version of Unicode); the file is designed for simple programmatic parsing.
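
As a rough illustration of the parser mentioned in the second point, here is a sketch that reads UnicodeData.txt-style lines from a stream. It relies on the file's semicolon-separated layout, in which field 0 is the code point in hexadecimal, field 2 is the general category, and field 6 is the decimal digit value. The function name loadDecimalDigits and the choice of a std::map are assumptions made for this example, not something from the original answer.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: maps each code point with general category Nd
// to its decimal digit value, reading UnicodeData.txt-style lines.
std::map<unsigned long, int> loadDecimalDigits(std::istream& in)
{
    std::map<unsigned long, int> digits;
    std::string line;
    while (std::getline(in, line)) {
        // Split the line into its semicolon-separated fields.
        std::vector<std::string> fields;
        std::istringstream split(line);
        std::string field;
        while (std::getline(split, field, ';')) {
            fields.push_back(field);
        }
        // Field 0: code point (hex), field 2: category, field 6: digit value.
        if (fields.size() > 6 && fields[2] == "Nd" && !fields[6].empty()) {
            digits[std::stoul(fields[0], nullptr, 16)] = std::stoi(fields[6]);
        }
    }
    return digits;
}
```

A table like digitTables could then be generated from the resulting map instead of being maintained by hand.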

Finally, note that solutions based on ostringstream and istringstream (this includes boost::lexical_cast) will not work, since the conversions used in streams are defined to use only the ASCII digits. On the other hand, it might be reasonable to restrict your code to just the ASCII digits. In that case, the test becomes if ( wch >= L'0' && wch <= L'9' ), and the conversion is done by simply subtracting L'0', always supposing that the native encoding of wide character constants in your compiler is Unicode (the case, I'm pretty sure, for both VC++ and g++). Or just ensure that the locale is "C" (or "Posix", on a Unix machine).

EDIT: I forgot to mention: if you're doing any serious Unicode programming, you should look into ICU. Handling Unicode correctly is extremely non-trivial, and they've a lot of functionality already implemented.

Answer by Daren Thomas

Look into the atoi class of functions: http://msdn.microsoft.com/en-us/library/hc25t012(v=vs.71).aspx

In particular, _wtoi(const wchar_t *string); seems to be what you're looking for. You would have to make sure your wchar_t string is properly null-terminated, though, so try something like this:

if (iswdigit(peek)) {
    // store peek as numeric
    wchar_t s[2];
    s[0] = peek;
    s[1] = 0;
    int numeric_peek = _wtoi(s);
}
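
Since _wtoi is specific to Microsoft's runtime, a portable variant of the same idea could use the standard std::wcstol instead. The following is only a sketch; the helper name digitViaWcstol is hypothetical.

```cpp
#include <cwchar>

// Hypothetical helper: converts a single wide character to its digit
// value via std::wcstol, or returns -1 if no conversion took place.
int digitViaWcstol(wchar_t peek)
{
    wchar_t const s[2] = { peek, L'\0' };   // null-terminated one-character string
    wchar_t* end = nullptr;
    long const value = std::wcstol(s, &end, 10);
    // wcstol leaves end pointing at the start when nothing was converted.
    return end == s ? -1 : static_cast<int>(value);
}
```

Checking the end pointer distinguishes a real L'0' from a character that could not be converted, which the return value alone cannot do.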

Answer by Kirill V. Lyadvinsky

You could use boost::lexical_cast:

const wchar_t c = '9';
int n = boost::lexical_cast<int>( c );

Answer by Kirill Kovalenko

Despite the MSDN documentation, a simple test suggests that iswdigit returns true for more than just the range L'0'-L'9'.

for(wchar_t i = 0; i < 0xFFFF; ++i)
{
    if (iswdigit(i))
    {
        wprintf(L"%d : %lc\n", static_cast<int>(i), static_cast<wint_t>(i));
    }
}

That means that subtracting L'0' probably won't work as you might expect.

Answer by Ian Goldby

For most purposes you can just subtract the code for '0'.

However, the Wikipedia article on Unicode numerals mentions that the decimal digits are represented in 23 separate blocks (including twice in Arabic).

If you are not worried about that, then just subtract the code for '0'.
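
If you are worried about it, one possible sketch is to subtract the zero of whichever digit run the character falls in. This relies on the assumption that each script's decimal digits occupy ten consecutive code points in ascending order, and that wchar_t holds Unicode code points. The names digitZeros and digitFromZeroTable are hypothetical, and the table is deliberately incomplete: it lists only ASCII, Arabic-Indic, Extended Arabic-Indic, and Devanagari.

```cpp
// Zero code points of a few decimal-digit runs (incomplete on purpose):
// ASCII, Arabic-Indic, Extended Arabic-Indic, Devanagari.
static unsigned long const digitZeros[] = { 0x0030, 0x0660, 0x06F0, 0x0966 };

// Hypothetical helper: returns the digit value of wch, or -1 if wch
// does not fall in any of the runs listed above.
int digitFromZeroTable(wchar_t wch)
{
    unsigned long const code = static_cast<unsigned long>(wch);
    for (unsigned long zero : digitZeros) {
        if (code >= zero && code <= zero + 9) {
            return static_cast<int>(code - zero);   // offset from the run's zero
        }
    }
    return -1;
}
```

Storing one zero per run keeps the table much smaller than listing all ten digits for every script.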