C++: Convert wchar_t to int
Original question: http://stackoverflow.com/questions/6068801/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Convert wchar_t to int
Asked by Lasse Espeholt
How can I convert a wchar_t ('9') to a digit in the form of an int (9)?
I have the following code, where I check whether or not peek is a digit:
if (iswdigit(peek)) {
// store peek as numeric
}
Can I just subtract '0', or are there some Unicode specifics I should worry about?
Accepted answer by James Kanze
If the question concerns just '9' (or another of the Roman digits, i.e. '0' through '9'), just subtracting '0' is the correct solution. If you're concerned with anything for which iswdigit returns non-zero, however, the issue may be far more complex. The standard says that iswdigit returns a non-zero value if its argument is "a decimal digit wide-character code [in the current locale]". This is vague, and leaves it up to the locale to define exactly what is meant. In the "C" locale or the "Posix" locale, the "Posix" standard, at least, guarantees that only the Roman digits zero through nine are considered decimal digits (if I understand it correctly), so if you're in the "C" or "Posix" locale, just subtracting '0' should work.
Presumably, in a Unicode locale, this would be any character which has the general category Nd. There are a number of these. The safest solution would be simply to create something like the following (the variables here have static lifetime):
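#include <algorithm>  // std::find
#include <iterator>   // std::begin, std::end

// One entry per set of decimal digits, each holding the ten digits in order.
wchar_t const* const digitTables[] =
{
    L"0123456789",
    L"\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669",
    // ...
};

//! \return
//!     wch as a numeric digit, or -1 if it is not a digit
int asNumeric( wchar_t wch )
{
    int result = -1;
    for ( wchar_t const* const* p = std::begin( digitTables );
            p != std::end( digitTables ) && result == -1;
            ++ p ) {
        // The position of wch within a table is its numeric value.
        wchar_t const* q = std::find( *p, *p + 10, wch );
        if ( q != *p + 10 ) {
            result = static_cast<int>( q - *p );
        }
    }
    return result;
}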
If you go this way:
- you'll definitely want to download the UnicodeData.txt file from the Unicode consortium ("Unicode Character Database"; this page has links to both the Unicode data file and an explanation of the encodings used in it), and
- possibly write a simple parser for this file to extract the information automatically (e.g. when there is a new version of Unicode); the file is designed for simple programmatic parsing.
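As a rough illustration of the parser idea, here is a minimal sketch. It assumes the usual UnicodeData.txt layout of semicolon-separated fields, where field 0 is the code point in hexadecimal, field 2 is the general category, and field 6 is the decimal digit value. The names splitFields and loadDecimalDigits are hypothetical and do not come from the original answer.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one UnicodeData.txt record on ';'.
std::vector<std::string> splitFields(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, ';')) {
        fields.push_back(field);
    }
    return fields;
}

// Hypothetical helper: maps every code point whose general category is Nd
// to its decimal digit value (assumed to be field 6 of the record).
std::map<unsigned long, int> loadDecimalDigits(std::istream& in)
{
    std::map<unsigned long, int> digits;
    std::string line;
    while (std::getline(in, line)) {
        const std::vector<std::string> f = splitFields(line);
        if (f.size() > 6 && f[2] == "Nd" && !f[6].empty()) {
            // Field 0 holds the code point as a hexadecimal number.
            digits[std::stoul(f[0], nullptr, 16)] = std::stoi(f[6]);
        }
    }
    return digits;
}
```

The resulting map could be used directly for lookups, or used to generate tables like digitTables whenever a new version of Unicode is released.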
Finally, note that solutions based on ostringstream and istringstream (this includes boost::lexical_cast) will not work, since the conversions used in streams are defined to use only the Roman digits. On the other hand, it might be reasonable to restrict your code to just the Roman digits. In that case, the test becomes if ( wch >= L'0' && wch <= L'9' ), and the conversion is done by simply subtracting L'0', always supposing that the native encoding of wide character constants in your compiler is Unicode (the case, I'm pretty sure, of both VC++ and g++). Or just ensure that the locale is "C" (or "Posix", on a Unix machine).
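For the restricted case, the range test and the subtraction can be combined into one small function. This is only a sketch: the name asciiDigitValue is hypothetical, and it assumes, as the answer does, that the digits L'0' through L'9' are encoded contiguously in the compiler's wide character set.

```cpp
// Hypothetical helper (not from the original answer): returns the value of
// a digit in the range L'0'..L'9', or -1 for any other character.
int asciiDigitValue(wchar_t wch)
{
    if (wch >= L'0' && wch <= L'9') {
        return static_cast<int>(wch - L'0');
    }
    return -1;
}
```

Because the range is checked explicitly, the result does not depend on what iswdigit accepts in the current locale.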
EDIT: I forgot to mention: if you're doing any serious Unicode programming, you should look into ICU. Handling Unicode correctly is extremely non-trivial, and they've a lot of functionality already implemented.
Answer by Daren Thomas
Look into the atoi family of functions: http://msdn.microsoft.com/en-us/library/hc25t012(v=vs.71).aspx
In particular, _wtoi(const wchar_t *string) seems to be what you're looking for. You would have to make sure your wchar_t string is properly null-terminated, though, so try something like this:
if (iswdigit(peek)) {
// store peek as numeric
wchar_t s[2];
s[0] = peek;
s[1] = 0;
int numeric_peek = _wtoi(s);
}
Answer by Kirill V. Lyadvinsky
You could use boost::lexical_cast:
const wchar_t c = '9';
int n = boost::lexical_cast<int>( c );
Answer by Kirill Kovalenko
Despite the MSDN documentation, a simple test suggests that iswdigit returns true for more than just the range L'0'-L'9'.
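for (wchar_t i = 0; i < 0xFFFF; ++i)
{
    if (iswdigit(i))
    {
        // %lc is the standard wprintf conversion for a wide character
        wprintf(L"%d : %lc\n", static_cast<int>(i), static_cast<wint_t>(i));
    }
}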
That means that subtracting L'0' probably won't work as you might expect.
Answer by Ian Goldby
For most purposes you can just subtract the code for '0'.
However, the Wikipedia article on Unicode numerals mentions that the decimal digits are represented in 23 separate blocks (including twice in Arabic).
If you are not worried about that, then just subtract the code for '0'.
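If you do care about the other blocks, one possible approach is to subtract the code of the zero digit of whichever block the character belongs to. The sketch below assumes that each block encodes its ten digits contiguously in ascending order. It lists only a small sample of blocks chosen for illustration, and the names kDigitZeros and digitValueFromBlocks are hypothetical.

```cpp
#include <cstddef>

// Assumed sample of block starts (the code point of each block's digit
// zero). A complete table would cover every block of decimal digits.
const unsigned long kDigitZeros[] = {
    0x0030,  // ASCII digits
    0x0660,  // Arabic-Indic digits
    0x06F0,  // Extended Arabic-Indic digits
    0x0966,  // Devanagari digits
    0xFF10,  // Fullwidth digits
};

// Hypothetical helper: returns 0..9 if wch falls in one of the listed
// blocks, or -1 otherwise.
int digitValueFromBlocks(wchar_t wch)
{
    const unsigned long cp = static_cast<unsigned long>(wch);
    const std::size_t count = sizeof kDigitZeros / sizeof kDigitZeros[0];
    for (std::size_t i = 0; i < count; ++i) {
        if (cp >= kDigitZeros[i] && cp < kDigitZeros[i] + 10) {
            return static_cast<int>(cp - kDigitZeros[i]);
        }
    }
    return -1;
}
```

The two Arabic entries correspond to the "twice in Arabic" remark above.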