Original source: http://stackoverflow.com/questions/10178700/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
C++ Strip non-ASCII Characters from string
Asked by AnthonyW
Before you get started; yes I know this is a duplicate question and yes I have looked at the posted solutions. My problem is I could not get them to work.
bool invalidChar (char c)
{
return !isprint((unsigned)c);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
I tested this method on "Prus?us, ?gyptians," and it did nothing. I also attempted to substitute isprint for isalnum.
The real problem occurs when, in another section of my program, I convert string->wstring->string. The conversion balks if there are Unicode chars in the string->wstring conversion.
Ref:
How can you strip non-ASCII characters from a string? (in C#)
How to strip all non alphanumeric characters from a string in c++?
Edit:
I would still like to remove all non-ASCII chars regardless, but if it helps, here is where I am crashing:
// Convert to wstring
wchar_t* UnicodeTextBuffer = new wchar_t[ANSIWord.length()+1];
wmemset(UnicodeTextBuffer, 0, ANSIWord.length()+1);
mbstowcs(UnicodeTextBuffer, ANSIWord.c_str(), ANSIWord.length());
wWord = UnicodeTextBuffer; //CRASH
Error Dialog
MSVC++ Debug Library
Debug Assertion Failed!
Program: //myproject
File: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c
Line: //Above
Expression: (unsigned)(c+1)<=256
Edit:
Further compounding the matter: the .txt file I am reading from is ANSI encoded. Everything within should be valid.
Solution:
bool invalidChar (char c)
{
return !(c>=0 && c <128);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
If someone else would like to copy/paste this, I can check this question off.
EDIT:
For future reference: try using the __isascii, iswascii commands
Answered by AnthonyW
Solution:
bool invalidChar (char c)
{
return !(c>=0 && c <128);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
EDIT:
For future reference: try using the __isascii, iswascii commands
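For anyone pasting the solution into a fresh file, here is a minimal self-contained sketch of the same range check with the headers it needs. The names isNonAscii and stripNonAscii are hypothetical, and returning a copy instead of editing in place is a choice made for this example, not part of the answer above. The test goes through unsigned char so it behaves the same whether plain char is signed or unsigned on the platform.

```cpp
#include <algorithm>
#include <string>

// True for any byte outside the 7-bit ASCII range [0, 127].
// Converting to unsigned char first makes the comparison independent
// of whether plain char is signed on the target platform.
bool isNonAscii(char c)
{
    return static_cast<unsigned char>(c) > 127;
}

// Hypothetical helper: returns a copy of str with every non-ASCII byte removed.
std::string stripNonAscii(std::string str)
{
    str.erase(std::remove_if(str.begin(), str.end(), isNonAscii), str.end());
    return str;
}
```

With UTF-8 input, each byte of a multi-byte character is above 127, so the whole character disappears: stripNonAscii("Prus\xC3\xA6us") gives "Prusus".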
Answered by James Kanze
At least one problem is in your invalidChar function. It should be:
return !isprint( static_cast<unsigned char>( c ) );
Casting a char to an unsigned is likely to give some very, very big values if the char is negative (UINT_MAX+1 + c). Passing such a value to isprint is undefined behavior.
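The difference between the two casts can be seen in a small sketch. The function names are hypothetical, and signed char is used explicitly so the result does not depend on the signedness of plain char; the value 230 mentioned afterwards assumes 8-bit chars.

```cpp
#include <climits>

// Converting a negative char value straight to unsigned wraps
// modulo UINT_MAX + 1, giving a very large number.
unsigned viaUnsigned(signed char c)
{
    return static_cast<unsigned>(c);
}

// Converting to unsigned char first always lands in [0, UCHAR_MAX].
int viaUnsignedChar(signed char c)
{
    return static_cast<unsigned char>(c);
}

// The <cctype> functions only accept EOF or a value representable
// as unsigned char; this checks the second condition.
bool inUnsignedCharRange(unsigned value)
{
    return value <= UCHAR_MAX;
}
```

For a char holding -26, viaUnsigned returns UINT_MAX - 25, which is outside the range isprint accepts, while viaUnsignedChar returns 230, which is inside it.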
Answered by Adrian McCarthy
isprint depends on the locale, so the character in question must be printable in the current locale.
If you want strictly ASCII, check the range for [0..127]. If you want printable ASCII, check the range and isprint.
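A rough sketch of the printable-ASCII variant, combining the range check with isprint. The names notPrintableAscii and keepPrintableAscii are hypothetical. Because the range check already rejects bytes above 127, the locale can only affect how isprint classifies characters inside the ASCII range.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True when c is outside [0, 127] or is not printable.
// The unsigned char conversion keeps the argument to isprint valid.
bool notPrintableAscii(char c)
{
    const unsigned char uc = static_cast<unsigned char>(c);
    return uc > 127 || !std::isprint(uc);
}

// Removes every byte that is not printable ASCII, in place.
void keepPrintableAscii(std::string& str)
{
    str.erase(std::remove_if(str.begin(), str.end(), notPrintableAscii), str.end());
}
```

In the default "C" locale this also drops control characters such as tabs and newlines while keeping spaces, so it is stricter than the plain [0..127] check.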