Windows: How do I write a std::codecvt facet?
Original URL: http://stackoverflow.com/questions/2971386/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How do I write a std::codecvt facet?
Asked by Billy ONeal
How do I write a std::codecvt facet? I'd like to write facets that convert from UTF-16 to UTF-8, from UTF-16 to the system's current code page (Windows, so CP_ACP), and from UTF-16 to the system's OEM code page (Windows, so CP_OEMCP).
Cross-platform is preferred, but MSVC on Windows is fine too. Are there any tutorials or similar resources on how to use this class correctly?
Accepted answer by Basilevs
I've written one based on iconv. It can be used on Windows or on any POSIX OS. (You will need to link with iconv, obviously.)
The answer to the "how to" question is to follow the codecvt reference. I was not able to find any better instructions on the Internet two years ago.
Important notices
- Theoretically there is no need for such work: codecvt_byname should be enough on any standard-conforming platform. In reality, some compilers don't support this class or support it badly, and the interface of codecvt_byname also differs between compilers.
- My working example is implemented using the state template parameter of codecvt. Always use the standard std::mbstate_t type there, as this is the only way to use your codecvt with the standard iostream classes.
- The std::mbstate_t type can't be used as a pointer on 64-bit platforms in a cross-platform way.
- Stateless conversions work for short strings, but may fail if you try to convert a chunk of data larger than the streambuf's internal buffer size (UTF is essentially a stateful encoding).
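To give a rough idea of what following the codecvt reference involves, here is a minimal sketch of a custom facet. It is not the iconv-based implementation mentioned above. The names utf8_out_facet and to_utf8 are hypothetical, and the sketch assumes that every wchar_t holds one whole code point (surrogates are reported as errors), so it only covers the output direction and ignores the std::mbstate_t argument. That makes it exactly the kind of stateless conversion the last notice warns about.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical facet: encodes wchar_t code points as UTF-8 on output.
// Assumption: one wchar_t == one code point, so no state is needed.
// do_in, do_length and do_unshift keep the base class behaviour here.
class utf8_out_facet : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit utf8_out_facet(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        from_next = from;
        to_next = to;
        for (; from_next != from_end; ++from_next) {
            unsigned long cp = static_cast<unsigned long>(*from_next);
            // Surrogates and out-of-range values are not valid code points.
            if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return error;
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            // Not enough room: stop before this character and report partial.
            if (to_end - to_next < len) return partial;
            if (len == 1) {
                *to_next++ = static_cast<char>(cp);
            } else if (len == 2) {
                *to_next++ = static_cast<char>(0xC0 | (cp >> 6));
                *to_next++ = static_cast<char>(0x80 | (cp & 0x3F));
            } else if (len == 3) {
                *to_next++ = static_cast<char>(0xE0 | (cp >> 12));
                *to_next++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                *to_next++ = static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                *to_next++ = static_cast<char>(0xF0 | (cp >> 18));
                *to_next++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                *to_next++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                *to_next++ = static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return ok;
    }

    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 0; }   // variable width
    int do_max_length() const noexcept override { return 4; } // bytes per wchar_t
};

// Hypothetical helper that drives the facet directly through the public out().
std::string to_utf8(const std::wstring& in) {
    utf8_out_facet facet(1);  // refs = 1: the object is not owned by a locale
    std::mbstate_t state{};
    std::string out(in.size() * 4, '\0');
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result r =
        facet.out(state, in.data(), in.data() + in.size(), from_next,
                  &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok) throw std::runtime_error("conversion failed");
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

To use such a facet with iostreams, it would typically be installed with something like `std::locale loc(std::locale(), new utf8_out_facet);` and the stream imbued with `loc` before it is opened. A real UTF-16 converter would also need to handle surrogate pairs split across calls, which is where the state parameter comes in.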
Answer by apenwarr
The problem with std::codecvt is that it's a solution looking for a problem. Or rather, the problem it's trying to solve is unsolvable, so anybody trying to use it as a solution is going to be very disappointed.
If you don't know which character set your input or output is in, then std::codecvt isn't ever going to be able to help you. Conversely, if you do know which character sets you're using, then you can trivially convert between them with a single function call. Wrapping that function call in a complicated mess of templates doesn't change those fundamentals.
...and that's why nobody uses std::codecvt. I recommend you just do what everybody else does, and pretend it never happened.