Original question: http://stackoverflow.com/questions/4053918/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to portably write std::wstring to file?
Asked by Oystein
I have a wstring declared as such:
// random wstring
std::wstring str = L"abcàd?ef?ghhhhhhhμa";
The literal would be UTF-8 encoded, because my source file is.
[EDIT: According to Mark Ransom this is not necessarily the case, the compiler will decide what encoding to use - let us instead assume that I read this string from a file encoded in e.g. UTF-8]
I would very much like to get this into a file that reads (when the text editor is set to the correct encoding):
abcàd?ef?ghhhhhhhμa
but ofstream is not very cooperative (it refuses to take wstring parameters), and wofstream supposedly needs to know locale and encoding settings. I just want to output this set of bytes. How does one normally do this?
EDIT: It must be cross platform, and should not rely on the encoding being UTF-8. I just happen to have a set of bytes stored in a wstring, and want to output them. It could very well be UTF-16, or plain ASCII.
Accepted answer by scigor
Why not write the file as binary? Just use ofstream with the std::ios::binary setting. The editor should be able to interpret it then. Don't forget the Unicode byte order mark 0xFEFF at the beginning. You might be better off writing with a library; try one of these:
http://www.codeproject.com/KB/files/EZUTF.aspx
http://www.gnu.org/software/libiconv/
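The binary approach described in this answer could be sketched roughly as follows. The helper name write_raw_wstring is made up for illustration. It writes the byte order mark and then the wchar_t units exactly as they sit in memory, so the result depends on the platform's sizeof(wchar_t) (2 on Windows, commonly 4 elsewhere) and byte order. That is an assumption worth checking before treating the file as portable.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: dump the wchar_t units of `text` to `path` unchanged,
// preceded by a byte order mark (U+FEFF) stored in the same byte order.
// Returns true if the file was opened and every write succeeded.
bool write_raw_wstring(const std::string& path, const std::wstring& text)
{
    std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
    if (!out) return false;

    const wchar_t bom = 0xFEFF;
    out.write(reinterpret_cast<const char*>(&bom), sizeof bom);
    out.write(reinterpret_cast<const char*>(text.data()),
              static_cast<std::streamsize>(text.size() * sizeof(wchar_t)));
    return out.good();
}
```

A call such as write_raw_wstring("out.txt", str) produces UTF-16 on Windows and UTF-32 on most Unix-like systems, which is why the answer also points to conversion libraries.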
Answer by ST3
For std::wstring you need std::wofstream
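#include <fstream>
// Narrow path: a wchar_t* filename overload is an MSVC extension. The backslash must be escaped.
std::wofstream f("C:\\some file.txt");
f << str;
f.close();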
Answer by Jerry Coffin
std::wstring is for something like UTF-16 or UTF-32, not UTF-8. For UTF-8, you probably just want to use std::string, and write out via std::cout. Just FWIW, C++0x will have Unicode literals, which should help clarify situations like this.
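As a small illustration of this point, here is a sketch in which the text is already UTF-8 and lives in a std::string, so it can be written byte for byte. The helper name write_utf8_bytes and the sample string are assumptions made for this example. The answer mentions std::cout, while this sketch targets a file because that is what the question asks for.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write bytes that are already UTF-8 encoded, untouched.
bool write_utf8_bytes(const std::string& path, const std::string& utf8)
{
    std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
    return out.good();
}

// Spells "abc", then U+00E0 (the two bytes C3 A0 in UTF-8), then "d".
// The literal is split so that 'd' is not read as part of the hex escape.
const std::string sample = "abc\xC3\xA0" "d";
```

Binary mode keeps the stream from translating line endings, so the bytes in the file match the bytes in the string.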
Answer by Basilevs
C++ has means to perform a conversion from wide characters to localized ones on output or file write. Use a codecvt facet for that purpose.
You may use the standard std::codecvt_byname, or a non-standard codecvt_facet implementation.
#include <iostream>
#include <locale>
using namespace std;
// Conversion facet for a named locale; construction throws if that locale is not installed.
typedef codecvt_byname<wchar_t, char, mbstate_t> Cvt;
int main()
{
    locale utf8locale(locale(), new Cvt("en_US.UTF-8"));
    wcout.imbue(utf8locale);
    wcout << L"Hello, wide to multibyte world!" << endl;
}
Beware that on some platforms codecvt_byname can only perform conversions for locales that are installed on the system. I therefore recommend searching Stack Overflow for "utf8 codecvt" and choosing from the many references to custom codecvt implementations listed there.
EDIT: As the OP states that the string is already encoded, all he should do is remove the prefixes L and "w" from every token of his code.
Answer by Steve Townsend
There is a (Windows-specific) solution that should work for you here. Basically, convert the wstring to the UTF-8 code page and then use ofstream.
#include <windows.h>
#include <fstream>
#include <string>

// Convert a UTF-16 buffer of len wchar_t units to a UTF-8 encoded std::string.
std::string to_utf8(const wchar_t* buffer, int len)
{
    // First call: ask for the required output size in bytes.
    int nChars = ::WideCharToMultiByte(
        CP_UTF8,
        0,
        buffer,
        len,
        NULL,
        0,
        NULL,
        NULL);
    if (nChars == 0) return "";

    std::string newbuffer;
    newbuffer.resize(nChars);
    // Second call: convert directly into the resized string.
    ::WideCharToMultiByte(
        CP_UTF8,
        0,
        buffer,
        len,
        &newbuffer[0],
        nChars,
        NULL,
        NULL);

    return newbuffer;
}

std::string to_utf8(const std::wstring& str)
{
    return to_utf8(str.c_str(), (int)str.size());
}

int main()
{
    std::ofstream testFile;
    testFile.open("demo.xml", std::ios::out | std::ios::binary);

    // \u00EF is the i with diaeresis in "naive", spelled so the source file encoding does not matter.
    std::wstring text =
        L"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        L"<root description=\"this is a na\u00EFve example\">\n</root>";

    std::string outtext = to_utf8(text);
    testFile << outtext;
    testFile.close();
    return 0;
}
Answer by user225312
Note that wide streams output only char * variables, so maybe you should try using the c_str() member function to convert a std::wstring and then output it to the file. Then it should probably work?
Answer by Some programmer dude
I had the same problem some time ago, and wrote down the solution I found on my blog. You might want to check it out to see if it might help, especially the function wstring_to_utf8.
http://pileborg.org/b2e/blog5.php/2010/06/13/unicode-utf-8-and-wchar_t
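The blog's code is not reproduced on this page, so the following is only a sketch of what a function with that name might look like, not the author's implementation. It assumes wchar_t holds either UTF-16 code units (as on Windows) or UTF-32 code points (as on most Unix-like systems) and does no validation of its input. The helper append_utf8 is a name invented for this example.

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of one code point to `out` (no validation).
static void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Sketch only: assumes UTF-16 code units or UTF-32 code points in wchar_t.
std::string wstring_to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The returned std::string can then be written with a plain ofstream opened in binary mode, which avoids any dependence on installed locales.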
Answer by towi
You should not use a UTF-8 encoded source file if you want to write portable code. Sorry.
std::wstring str = L"abcàd?ef?ghhhhhhhμa";
(I am not sure if this actually violates the standard, but I think it does. Even if it does not, to be safe you should not do it.)
Yes, purely using std::ostream will not work. There are many ways to convert a wstring to UTF-8. My favorite is using the International Components for Unicode. It's a big lib, but it's great. You get a lot of extras and things you might need in the future.
Answer by snowdude
From my experience of working with different character encodings I would recommend that you only deal with UTF-8 at load and save time. You're in for a world of pain if you try to store the internal representation in UTF-8, since a single character could be anything from 1 byte to 4. So simple operations like strlen require looking at every byte to decide the length rather than using the allocated buffer size (although you can optimize by looking at the first byte in the char sequence, e.g. 00..7f is a single byte char, c2..df indicates a 2 byte char etc).
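To make the cost concrete, here is a sketch of the two approaches mentioned: counting characters by scanning every byte, and reading the sequence length off the lead byte. Both function names are invented for this example, and both assume well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Every byte has to be inspected, unlike size(), which is constant time.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}

// Byte length of the sequence introduced by `lead`, or 0 if `lead`
// cannot start a sequence (for example a continuation byte).
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead <= 0x7F) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}
```

For the six-byte string "abc\xC3\xA0" "d", size() reports 6 while utf8_length reports 5, which is the mismatch the answer warns about.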
People quite often refer to 'Unicode strings' when they mean UTF-16, and on Windows a wchar_t is a fixed 2 bytes. In Windows I think wchar_t is simply:
typedef unsigned short wchar_t;
The full UTF-32 4 byte representation is rarely required and very wasteful. Here is what the Unicode Standard (5.0) has to say on it:
"On average more than 99% of all UTF-16 is expressed using single code units... UTF-16 provides the right mix of compact size with the ability to handle the occasional character outside the BMP"
In short, use wchar_t as your internal representation and do conversions when loading and saving (and don't worry about full Unicode unless you know you need it).
With regard to performing the actual conversion, have a look at the ICU project: