How to write a std::string to a UTF-8 text file in C++

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also comply with CC BY-SA: cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/3011082/

Date: 2020-08-28 11:46:43  Source: igfitidea

c++ utf-8

Question by poiloi

I just want to write a few simple lines to a text file in C++, but I want them to be encoded in UTF-8. What is the easiest and simple way to do so?

Answer by Ben Voigt

The only way UTF-8 affects std::string is that size(), length(), and all the indices are measured in bytes, not characters.
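
To make the bytes-versus-characters point concrete, here is a minimal sketch. The helper name `count_code_points` is hypothetical, not a standard function. It assumes the string already holds valid UTF-8 and counts code points by skipping continuation bytes, which always have the bit pattern `10xxxxxx`. The bytes are written as escapes so the example does not depend on the source file's encoding.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded std::string.
// Assumes valid UTF-8: every byte that is NOT a continuation byte
// (10xxxxxx) starts a new code point.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {  // lead byte or plain ASCII byte
            ++count;
        }
    }
    return count;
}
```

For `"h\xC3\xA9llo"` ("héllo"), `size()` reports 6 bytes while `count_code_points` reports 5.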

And, as sbi points out, incrementing the iterator provided by std::string will step forward by byte, not by character, so it can actually point into the middle of a multibyte UTF-8 sequence. There's no UTF-8-aware iterator provided in the standard library, but there are a few available on the 'Net.
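
A hand-rolled, UTF-8-aware step might look like the sketch below. `next_code_point` is a hypothetical helper, not a standard-library or third-party API: it decodes the code point starting at byte offset `i` and moves `i` to the start of the next one. It assumes valid UTF-8 and does no validation (overlong forms, surrogates and truncated sequences are not detected).

```cpp
#include <cstddef>
#include <string>

// Decodes the code point that starts at byte offset `i` of a UTF-8 string
// and advances `i` past it. Sketch only: assumes the input is valid UTF-8.
char32_t next_code_point(const std::string& s, std::size_t& i) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    int extra = 0;       // number of continuation bytes after the lead byte
    char32_t cp = 0;
    if (lead < 0x80)      { extra = 0; cp = lead; }         // 0xxxxxxx
    else if (lead < 0xE0) { extra = 1; cp = lead & 0x1F; }  // 110xxxxx
    else if (lead < 0xF0) { extra = 2; cp = lead & 0x0F; }  // 1110xxxx
    else                  { extra = 3; cp = lead & 0x07; }  // 11110xxx
    ++i;  // step past the lead byte
    for (int k = 0; k < extra; ++k, ++i) {
        // Each continuation byte (10xxxxxx) contributes its low 6 bits.
        cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
    }
    return cp;
}
```

A loop such as `for (std::size_t i = 0; i < s.size();) { char32_t cp = next_code_point(s, i); /* ... */ }` then visits one code point per iteration instead of one byte.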

If you remember that, you can put UTF-8 into std::string, write it to a file, etc., all in the usual way (by which I mean the way you'd use a std::string without UTF-8 inside).
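
For example, here is a minimal sketch, assuming the std::string already contains UTF-8 bytes (written as escapes so the result does not depend on the compiler's source or execution character set). `write_utf8_file` is a hypothetical helper name. Binary mode is a choice made here so that the bytes, including `\n`, reach the file exactly as they are; text mode would also produce UTF-8, but on Windows it translates `\n` to `\r\n`.

```cpp
#include <fstream>
#include <string>

// Writes a std::string that already holds UTF-8 bytes to `path`, byte for byte.
// Returns false if the file could not be opened or written.
bool write_utf8_file(const std::string& path, const std::string& utf8_text) {
    std::ofstream out(path, std::ios::binary);  // binary: no "\n" -> "\r\n" translation
    if (!out) {
        return false;
    }
    out.write(utf8_text.data(), static_cast<std::streamsize>(utf8_text.size()));
    out.close();  // flush now so write errors show up in the stream state
    return !out.fail();
}
```

`write_utf8_file("out.txt", "Gr\xC3\xBC\xC3\x9F" "e\n")` writes "Grüße" followed by a newline. The literal is split before `e` because `e` is a hex digit and would otherwise be absorbed into the `\x9F` escape.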

You may want to start your file with a byte order mark so that other programs will know it is UTF-8.
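
A sketch of writing the byte order mark first, assuming the same kind of UTF-8 std::string as above; `write_utf8_file_with_bom` is a hypothetical helper. The UTF-8 BOM is U+FEFF encoded as the three bytes `EF BB BF`. Keep in mind that the Unicode Standard describes a BOM in UTF-8 as neither required nor recommended, and programs that don't expect it will treat those three bytes as part of the text.

```cpp
#include <fstream>
#include <string>

// U+FEFF (byte order mark) encoded in UTF-8.
const char kUtf8Bom[] = "\xEF\xBB\xBF";

// Writes the UTF-8 BOM followed by `utf8_text` to `path`.
// Returns false if the file could not be opened or written.
bool write_utf8_file_with_bom(const std::string& path, const std::string& utf8_text) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out.write(kUtf8Bom, sizeof(kUtf8Bom) - 1);  // 3 bytes, without the trailing '\0'
    out.write(utf8_text.data(), static_cast<std::streamsize>(utf8_text.size()));
    out.close();
    return !out.fail();
}
```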

Answer by denys

There is a nice tiny library for working with UTF-8 from C++: utfcpp

Answer by Brian R. Bondy

libiconv is a great library for all our encoding and decoding needs.
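
As an illustration, here is a minimal sketch using the POSIX `iconv_open`/`iconv`/`iconv_close` interface, which glibc provides directly and GNU libiconv provides as a separate library. Assumptions: an `iconv` that takes `char**` input (glibc's does; some libiconv builds may declare `const char**`), a single-byte source encoding such as ISO-8859-1 so that 4 output bytes per input byte is always enough room, and a hypothetical helper name `to_utf8`. Production code would loop and grow the buffer on `E2BIG` instead of sizing it up front.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Converts `input` from the encoding named by `from_code` (e.g. "ISO-8859-1")
// to UTF-8 with POSIX iconv. Sketch only: assumes a single-byte source encoding.
std::string to_utf8(const std::string& input, const char* from_code) {
    iconv_t cd = iconv_open("UTF-8", from_code);
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open: unsupported conversion");
    }
    std::vector<char> in(input.begin(), input.end());  // iconv needs a mutable char*
    std::vector<char> out(in.size() * 4 + 1);          // generous output buffer
    char* in_ptr = in.data();
    std::size_t in_left = in.size();
    char* out_ptr = out.data();
    std::size_t out_left = out.size();
    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1) {
        throw std::runtime_error("iconv: conversion failed");
    }
    return std::string(out.data(), out.size() - out_left);  // bytes actually written
}
```

With glibc, `to_utf8("caf\xE9", "ISO-8859-1")` returns `"caf\xC3\xA9"`, which is "café" in UTF-8.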

If you are using Windows, you can use WideCharToMultiByte and specify that you want UTF-8 (code page CP_UTF8).

Answer by Jakob Riedle

What is the easiest and simple way to do so?

The most intuitive, and thus easiest, way to handle UTF-8 in C++ is surely to use a drop-in replacement for std::string. As the internet still lacks one, I implemented the functionality on my own:

tinyutf8 (EDIT: now on GitHub).

This library provides a very lightweight drop-in replacement for std::string (or std::u32string, if you will, because you iterate over code points rather than chars). It strikes a balance between fast access and small memory consumption, while being very robust. This robustness to 'invalid' UTF-8 sequences makes it (nearly completely) compatible with ANSI (0-255).

Hope this helps!

Answer by Tony the Pony

If by "simple" you mean ASCII, there is no need to do any encoding, since characters with an ASCII value of 127 or less are encoded as the same single byte in UTF-8.
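
A quick way to check that a string falls into this case is sketched below; `is_plain_ascii` is a hypothetical helper. If every byte is 0x7F or less, the bytes are already valid UTF-8 and can be written out unchanged.

```cpp
#include <string>

// Returns true if every byte is in the ASCII range (0-127). Such a string is
// already valid UTF-8, because UTF-8 encodes these characters as the same byte.
bool is_plain_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) {
            return false;
        }
    }
    return true;
}
```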

Answer by Danil

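#include <QString>
#include <string>

// Wide string -> QString -> UTF-8 bytes (QByteArray) -> std::string
std::wstring text = L"Привет";
QString qstr = QString::fromStdWString(text);
QByteArray byteArray(qstr.toUtf8());
std::string str_std(byteArray.constData(), byteArray.length());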

Answer by rmawatson

My preference is to convert to and from a std::u32string and work with code points internally, then convert to UTF-8 when writing out to a file, using these converting iterators I put on GitHub.

#include <utf/utf.h>

int main()
{
    using namespace utf;

    u32string u32_text = U"?????";
    // do stuff with string
    // convert to utf8 string
    utf32_to_utf8_iterator<u32string::iterator> pos(u32_text.begin());
    utf32_to_utf8_iterator<u32string::iterator> end(u32_text.end());

    u8string u8_text(pos, end);

    // write out utf8 to file.
    // ...
}
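
The GitHub iterators' interface isn't shown beyond the snippet above, so as a standard-library-only stand-in (not rmawatson's code), here is a minimal sketch of the same idea: keep code points in a `std::u32string` and encode them to UTF-8 only when writing out. `append_utf8` and `utf32_to_utf8` are hypothetical names, and the input is assumed to contain only valid Unicode scalar values (at most U+10FFFF, no surrogates).

```cpp
#include <string>

// Appends the UTF-8 encoding (1 to 4 bytes) of one code point to `out`.
// Assumes `cp` is a valid Unicode scalar value.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                       // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts a whole std::u32string (one element per code point) to UTF-8.
std::string utf32_to_utf8(const std::u32string& text) {
    std::string out;
    out.reserve(text.size());  // at least one byte per code point
    for (char32_t cp : text) {
        append_utf8(out, cp);
    }
    return out;
}
```

The result is an ordinary std::string, so it can be written with `std::ofstream` like any other string.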

Answer by Artem Vorotnikov

Use Glib::ustring from glibmm.

It is the only widespread UTF-8 string container (AFAIK). While it is character-based (Unicode code points, not bytes), it has much the same interface as std::string, so the port should be a simple search and replace (just make sure that your data is valid UTF-8 before loading it into a ustring).
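
If you want that check without depending on the library, a standard-library-only validity test might look like this sketch. `is_valid_utf8` is a hypothetical name; GLib itself offers `g_utf8_validate` for the same purpose. It rejects stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates (U+D800 to U+DFFF) and values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point allowed for each sequence length (index = length).
    static const char32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;                // stray continuation byte or invalid lead
        if (n - i < len) return false;    // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // expected 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < kMin[len]) return false;          // overlong encoding
        if (cp > 0x10FFFF) return false;           // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}
```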

Answer by Anatoly

Since UTF-8 is a multibyte character encoding, you run into some problems working with it, so it's a bad idea. Use normal Unicode instead.

So in my opinion, the best option is ordinary ASCII char text with some encoding (code page). You only need Unicode if you use more than two different sets of symbols (languages) in a single text.

That is a rather rare case; in most cases two sets of symbols are enough. For this common case, use ASCII chars, not Unicode.

Using multibyte chars like UTF-8 only pays off for traditional Chinese, Arabic or some hieroglyphic text. It's a very, very rare case!!!

I don't think many people need that. So never use UTF-8!!! Avoiding it saves you a serious headache when manipulating such strings.
