C++: Reading a Unicode UTF-8 file into a wstring

Question

Asked by Abdelwahed

How can I read a Unicode (UTF-8) file into wstring(s) on the Windows platform?

Answer 1

Answered by LihO

With C++11 support, you can use the std::codecvt_utf8 facet, which encapsulates conversion between a UTF-8 encoded byte string and a UCS-2 or UCS-4 character string, and which can be used to read and write UTF-8 files, both text and binary.

To use a facet, you usually create a locale object, which encapsulates culture-specific information as a set of facets that collectively define a specific localized environment. Once you have a locale object, you can imbue your stream buffer with it:

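#include <codecvt>
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

std::wstring readFile(const char* filename)
{
    std::wifstream wif(filename);
    // std::locale::empty() is an MSVC extension; the standard std::locale() also works here.
    wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
    std::wstringstream wss;
    wss << wif.rdbuf();  // copy the decoded contents into the string stream
    return wss.str();
}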

which can be used like this:

std::wstring wstr = readFile("a.txt");

Alternatively, you can set the global C++ locale before you work with string streams. This causes all future calls to the std::locale default constructor to return a copy of the global C++ locale, so you no longer need to explicitly imbue stream buffers with it:

std::locale::global(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
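The effect of the global locale can be sketched as follows. This is a minimal illustration, not the answer's own code: readFileViaGlobalLocale is a hypothetical helper name, and it uses the standard std::locale() as the base locale instead of the MSVC-specific std::locale::empty(). It also restores the previous global locale before returning, which is a design choice of this sketch so that other code is not affected.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: reads a UTF-8 file relying only on the global locale.
std::wstring readFileViaGlobalLocale(const char* filename)
{
    // Install a global locale that carries the UTF-8 facet.
    // std::locale::global() returns the locale that was global before the call.
    const std::locale previous = std::locale::global(
        std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));

    std::wstring contents;
    {
        // No imbue() call: the stream copies the global locale when it is constructed.
        std::wifstream wif(filename);
        std::wstringstream wss;
        wss << wif.rdbuf();
        contents = wss.str();
    }

    // Put the earlier global locale back (a choice made for this sketch).
    std::locale::global(previous);
    return contents;
}
```

Streams constructed before the call to std::locale::global() keep the locale they already had, so the order of the two steps matters.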

Answer 2

Answered by Philipp

According to a comment by @Hans Passant, the simplest way is to use _wfopen_s. Open the file with the mode string "rt, ccs=UTF-8".

Here is another pure C++ solution that works at least with VC++ 2010:

#include <locale>
#include <codecvt>
#include <string>
#include <fstream>
#include <cstdlib>

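int main() {
    // locale::empty() is an MSVC extension.
    const std::locale empty_locale = std::locale::empty();
    typedef std::codecvt_utf8<wchar_t> converter_type;
    // The locale takes ownership of the facet and deletes it when no longer used.
    const converter_type* converter = new converter_type;
    const std::locale utf8_locale = std::locale(empty_locale, converter);
    std::wifstream stream(L"test.txt");  // wchar_t* filename overload (MSVC)
    stream.imbue(utf8_locale);
    std::wstring line;
    std::getline(stream, line);          // reads the first line, decoded from UTF-8
    std::system("pause");
}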

Except for locale::empty() (the standard default constructor std::locale(), which copies the global locale, might work here as well) and the wchar_t* overload of the basic_ifstream constructor, this should even be pretty standard-compliant (where "standard" means C++0x, of course).
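A variation that avoids both extensions might look like the sketch below. It assumes a narrow-character filename and uses std::locale() as the base locale. It also passes std::consume_header to the facet so that a leading UTF-8 byte order mark, which some Windows editors write, is skipped. readUtf8Lines is a hypothetical helper name, not part of the original answer.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: returns every line of a UTF-8 text file as a wide string.
std::vector<std::wstring> readUtf8Lines(const char* filename)
{
    // consume_header makes the facet skip a leading UTF-8 byte order mark, if present.
    typedef std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header> converter_type;

    std::wifstream stream;
    // Imbue before opening so the facet is in place before any bytes are read.
    stream.imbue(std::locale(std::locale(), new converter_type));
    stream.open(filename);

    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(stream, line))
        lines.push_back(line);

    // If the file could not be opened, the loop never runs and the result is empty.
    return lines;
}
```

Returning an empty vector on failure keeps the sketch short; real code would probably want to distinguish a missing file from an empty one.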

Answer 3

Answered by AshleysBrain

Here's a platform-specific function for Windows only:

#include <cstdio>
#include <string>
#include <sys/types.h>
#include <sys/stat.h>

// Returns the file size in bytes, or 0 if the file cannot be queried.
size_t GetSizeOfFile(const std::wstring& path)
{
    struct _stat fileinfo;
    if (_wstat(path.c_str(), &fileinfo) != 0)
        return 0;
    return static_cast<size_t>(fileinfo.st_size);
}

std::wstring LoadUtf8FileToString(const std::wstring& filename)
{
    std::wstring buffer;            // stores file contents
    FILE* f = _wfopen(filename.c_str(), L"rtS, ccs=UTF-8");

    // Failed to open file
    if (f == NULL)
    {
        // ...handle some error...
        return buffer;
    }

    // Size in bytes; an upper bound on the number of wchar_t units after decoding
    size_t filesize = GetSizeOfFile(filename);

    // Read entire file contents into memory
    if (filesize > 0)
    {
        buffer.resize(filesize);
        size_t wchars_read = fread(&(buffer.front()), sizeof(wchar_t), filesize, f);
        buffer.resize(wchars_read);  // trim to what was actually read
        buffer.shrink_to_fit();
    }

    fclose(f);

    return buffer;
}

Use like so:

std::wstring mytext = LoadUtf8FileToString(L"C:\\MyUtf8File.txt");

Note that the entire file is loaded into memory, so you might not want to use it for very large files.

Answer 4

Answered by Shen Yu

#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <cstdlib>

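int main()
{
    std::wifstream wif("filename.txt");
    // std::locale throws std::runtime_error if the named locale is not available on the system.
    wif.imbue(std::locale("zh_CN.UTF-8"));

    std::wcout.imbue(std::locale("zh_CN.UTF-8"));
    std::wcout << wif.rdbuf();  // decode the file and echo it to standard output
}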

Answer 5

Answered by ThomasMcLeod

This question was addressed in "Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI". In sum, on Windows a wstring holds 16-bit wchar_t units, historically treated as UCS-2, the strictly two-byte predecessor of UTF-16 (current Windows APIs interpret them as UTF-16). I believe this covers Arabic.
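The two-byte point can be illustrated with a small sketch. Arabic letters each fit in a single 16-bit unit, while characters outside the Basic Multilingual Plane need two units (a surrogate pair) in UTF-16. The example uses std::u16string as a stand-in for a Windows wstring so that it behaves the same on every platform; countCodePoints is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-16 string; a surrogate pair counts once.
std::size_t countCodePoints(const std::u16string& text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const char16_t unit = text[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        const bool hasLow = i + 1 < text.size()
            && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF;
        if (isHigh && hasLow)
            ++i;  // skip the low surrogate that completes the pair
        ++count;
    }
    return count;
}
```

For a string of Arabic letters the count equals size(), which is why treating the data as fixed-width two-byte text happens to work for such input.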

Answer 6

Answered by dlchambers

This is a bit raw, but how about reading the file as plain old bytes and then casting the byte buffer to wchar_t*?

Something like:

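#include <cstddef>
#include <fstream>
#include <string>

// Note: this reinterprets the raw bytes as wchar_t without decoding, so the
// result is only meaningful if the file already holds wchar_t-encoded text
// (UTF-16LE on Windows), not UTF-8.
std::wstring ReadFileIntoWstring(const std::wstring& filepath)
{
    std::wstring wstr;
    // The wchar_t* filename overload is an MSVC extension.
    std::ifstream file(filepath.c_str(), std::ios::in | std::ios::binary | std::ios::ate);
    if (!file)
        return wstr;
    const size_t size = (size_t)file.tellg();
    file.seekg(0, std::ios::beg);
    // Read whole wchar_t units straight into the string; the data is not
    // null-terminated, so the length must be given explicitly.
    wstr.resize(size / sizeof(wchar_t));
    if (!wstr.empty())
        file.read(reinterpret_cast<char*>(&wstr[0]), wstr.size() * sizeof(wchar_t));
    return wstr;
}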
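Casting a byte buffer to wchar_t* does not decode UTF-8, so for a UTF-8 file the bytes still have to be converted after reading. The following is a minimal sketch of that extra step, written by hand so that it does not rely on <codecvt>, which is deprecated since C++17. decodeUtf8 and readUtf8FileAsBytes are hypothetical helper names. The sketch replaces malformed sequences with U+FFFD, does not reject overlong encodings, and does not strip a byte order mark.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into a wstring.
std::wstring decodeUtf8(const std::string& bytes)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i++]);
        unsigned long cp;  // code point being assembled
        int extra;         // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else                            { cp = 0xFFFD;      extra = 0; }

        for (; extra > 0; --extra)
        {
            if (i >= bytes.size()
                || (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80)
            {
                cp = 0xFFFD;  // truncated or invalid continuation byte
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i++]) & 0x3F);
        }
        if (cp > 0x10FFFF)
            cp = 0xFFFD;  // outside the Unicode range

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF)
        {
            // 16-bit wchar_t (as on Windows): emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        }
        else
        {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

// Hypothetical helper: reads the whole file as bytes, then decodes it.
std::wstring readUtf8FileAsBytes(const char* filename)
{
    std::ifstream file(filename, std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());
    return decodeUtf8(bytes);
}
```

Like the other whole-file approaches, this keeps the raw bytes and the decoded string in memory at the same time, so it suits small to medium files best.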
