Windows: How can I cin and cout some Unicode text?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/3207704/

Time: 2020-09-15 14:47:45  Source: igfitidea

How can I cin and cout some unicode text?

c++ windows unicode console

Asked by Narek

I am asking for a code snippet that reads a Unicode text with cin, concatenates another Unicode text to it, and writes the result with cout.

P.S. This code will help me solve another, bigger problem with Unicode. But first, the key thing is to accomplish what I ask.

ADDED: BTW, I can't type any Unicode symbol on the command line when I run the executable file. How should I do that?

Accepted answer by Philipp

Here is an example that shows four different methods, of which only the third (C conio) and the fourth (native Windows API) work (and only if stdin/stdout aren't redirected). Note that you still need a font that contains the characters you want to show (Lucida Console supports at least Greek and Cyrillic). Everything here is completely non-portable; there is simply no portable way to input/output Unicode strings on the terminal.

#ifndef UNICODE
#define UNICODE
#endif

#ifndef _UNICODE
#define _UNICODE
#endif

#define STRICT
#define NOMINMAX
#define WIN32_LEAN_AND_MEAN

#include <iostream>
#include <string>
#include <cstdlib>
#include <cstdio>

#include <conio.h>
#include <windows.h>

void testIostream();
void testStdio();
void testConio();
void testWindows();

int wmain() {
    testIostream();
    testStdio();
    testConio();
    testWindows();
    std::system("pause");
}

void testIostream() {
    std::wstring first, second;
    std::getline(std::wcin, first);
    if (!std::wcin.good()) return;
    std::getline(std::wcin, second);
    if (!std::wcin.good()) return;
    std::wcout << first << second << std::endl;
}

void testStdio() {
    wchar_t buffer[0x1000];
    if (!_getws_s(buffer)) return;
    const std::wstring first = buffer;
    if (!_getws_s(buffer)) return;
    const std::wstring second = buffer;
    const std::wstring result = first + second;
    _putws(result.c_str());
}

void testConio() {
    wchar_t buffer[0x1000];
    std::size_t numRead = 0;
    if (_cgetws_s(buffer, &numRead)) return;
    const std::wstring first(buffer, numRead);
    if (_cgetws_s(buffer, &numRead)) return;
    const std::wstring second(buffer, numRead);
    const std::wstring result = first + second + L'\n';
    _cputws(result.c_str());
}

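void testWindows() {
    const HANDLE stdIn = GetStdHandle(STD_INPUT_HANDLE);
    WCHAR buffer[0x1000];
    // ReadConsoleW expects a count of characters, not a size in bytes.
    const DWORD capacity = sizeof buffer / sizeof buffer[0];
    DWORD numRead = 0;
    if (!ReadConsoleW(stdIn, buffer, capacity, &numRead, NULL)) return;
    if (numRead < 2) return;
    // Drop the trailing CR LF of the first line.
    const std::wstring first(buffer, numRead - 2);
    if (!ReadConsoleW(stdIn, buffer, capacity, &numRead, NULL)) return;
    // Keep the CR LF of the second line so the output ends with a newline.
    const std::wstring second(buffer, numRead);
    const std::wstring result = first + second;
    const HANDLE stdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD numWritten = 0;
    WriteConsoleW(stdOut, result.c_str(), static_cast<DWORD>(result.size()), &numWritten, NULL);
}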
  • Edit 1: I've added a method based on conio.
  • Edit 2: I've messed around with _O_U16TEXT a bit as described in Michael Kaplan's blog, but that seemingly only had wgets interpret the (8-bit) data from ReadFile as UTF-16. I'll investigate this a bit further during the weekend.
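
The testWindows function above removes the CR LF that ReadConsoleW leaves at the end of the first line by subtracting a fixed two characters. As a rough sketch of a more defensive variant, the helper below only drops a line ending that is actually present. The name strip_line_ending is hypothetical and not part of the original answer; the code is plain C++ so it can be tried without the Windows headers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original answer): builds a string from the
// first `count` characters of `buffer`, dropping one trailing "\r\n", "\n" or
// "\r" if it is present.
std::wstring strip_line_ending(const wchar_t* buffer, std::size_t count) {
    if (count > 0 && buffer[count - 1] == L'\n') --count;
    if (count > 0 && buffer[count - 1] == L'\r') --count;
    return std::wstring(buffer, count);
}
```

With such a helper, the construction of first could become strip_line_ending(buffer, numRead), which also behaves sensibly when fewer than two characters were read.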

Answer by Bolo

I had a similar problem in the past; in my case imbue and sync_with_stdio did the trick. Try this:

#include <iostream>
#include <locale>
#include <string>

using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    wcin.imbue(locale("en_US.UTF-8"));
    wcout.imbue(locale("en_US.UTF-8"));

    wstring s;
    wstring t(L" la Polynésie française");

    wcin >> s;
    wcout << s << t << endl;
    return 0;
}

Answer by Brian R. Bondy

It depends on what type of Unicode you mean. I assume you mean you are just working with std::wstring, though. In that case use std::wcin and std::wcout.

For conversion between encodings you can use your OS functions, such as WideCharToMultiByte and MultiByteToWideChar on Win32, or you can use a library like libiconv.
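
To make the conversion step concrete, here is a hand-rolled UTF-16 to UTF-8 encoder. It is a portable stand-in for illustration only, not the Win32 or libiconv API mentioned above, and the name utf16_to_utf8 is hypothetical. It uses std::u16string rather than std::wstring because wchar_t is 16 bits wide only on Windows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original answer): converts UTF-16 code
// units to UTF-8 bytes. Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

In production code the OS functions or a library are usually the better choice, since they also cover legacy code pages that this sketch ignores.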

Answer by John

If you have actual text (i.e., a string of logical characters), then insert it into the wide streams instead. The wide streams will automatically encode your characters to match the bits expected by the locale encoding. (And if you have encoded bits instead, the streams will decode the bits, then re-encode them to match the locale.)

There is a lesser solution: if you KNOW you have UTF-encoded bits (i.e., an array of bits intended to be decoded into a string of logical characters) AND you KNOW the target of the output stream expects that very same bit format, then you can skip the decoding and re-encoding steps and write() the bits as-is. This only works when you know both sides use the same encoding format, which may be the case for small utilities not intended to communicate with processes in other locales.

Answer by Edward Strange

It depends on the OS. If your OS understands UTF-8, you can simply send it UTF-8 sequences.
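
On Windows, whether the console interprets output bytes as UTF-8 depends on its active output code page. The sketch below assumes the Win32 function SetConsoleOutputCP with the CP_UTF8 constant; the wrapper name enable_utf8_console_output is hypothetical, and on other systems the code simply assumes the terminal already expects UTF-8.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not from the original answer): asks the Windows console
// to treat output bytes as UTF-8. Returns true on success, or when no switch
// is attempted because the platform is not Windows.
bool enable_utf8_console_output() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;  // assumption: the terminal already expects UTF-8
#endif
}
```

After a successful call, narrow output such as std::cout << "\xC3\xA9" is sent as UTF-8. As the accepted answer notes, the console font still has to contain the characters being shown.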