Windows: displaying extended ASCII characters

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow CC BY-SA: cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/4882031/

Date: 2020-09-15 16:07:03. Source: igfitidea

Displaying extended ASCII characters

c++ windows visual-studio-2005 x86

Question by user3234

In Visual Studio 2005 on 32-bit Windows, why doesn't my console display characters from 128 to 255?

For example:

cout << "¿" << endl;  // inverted question mark

Output:

┐
Press any key to continue . . .

Answer by Cheers and hth. - Alf

A Windows console window is pure Unicode. Its buffer stores text as UCS-2 (16 bits per character, essentially the original Unicode, which is restricted to the Basic Multilingual Plane of modern 21-bit Unicode). So a console window can present almost all kinds of text.

However, for single-byte-per-character I/O (and possibly also for some variable-length encodings), Windows automatically translates to and from the console window's active code page. If the console window is a [cmd.exe] instance, you can inspect the active code page with the command chcp, short for "change code page", like this:

C:\test> chcp
Active code page: 850

C:\test> _

Code page 850 is an encoding based on the original IBM PC English code page 437. 850 is the default for console windows on at least Norwegian PCs (although savvy Norwegians may change that to 865). None of those are code pages that you should use, however.

The original IBM PC code page (character encoding) is known as OEM, a rather meaningless acronym for Original Equipment Manufacturer. It had nice line-drawing characters suited to the original PC's text-mode screen. More generally, OEM means the default code page for console windows, where code page 437 is just the original one: it can be configured, e.g. per window via chcp.

When Microsoft created 16-bit Windows, they chose another encoding, known in Windows as ANSI. The original one was an extension of ISO Latin-1, which for a long while was the default on the Internet (however, it's unclear which came first: Microsoft participated in the standardization). This original ANSI is now known as Windows ANSI Western.

ANSI is the code page used for non-Unicode text by almost all the rest of Windows. Console windows use OEM; Notepad, other editors, and so on use ANSI.

Then, when Microsoft made Windows 32-bit, they adopted a 16-bit extension of Latin-1 known as Unicode. Microsoft was one of the founding members of the Unicode Consortium. The basic API, including console windows, the file system, etc., was rewritten to use Unicode. For backward compatibility there is a translation layer that translates between OEM and Unicode for console windows, and between ANSI and Unicode for other functionality. For example, MessageBoxA is an ANSI wrapper for the Unicode-based MessageBoxW.

The practical upshot is that in Windows your C++ source code is typically encoded with ANSI, while console windows assume OEM. This, for example, makes

cout << "I like Norwegian blåbærsyltetøy!" << endl;

produce pure gobbledegook… You can use the Unicode-based console window APIs to output Unicode directly to a console window, avoiding the translation, but that's awkward.

Note that using wcout instead of cout doesn't help: by design, wcout just translates down from wide-character strings to the program's narrow character set, discarding information on the way. It can be hard to believe that the C++ standard library offers a rather big chunk of very, very complex functionality that is meaningless (since those conversions could instead have been supported by cout). But so it is, just meaningless. Possibly it was some political compromise, but anyway, wcout does not help, even though, if it were meaningful in some way, it "should" logically help with this.

So how does a Norwegian novice programmer get e.g. "blåbærsyltetøy" presented?

Well, simply by changing the active code page to ANSI. Since ANSI is code page 1252 on most Western-country PCs, you can do that for a given command interpreter instance by

C:\test> chcp 1252
Active code page: 1252

C:\test> _

Now old DOS programs like [edit.com] (still present in Windows XP!) will produce some gobbledegook, because the original PC character set's line-drawing characters are not in ANSI, and because national characters have different codes in ANSI. But hey, who uses old DOS programs? Not me!

If you want this as a more permanent code page, you'll have to change the configuration of console windows via an undocumented registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage

In this key, change the value of OEMCP to 1252, and reboot.

As with chcp, changing the OEM code page to 1252 makes old DOS programs present gobbledegook, but makes C++ programs and other modern console programs work OK, since you then have the same character encoding in console windows as in the rest of Windows.

Answer by John

When you print a narrow ("ASCII") string, Windows internally converts it to Unicode based on the current code page. There is also a translation from Unicode to "ASCII" done by the CRT. The following would work:

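#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <iostream>

int main()
{
    // Put stdout into UTF-16 mode so the CRT hands wide characters to the
    // console unchanged instead of narrowing them to the code page first.
    // After this call, use only wide output (wcout, wprintf) on stdout.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"\u00BF" << std::endl;  // inverted question mark
}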

Answer by Adam Rosenfield

Because the Win32 console uses code page 437 (aka the OEM font) to render characters, whereas most of the rest of Windows uses Windows-1252 for single-byte character codes.

The character "¿" is the Unicode character INVERTED QUESTION MARK, which has code point 0xBF (191 decimal) in Unicode, ISO 8859-1, and Windows-1252. The byte 0xBF in CP437 corresponds to the character "┐", which is BOX DRAWINGS LIGHT DOWN AND LEFT (code point U+2510).

As long as you're using the Windows console, you can display only the characters in CP437 and no others. If you want to display other Unicode characters, you'll need to use a different environment.

Answer by JK.

It is probably implemented using a basic ASCII character set. Microsoft programmers didn't add UTF-8 capability when creating the console. Just a guess, since I wasn't a Microsoft programmer involved in creating the console.