Original question: http://stackoverflow.com/questions/1259084/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
What encoding/code page is cmd.exe using?
Asked by danglund
When I open cmd.exe in Windows, what encoding is it using?
How can I check which encoding it is currently using? Does it depend on my regional setting or are there any environment variables to check?
What happens when you type a file with a certain encoding? Sometimes I get garbled characters (incorrect encoding used) and sometimes it kind of works. However I don't trust anything as long as I don't know what's going on. Can anyone explain?
Answered by andrewdotn
Yes, it's frustrating: sometimes type and other programs print gibberish, and sometimes they do not.
First of all, Unicode characters will only display if the current console font contains the characters. So use a TrueType font like Lucida Console instead of the default Raster Font.
But if the console font doesn't contain the character you're trying to display, you'll see question marks instead of gibberish. When you get gibberish, there's more going on than just font settings.
When programs use standard C-library I/O functions like printf, the program's output encoding must match the console's output encoding, or you will get gibberish. The chcp command shows and sets the current codepage. All output using standard C-library I/O functions is treated as if it is in the codepage displayed by chcp.
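To make the mismatch concrete, here is a minimal Java sketch (not part of the original answer) that encodes a character as UTF-8 and then decodes the bytes as if they were code page 850, which is roughly what the console does with printf-style output. The class and method names are made up for illustration, and it assumes the JDK ships the IBM850 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8, then decode those bytes as if they were in another code page.
    static String misdecode(String text, String consoleCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        // "\u00fc" is u-umlaut; its UTF-8 bytes are 0xC3 0xBC.
        String shown = misdecode("\u00fc", "IBM850");
        // Print code points rather than glyphs so the result does not depend on this console.
        for (char c : shown.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The two code points printed are box-drawing characters, which is the kind of gibberish a code page 850 console shows for UTF-8 text.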
Matching the program's output encoding with the console's output encoding can be accomplished in two different ways:
- A program can get the console's current codepage using chcp or GetConsoleOutputCP, and configure itself to output in that encoding, or
- You or a program can set the console's current codepage using chcp or SetConsoleOutputCP to match the default output encoding of the program.
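The first option might look like this in Java. This is only a sketch: charsetForCodePage and printerFor are hypothetical helpers, the code page number is assumed to come from the user (for example, copied from the output of chcp), and the "Cp" + number naming is an assumption that holds for charsets such as Cp850 and Cp1252 but not for every code page.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodePageOutput {
    // Hypothetical helper: map a number reported by chcp to a Java charset.
    static Charset charsetForCodePage(int codePage) {
        if (codePage == 65001) {
            return StandardCharsets.UTF_8; // 65001 is the UTF-8 code page
        }
        // Assumption: Java knows this code page under the name "Cp<number>";
        // Charset.forName throws if it does not.
        return Charset.forName("Cp" + codePage);
    }

    // Wrap a stream so that text written to it is encoded for the given code page.
    static PrintStream printerFor(OutputStream out, int codePage) {
        return new PrintStream(out, true, charsetForCodePage(codePage));
    }

    public static void main(String[] args) {
        int codePage = args.length > 0 ? Integer.parseInt(args[0]) : 850;
        PrintStream out = printerFor(System.out, codePage);
        out.println("German \u00e4\u00f6\u00fc");
    }
}
```

Run as java CodePageOutput 850 on a console whose chcp reports 850. The PrintStream constructor that takes a Charset requires Java 10 or later.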
However, programs that use Win32 APIs can write UTF-16LE strings directly to the console with WriteConsoleW. This is the only way to get correct output without setting codepages. And even when using that function, if a string is not in the UTF-16LE encoding to begin with, a Win32 program must pass the correct codepage to MultiByteToWideChar. Also, WriteConsoleW will not work if the program's output is redirected; more fiddling is needed in that case.
type works some of the time because it checks the start of each file for a UTF-16LE Byte Order Mark (BOM), i.e. the bytes 0xFF 0xFE. If it finds such a mark, it displays the Unicode characters in the file using WriteConsoleW regardless of the current codepage. But when using type on any file without a UTF-16LE BOM, or when using non-ASCII characters with any command that doesn't call WriteConsoleW, you will need to set the console codepage and program output encoding to match each other.
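The BOM check described above is easy to reproduce. The following Java sketch is an illustration only, not the code cmd.exe uses; the class and method names are invented here. It reports whether a file starts with the bytes 0xFF 0xFE.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BomCheck {
    // True if the data begins with the UTF-16LE byte order mark, bytes 0xFF 0xFE.
    static boolean startsWithUtf16LeBom(byte[] data) {
        return data.length >= 2
            && (data[0] & 0xFF) == 0xFF
            && (data[1] & 0xFF) == 0xFE;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BomCheck <file>");
            return;
        }
        // Reads the whole file for simplicity; only the first two bytes matter.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(startsWithUtf16LeBom(data) ? "UTF-16LE BOM found" : "no UTF-16LE BOM");
    }
}
```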
How can we find this out?
Here's a test file containing Unicode characters:
ASCII abcde xyz
German äöü ÄÖÜ ß
Polish ąęźżńł
Russian абвгдеж эюя
CJK 你好
Here's a Java program to print out the test file in a bunch of different Unicode encodings. It could be in any programming language; it only prints ASCII characters or encoded bytes to stdout.
import java.io.*;

public class Foo {

    private static final String BOM = "\ufeff";
    private static final String TEST_STRING
        = "ASCII abcde xyz\n"
        + "German äöü ÄÖÜ ß\n"
        + "Polish ąęźżńł\n"
        + "Russian абвгдеж эюя\n"
        + "CJK 你好\n";

    public static void main(String[] args)
        throws Exception
    {
        String[] encodings = new String[] {
            "UTF-8", "UTF-16LE", "UTF-16BE", "UTF-32LE", "UTF-32BE" };

        for (String encoding: encodings) {
            System.out.println("== " + encoding);

            for (boolean writeBom: new boolean[] {false, true}) {
                System.out.println(writeBom ? "= bom" : "= no bom");

                String output = (writeBom ? BOM : "") + TEST_STRING;
                byte[] bytes = output.getBytes(encoding);
                // The same raw bytes go to stdout and, unchanged, to a file.
                System.out.write(bytes);
                try (FileOutputStream out = new FileOutputStream("uc-test-"
                        + encoding + (writeBom ? "-bom.txt" : "-nobom.txt"))) {
                    out.write(bytes);
                }
            }
        }
    }
}
The output in the default codepage? Total garbage!
Z:\andrew\projects\sx59084>chcp
Active code page: 850
Z:\andrew\projects\sx59084>java Foo
== UTF-8
= no bom
ASCII abcde xyz
German ├?├?├╝ ├?├?├£ ├?
Polish ─à─?┼║┼╝┼?┼é
Russian e?e?e▓e│e┤eáe? DìD?D?
CJK ?¢á??¢
= bom
′╗┐ASCII abcde xyz
German ├?├?├╝ ├?├?├£ ├?
Polish ─à─?┼║┼╝┼?┼é
Russian e?e?e▓e│e┤eáe? DìD?D?
CJK ?¢á??¢
== UTF-16LE
= no bom
A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ??↓?z?|?D?B?
R u s s i a n 0?1?2?3?4?5?6? M?N?O?
C J K `O}Y
= bom
?■A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ??↓?z?|?D?B?
R u s s i a n 0?1?2?3?4?5?6? M?N?O?
C J K `O}Y
== UTF-16BE
= no bom
A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ???↓?z?|?D?B
R u s s i a n ?0?1?2?3?4?5?6 ?M?N?O
C J K O`Y}
= bom
■? A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ???↓?z?|?D?B
R u s s i a n ?0?1?2?3?4?5?6 ?M?N?O
C J K O`Y}
== UTF-32LE
= no bom
A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ?? ↓? z? |? D? B?
R u s s i a n 0? 1? 2? 3? 4? 5? 6? M? N
? O?
C J K `O }Y
= bom
?■ A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ?? ↓? z? |? D? B?
R u s s i a n 0? 1? 2? 3? 4? 5? 6? M? N
? O?
C J K `O }Y
== UTF-32BE
= no bom
A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ?? ?↓ ?z ?| ?D ?B
R u s s i a n ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?M ?N
?O
C J K O` Y}
= bom
■? A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ?? ?↓ ?z ?| ?D ?B
R u s s i a n ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?M ?N
?O
C J K O` Y}
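The garbage above is not random. As a rough illustration (a sketch with invented class and method names, assuming the JDK provides the IBM850 charset), the following Java snippet dumps the UTF-16 bytes of the two CJK characters and shows how code page 850 renders them, which reproduces the CJK lines in the UTF-16LE and UTF-16BE sections of the listing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16OnCp850 {
    // Space-separated hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // What a code page 850 console would show for these raw bytes.
    static String asCp850(byte[] bytes) {
        return new String(bytes, Charset.forName("IBM850"));
    }

    public static void main(String[] args) {
        String cjk = "\u4f60\u597d"; // the two CJK characters from the test file
        byte[] le = cjk.getBytes(StandardCharsets.UTF_16LE);
        byte[] be = cjk.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(hex(le) + " -> " + asCp850(le)); // 60 4F 7D 59 -> `O}Y
        System.out.println(hex(be) + " -> " + asCp850(be)); // 4F 60 59 7D -> O`Y}
    }
}
```

Each 16-bit code unit is simply read as two separate single-byte characters, in whichever byte order the file used.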
However, what if we type the files that got saved? They contain the exact same bytes that were printed to the console.
Z:\andrew\projects\sx59084>type *.txt
uc-test-UTF-16BE-bom.txt
■? A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ???↓?z?|?D?B
R u s s i a n ?0?1?2?3?4?5?6 ?M?N?O
C J K O`Y}
uc-test-UTF-16BE-nobom.txt
A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ???↓?z?|?D?B
R u s s i a n ?0?1?2?3?4?5?6 ?M?N?O
C J K O`Y}
uc-test-UTF-16LE-bom.txt
ASCII abcde xyz
German äöü ÄÖÜ ß
Polish ąęźżńł
Russian абвгдеж эюя
CJK 你好
uc-test-UTF-16LE-nobom.txt
A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ??↓?z?|?D?B?
R u s s i a n 0?1?2?3?4?5?6? M?N?O?
C J K `O}Y
uc-test-UTF-32BE-bom.txt
■? A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ?? ?↓ ?z ?| ?D ?B
R u s s i a n ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?M ?N
?O
C J K O` Y}
uc-test-UTF-32BE-nobom.txt
A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ?? ?↓ ?z ?| ?D ?B
R u s s i a n ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?M ?N
?O
C J K O` Y}
uc-test-UTF-32LE-bom.txt
A S C I I a b c d e x y z
G e r m a n ä ö ü Ä Ö Ü ß
P o l i s h ą ę ź ż ń ł
R u s s i a n а б в г д е ж э ю я
C J K 你 好
uc-test-UTF-32LE-nobom.txt
A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ?? ↓? z? |? D? B?
R u s s i a n 0? 1? 2? 3? 4? 5? 6? M? N
? O?
C J K `O }Y
uc-test-UTF-8-bom.txt
′╗┐ASCII abcde xyz
German ├?├?├╝ ├?├?├£ ├?
Polish ─à─?┼║┼╝┼?┼é
Russian e?e?e▓e│e┤eáe? DìD?D?
CJK ?¢á??¢
uc-test-UTF-8-nobom.txt
ASCII abcde xyz
German ├?├?├╝ ├?├?├£ ├?
Polish ─à─?┼║┼╝┼?┼é
Russian e?e?e▓e│e┤eáe? DìD?D?
CJK ?¢á??¢
The only thing that works is the UTF-16LE file with a BOM, printed to the console via type.
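For reference, this Java sketch (names invented for illustration) prints the bytes that U+FEFF becomes in each of the five encodings used by the test program. Note that the UTF-32LE BOM also begins with 0xFF 0xFE, which may be why the UTF-32LE file with a BOM comes out readable but with a gap after every character in the listing above; that explanation is an inference from the bytes, not something the debugger session confirms.

```java
import java.nio.charset.Charset;

public class BomBytes {
    // Hex dump of U+FEFF encoded in the named charset.
    static String bomHex(String encoding) {
        byte[] bytes = "\ufeff".getBytes(Charset.forName(encoding));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] encodings = { "UTF-8", "UTF-16LE", "UTF-16BE", "UTF-32LE", "UTF-32BE" };
        for (String encoding : encodings) {
            System.out.println(encoding + ": " + bomHex(encoding));
        }
    }
}
```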
If we use anything other than type to print the file, we get garbage:
Z:\andrew\projects\sx59084>copy uc-test-UTF-16LE-bom.txt CON
?■A S C I I a b c d e x y z
G e r m a n ? ÷ 3 ─ í ▄ ?
P o l i s h ??↓?z?|?D?B?
R u s s i a n 0?1?2?3?4?5?6? M?N?O?
C J K `O}Y
1 file(s) copied.
From the fact that copy CON does not display Unicode correctly, we can conclude that the type command has logic to detect a UTF-16LE BOM at the start of the file, and use special Windows APIs to print it.
We can see this by opening cmd.exe in a debugger when it goes to type out a file.
After type opens a file, it checks for a BOM of 0xFEFF (i.e., the bytes 0xFF 0xFE in little-endian), and if there is such a BOM, type sets an internal fOutputUnicode flag. This flag is checked later to decide whether to call WriteConsoleW.
But that's the only way to get type to output Unicode, and only for files that have BOMs and are in UTF-16LE. For all other files, and for programs that don't have special code to handle console output, your files will be interpreted according to the current codepage, and will likely show up as gibberish.
You can emulate how type outputs Unicode to the console in your own programs like so:
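#include <stdio.h>
#include <string.h>
#define UNICODE
#include <windows.h>

/* Save this source file as UTF-8 so the literal below holds UTF-8 bytes. */
static LPCSTR lpcsTest =
    "ASCII abcde xyz\n"
    "German äöü ÄÖÜ ß\n"
    "Polish ąęźżńł\n"
    "Russian абвгдеж эюя\n"
    "CJK 你好\n";

int main() {
    int n;
    DWORD written;
    wchar_t buf[1024];

    HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);

    /* Convert UTF-8 to UTF-16LE; the last argument counts wide characters, not bytes. */
    n = MultiByteToWideChar(CP_UTF8, 0,
            lpcsTest, (int)strlen(lpcsTest),
            buf, sizeof(buf) / sizeof(buf[0]));

    /* Write the wide string straight to the console, bypassing the codepage. */
    WriteConsoleW(hConsole, buf, (DWORD)n, &written, NULL);

    return 0;
}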
This program works for printing Unicode on the Windows console using the default codepage.
For the sample Java program, we can get a little bit of correct output by setting the codepage manually, though the output gets messed up in weird ways:
Z:\andrew\projects\sx59084>chcp 65001
Active code page: 65001
Z:\andrew\projects\sx59084>java Foo
== UTF-8
= no bom
ASCII abcde xyz
German äöü ÄÖÜ ß
Polish ąęźżńł
Russian абвгдеж эюя
CJK 你好
ж эюя
CJK 你好
你好
好
?
= bom
ASCII abcde xyz
German äöü ÄÖÜ ß
Polish ąęźżńł
Russian абвгдеж эюя
CJK 你好
еж эюя
CJK 你好
你好
好
?
== UTF-16LE
= no bom
A S C I I a b c d e x y z
…
However, a C program that sets a Unicode UTF-8 codepage:
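#include <stdio.h>
#include <windows.h>

int main() {
    size_t n;
    UINT oldCodePage;
    char buf[1024];

    /* Remember the console's codepage so it can be restored on exit. */
    oldCodePage = GetConsoleOutputCP();
    if (!SetConsoleOutputCP(65001)) {
        printf("error\n");
    }

    /* Read the UTF-8 file in binary mode and copy its bytes to stdout unchanged. */
    if (freopen("uc-test-UTF-8-nobom.txt", "rb", stdin) == NULL) {
        perror("freopen");
        SetConsoleOutputCP(oldCodePage);
        return 1;
    }
    n = fread(buf, sizeof(buf[0]), sizeof(buf), stdin);
    fwrite(buf, sizeof(buf[0]), n, stdout);
    fflush(stdout); /* make sure the bytes are written before the codepage changes back */

    SetConsoleOutputCP(oldCodePage);

    return 0;
}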
does have correct output:
Z:\andrew\projects\sx59084>.\test
ASCII abcde xyz
German äöü ÄÖÜ ß
Polish ąęźżńł
Russian абвгдеж эюя
CJK 你好
The moral of the story?
- type can print UTF-16LE files with a BOM regardless of your current codepage
- Win32 programs can be programmed to output Unicode to the console, using WriteConsoleW.
- Other programs which set the codepage and adjust their output encoding accordingly can print Unicode on the console regardless of what the codepage was when the program started
- For everything else you will have to mess around with chcp, and will probably still get weird output.
Answered by Cagdas Altinkaya
Type chcp to see your current code page (as Dewfy already said).
Use nlsinfo to see all installed code pages and find out what your code page number means.
You need to have the Windows Server 2003 Resource Kit installed (works on Windows XP) to use nlsinfo.
Answered by Brian Agnew
To answer your second question, about how encoding works: Joel Spolsky wrote a great introductory article on this. Strongly recommended.
Answered by Dewfy
The CHCP command shows the current codepage. It is typically a three-digit number of the form 8xx, which differs from the Windows 12xx codepages. English-only text will look the same either way, but text that relies on an extended codepage (like Cyrillic) will be printed incorrectly.
Answered by Jean-François Larvoire
I've long been frustrated by Windows code page issues, and by the C program portability and localisation issues they cause. The previous posts have detailed the issues at length, so I'm not going to add anything in this respect.
To make a long story short, eventually I ended up writing my own UTF-8 compatibility library layer over the Visual C++ standard C library. Basically this library ensures that a standard C program works right, in any code page, using UTF-8 internally.
This library, called MsvcLibX, is available as open source at https://github.com/JFLarvoire/SysToolsLib. Main features:
- C sources encoded in UTF-8, using normal char[] C strings, and standard C library APIs.
- In any code page, everything is processed internally as UTF-8 in your code, including the main() routine argv[], with standard input and output automatically converted to the right code page.
- All stdio.h file functions support UTF-8 pathnames > 260 characters, up to 64 KBytes actually.
- The same sources can compile and link successfully in Windows using Visual C++ and MsvcLibX and Visual C++ C library, and in Linux using gcc and Linux standard C library, with no need for #ifdef ... #endif blocks.
- Adds include files common in Linux, but missing in Visual C++. Ex: unistd.h
- Adds missing functions, like those for directory I/O, symbolic link management, etc, all with UTF-8 support of course :-).
More details in the MsvcLibX README on GitHub, including how to build the library and use it in your own programs.
The release section in the above GitHub repository provides several programs using this MsvcLibX library, that will show its capabilities. Ex: Try my which.exe tool with directories with non-ASCII names in the PATH, searching for programs with non-ASCII names, and changing code pages.
Another useful tool there is the conv.exe program. This program can easily convert a data stream from any code page to any other. By default, it reads input in the Windows code page and writes output in the current console code page. This makes it possible to correctly view data generated by Windows GUI apps (ex: Notepad) in a command console, with a simple command like: type WINFILE.txt | conv
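The general idea of converting between code pages can be sketched in a few lines of Java. This is not how conv.exe is implemented; the class and method names are hypothetical, and the choice of windows-1252 and IBM850 is just an example pairing of a Windows code page with a console code page.

```java
import java.nio.charset.Charset;

public class Transcode {
    // Decode bytes from one code page and re-encode them in another.
    // Characters the target code page cannot represent are replaced with '?'.
    static byte[] transcode(byte[] input, String from, String to) {
        String text = new String(input, Charset.forName(from));
        return text.getBytes(Charset.forName(to));
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute, as a Windows GUI app would save it in windows-1252.
        byte[] windowsBytes = { 'c', 'a', 'f', (byte) 0xE9 };
        byte[] consoleBytes = transcode(windowsBytes, "windows-1252", "IBM850");
        for (byte b : consoleBytes) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

The ASCII bytes pass through unchanged, while the accented letter moves from 0xE9 to 0x82, its position in code page 850.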
This MsvcLibX library is by no means complete, and contributions for improving it are welcome!
Answered by Neumi
In Java I used encoding "IBM850" to write the file. That solved the problem.