Windows batch file encoding

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1427796/

Date: 2020-09-09 06:43:36  Source: igfitidea

Batch file encoding

Tags: windows, encoding, batch-file, cmd

Asked by shodanex

I would like to deal with filenames containing unusual characters, like the French é.

Everything is working fine in the shell:

C:\somedir\>ren -hélice hélice

Now, if I put this line in a .bat file, I get the following result:

C:\somedir\>ren -húlice húlice

See? The é has been replaced by ú.

The same is true for command output. If I run dir on some directory in the shell, the output is fine. If I redirect this output to a file, some characters are transformed.

So how can I tell cmd.exe that what appears as an é in my batch file really is an é, and not a ú or a comma?

So is there no way, when executing a .bat file, to give a hint about the code page in which it was written?

Answered by Joey

You have to save the batch file with OEM encoding. How to do this varies depending on your text editor. The encoding used in that case varies as well. For Western cultures it's usually CP850.
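
To see why the encoding used when saving matters, here is a minimal Python sketch (Python is not part of the original answer; it is used here only to inspect bytes). It assumes the editor saved the file as Windows-1252 while the console reads it as CP850, and also shows the reverse case. The helper names reinterpret and describe are made up for illustration.

```python
E_ACUTE = "\u00e9"  # the letter e with acute accent


def reinterpret(text, written_as, read_as):
    """Encode text in one code page, then decode the same bytes in another."""
    return text.encode(written_as).decode(read_as)


def describe(ch):
    """Return an ASCII-only description such as 'U+00E9' for one character."""
    return "U+%04X" % ord(ch)


if __name__ == "__main__":
    # A Windows-1252 editor stores the letter as the single byte 0xE9.
    print(E_ACUTE.encode("cp1252"))
    # Read back under CP850, that byte is a different accented letter.
    print(describe(reinterpret(E_ACUTE, "cp1252", "cp850")))
    # Saved as CP850 (byte 0x82) but read as Windows-1252, the letter becomes
    # a low quotation mark that is easy to mistake for a comma.
    print(describe(reinterpret(E_ACUTE, "cp850", "cp1252")))
```

When the file is saved in the same code page the console uses, both sides agree on the bytes and the round trip returns the original letter.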

Batch files and encoding are really two things that don't particularly like each other. You'll notice that Unicode is also impossible to use there, unfortunately (even though environment variables handle it fine).

Alternatively, you can set the console to use another codepage:

chcp 1252

should do the trick. At least it worked for me here.

When you do output redirection, such as with dir, the same rules apply: the console window's code page is used. You can use the /u switch to cmd.exe to force Unicode output redirection, which causes the resulting files to be in UTF-16.
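
If such a redirected file later has to be read by another program, a sketch like the following may help. It is written in Python purely for illustration and assumes the file is UTF-16 little-endian, with or without a byte order mark; that is an assumption about what cmd /u writes, not something the answer states. The function name decode_cmd_u_output is hypothetical.

```python
def decode_cmd_u_output(raw):
    """Decode bytes assumed to be UTF-16 little-endian, dropping a BOM if present."""
    if raw.startswith(b"\xff\xfe"):
        raw = raw[2:]
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    # Stand-in for the bytes of a redirected listing (made-up sample data).
    sample = "h\u00e9lice.txt\r\n".encode("utf-16-le")
    # Every character in this sample takes two bytes.
    print(len(sample))
    print(len(decode_cmd_u_output(sample)))
```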

As for encodings and code pages in cmd.exe in general, also see this question:

EDIT: As for your edit: No, cmd always assumes the batch file to be written in the console's default code page. However, you can easily include a chcp at the start of the batch file:

chcp 1252>NUL
ren -hélice hélice

To make this more robust when used directly from the command line, you may want to remember the old code page and restore it afterwards:

@echo off
for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
chcp 1252>nul
ren -hélice hélice
chcp %cp%>nul
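
The for /f line above extracts the code page number from the text that chcp prints. As a rough illustration of that tokenizing step only, here is a Python sketch; the function name second_token is made up, the sample strings are assumed to be typical chcp output, and the splitting merely approximates what "tokens=2 delims=:." does.

```python
import re


def second_token(line, delims=":."):
    """Split on any run of delimiter characters and return the second token."""
    pattern = "[" + re.escape(delims) + "]+"
    tokens = [t for t in re.split(pattern, line) if t]
    return tokens[1]


if __name__ == "__main__":
    # Assumed English output of chcp.
    print(repr(second_token("Active code page: 850")))
    # Assumed localized output ending in a period, which is why "." is a delimiter.
    print(repr(second_token("Aktive Codepage: 850.")))
```

The token keeps its leading space, as it does in the batch version; chcp accepts the number either way.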

Answered by David Pontbriand

I created the following block, which I put at the beginning of my batch files:

set Filename=%0
IF "%Filename:~-8%" == "-850.bat" GOTO CONVERT_CODEPAGE_END
    rem Converting code page from 1252 to 850.
    rem My editors use 1252, my batch uses 850.
    rem We create a converted -850.bat file, and then launch it.
    set File850=%~n0-850.bat
    PowerShell.exe -Command "get-content %0 | out-file -encoding oem -filepath %File850%"
    call %File850%
    del %File850%
    EXIT /b 0
:CONVERT_CODEPAGE_END
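
The conversion step in the block above can also be done outside the batch file. The following Python sketch is an alternative illustration, not the author's method: it assumes the source file really is Windows-1252 and the target console uses CP850, and the function name transcode_file is hypothetical.

```python
from pathlib import Path


def transcode_file(src, dst, src_encoding="cp1252", dst_encoding="cp850"):
    """Rewrite src into dst, changing only the character encoding.

    Working on bytes keeps the CRLF line endings untouched. A character
    that does not exist in the target code page raises UnicodeEncodeError
    instead of being silently replaced.
    """
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(text.encode(dst_encoding))
```

A call such as transcode_file("rename.bat", "rename-850.bat") would produce the converted copy; both file names here are placeholders.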

Answered by dconman

I was having trouble with this, and here is the solution I found. Find the decimal number for the character you are looking for in your current code page.

For example, I'm in code page 437 (chcp tells you), and I want a degree sign, °. http://en.wikipedia.org/wiki/Code_page_437 tells me that the degree sign is number 248.

Then you find the Unicode character with the same number.

The Unicode character at 248 (U+00F8) is ø.

If you insert the Unicode character in your batch script, it will display to the console as the character you desire.

So my batch file

echo ø

prints

°
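
The lookup described above can be checked mechanically. This Python sketch assumes the editor saves the batch file as Windows-1252 (the answer does not say which encoding the editor uses) and that the console is in code page 437; char_for_console is a made-up helper name.

```python
def char_for_console(wanted, console_cp="cp437", editor_cp="cp1252"):
    """Return the character to type in the editor so the console shows `wanted`.

    The wanted character is turned into its byte in the console code page,
    and that byte is then read back the way the editor would display it.
    """
    return wanted.encode(console_cp).decode(editor_cp)


if __name__ == "__main__":
    degree = "\u00b0"  # the degree sign
    typed = char_for_console(degree)
    # Byte value of the degree sign in code page 437.
    print(degree.encode("cp437")[0])
    # Code point of the character to type in the editor.
    print("U+%04X" % ord(typed))
```

The "same number" rule from the answer only holds where the editor's encoding assigns the same value to the character as Unicode does, which is the case for this byte in Windows-1252.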

Answered by g.cze

I care about three concepts:

  1. Output Console Encoding

  2. Command line internal encoding (the one changed with chcp)

  3. .bat Text Encoding

The easiest scenario for me: I keep the first two in the same encoding, say CP850, and I store my .bat in that same encoding (in Notepad++, menu Encoding → Character sets → Western European → OEM 850).

But suppose someone hands me a .bat in another encoding, say CP1252 (in Notepad++, menu Encoding → Character sets → Western European → Windows-1252).

Then I would change the command line internal encoding, with chcp 1252.

This changes the encoding it uses to talk with other processes; it changes neither the input device nor the output console.

So my command line instance will effectively send characters in 1252 through its STDOUT file descriptor, but garbled text appears when the console decodes them as 850 (é shows up as ú).

Then I modify the file as follows:

@echo off

perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
ren -hélice hélice

First I turn echo off, so the commands produce no output unless I explicitly use either echo... or perl -e "print..."

Then I add this boilerplate each time I need to output something:

perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"

In place of ren -hélice hélice, I substitute the actual text I want to show.

I might also need to substitute my own console encoding for cp850 and the other side's encoding for cp1252.

And just below I put the desired command.

I broke the problematic line into an output half and a real-command half.

  • The first half makes sure that the "é" is interpreted as an "é" by means of transcoding. This is necessary for every output statement, since the console and the file use different encodings.

  • The second half is the real command (muted by @echo off). Knowing that chcp and the .bat text use the same encoding is enough to ensure the characters are interpreted properly.

Answered by michal

I had Polish characters in my R code (e.g. ?, ?, ?, ? etc.) and ran into a problem when running this R script from a .bat file (in the .Rout output file, those characters were replaced by symbols like %, &, # etc., and the code didn't run to the end).

My solution:

  1. Save R script with encoding: File > Save with encoding > CP1250
  2. Run .bat file

It worked for me, but if the problem persists, try other encodings.