Windows batch file encoding

Question

Asked by shodanex

I would like to deal with filenames containing strange characters, like the French é.

Everything is working fine in the shell:

C:\somedir\>ren -hélice hélice

Now, if I put this line in a .bat file, I get the following result:

C:\somedir\>ren -húlice húlice

See? The é has been replaced by ú.

The same is true for command output. If I dir some directory in the shell, the output is fine. If I redirect this output to a file, some characters are transformed.

So how can I tell cmd.exe that what appears as an é in my batch file really is an é, and not a ú or a comma?

So is there no way, when executing a .bat file, to give a hint about the code page in which it was written?

Answer 1

Answered by Joey

You have to save the batch file with OEM encoding. How to do this varies depending on your text editor. The encoding used in that case varies as well. For Western cultures it's usually CP850.
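
To see why the file's encoding matters, here is a minimal Python sketch of the mismatch. It assumes the editor saved the file as Windows-1252 and the console uses CP850 (both are assumptions about a typical Western European setup), and the helper name as_seen_by_console is purely illustrative.

```python
def as_seen_by_console(text: str, file_encoding: str, console_codepage: str) -> str:
    """Encode text the way the editor saved it, then decode it the way the console reads it."""
    return text.encode(file_encoding).decode(console_codepage)


line = "ren -hélice hélice"

# Saved as Windows-1252, read as CP850: the é byte (0xE9) is read as a different character.
print(as_seen_by_console(line, "cp1252", "cp850"))

# Saved as CP850, read as CP850: the é byte (0x82) is read back as é.
print(as_seen_by_console(line, "cp850", "cp850"))
```

Only the second call returns the original line, which is the effect of saving the batch file with the OEM encoding.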

Batch files and encoding are really two things that don't particularly like each other. You'll notice that Unicode is also impossible to use there, unfortunately (even though environment variables handle it fine).

Alternatively, you can set the console to use another codepage:

chcp 1252

should do the trick. At least it worked for me here.

When you do output redirection, such as with dir, the same rules apply. The console window's codepage is used. You can use the /u switch to cmd.exe to force Unicode output redirection, which causes the resulting files to be in UTF-16.
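
If you later need to read such a redirected file from a script, a sketch along these lines may help. It assumes the output is UTF-16 little-endian, possibly without a byte order mark (that is my understanding of cmd.exe /u, not something stated above); the function name and the listing.txt file name in the comment are hypothetical.

```python
def decode_cmd_u_output(data: bytes) -> str:
    """Decode bytes assumed to be UTF-16 little-endian, dropping a leading BOM if present."""
    text = data.decode("utf-16-le")
    return text.lstrip("\ufeff")


# Stand-in for the bytes of a file produced by something like: cmd /u /c dir > listing.txt
sample = "hélice\r\n".encode("utf-16-le")
print(decode_cmd_u_output(sample))
```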

As for encodings and code pages in cmd.exe in general, also see this question:

What encoding/code page is cmd.exe using

EDIT: As for your edit: No, cmd always assumes the batch file is written in the console's default code page. However, you can easily include a chcp at the start of the batch file:

chcp 1252>NUL
ren -hélice hélice

To make this more robust when the batch file is run directly from the command line, you may want to remember the old code page and restore it afterwards:

@echo off
for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
chcp 1252>nul
ren -hélice hélice
chcp %cp%>nul

Answer 2

Answered by David Pontbriand

I created the following block, which I put at the beginning of my batch files:

set Filename=%0
IF "%Filename:~-8%" == "-850.bat" GOTO CONVERT_CODEPAGE_END
    rem Converting code page from 1252 to 850.
    rem My editors use 1252, my batch uses 850.
    rem We create a converted -850.bat file, and then launch it.
    set File850=%~n0-850.bat
    PowerShell.exe -Command "get-content %0 | out-file -encoding oem -filepath %File850%"
    call %File850%
    del %File850%
    EXIT /b 0
:CONVERT_CODEPAGE_END
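
The conversion step in the block above can also be sketched outside of batch. The following Python version is only an illustration of the same 1252-to-850 transcoding idea, not the author's method; the function names and the "-850" naming convention mirror the batch block but are otherwise my own assumptions.

```python
from pathlib import Path


def transcode(data: bytes, source: str = "cp1252", target: str = "cp850") -> bytes:
    """Reinterpret bytes written in one code page as the same text in another."""
    return data.decode(source).encode(target)


def write_oem_copy(script: Path) -> Path:
    """Write a CP850 copy of a CP1252 script next to it, e.g. job.bat -> job-850.bat."""
    out = script.with_name(script.stem + "-850" + script.suffix)
    out.write_bytes(transcode(script.read_bytes()))
    return out


# The é byte changes from 0xE9 (CP1252) to 0x82 (CP850); ASCII bytes stay the same.
print(transcode("ren -hélice hélice".encode("cp1252")))
```

Characters that exist in Windows-1252 but not in CP850 (the euro sign, for example) make encode raise UnicodeEncodeError, so a real script would need to decide how to handle them.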

Answer 3

Answered by dconman

I was having trouble with this, and here is the solution I found. Find the decimal number for the character you are looking for in your current code page.

For example, I'm in code page 437 (chcp tells you), and I want a degree sign, °. http://en.wikipedia.org/wiki/Code_page_437 tells me that the degree sign is number 248.

Then you find the Unicode character with the same number.

The Unicode character at 248 (U+00F8) is ø.

If you insert the Unicode character in your batch script, it will display to the console as the character you desire.

So my batch file

echo ø

prints

°
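
The lookup described in this answer can be sketched in Python. This is only an illustration under the answer's assumptions: a single-byte console code page (437 here), and an editor that stores characters 128 to 255 as the byte with the same number (as Latin-1 does). The function name is hypothetical.

```python
def stand_in_character(wanted: str, console_codepage: str = "cp437") -> str:
    """Return the character whose Unicode number equals wanted's byte value in the code page."""
    (byte_value,) = wanted.encode(console_codepage)  # single-byte code pages only
    return chr(byte_value)


# The degree sign is byte 248 in code page 437; Unicode character 248 (U+00F8) is ø.
print(stand_in_character("°"))
```

Typing the returned character into the batch file puts byte 248 into the file, which the console then shows as the degree sign.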

Answer 4

Answered by g.cze

I care about three concepts:

Output Console Encoding
Command line internal encoding (the one changed with chcp)
.bat Text Encoding

The easiest scenario for me: I keep the first two in the same encoding, say CP850, and I store my .bat file in that same encoding (in Notepad++, menu Encoding → Character sets → Western European → OEM 850).

But suppose someone hands me a .bat file in another encoding, say CP1252 (in Notepad++, menu Encoding → Character sets → Western European → Windows-1252).

Then I would change the command line internal encoding, with chcp 1252.

This changes the encoding it uses to talk to other processes, but not the encoding of the input device or of the output console.

So my command line instance will effectively send characters in 1252 through its STDOUT file descriptor, but garbled text appears when the console decodes them as 850 (é shows up as ú).

Then I modify the file as follows:

@echo off

perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
ren -hélice hélice

First I turn echo off, so the commands themselves are not printed; output appears only when I explicitly use either echo... or perl -e "print..."

Then I add this boilerplate each time I need to output something:

perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"

In place of ren -hélice hélice, I put the actual text I want to show.

I may also need to replace cp850 with my console encoding and cp1252 with the encoding of the file I was given.

And just below I put the desired command.

I broke the problematic line into two halves: the output half and the real command half.

The first half makes sure that the "é" is displayed as an "é", by means of transcoding. This is necessary for every line of output, since the console and the file use different encodings.
For the second half, the real command (silenced by @echo off), it is enough that chcp and the .bat text use the same encoding to ensure the characters are interpreted correctly.

Answer 5

Answered by michal

I had Polish characters with diacritics in my R code and ran into this problem when running the R script from a .bat file: in the .Rout output file those characters were replaced by symbols such as %, &, and #, and the code did not run to the end.

My solution:

Save R script with encoding: File > Save with encoding > CP1250
Run .bat file

It worked for me, but if the problem persists, try other encodings.
