Original source: http://stackoverflow.com/questions/1336930/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How do you specify a Java file.encoding value consistent with the underlying Windows code page?
Asked by Rob Kennedy
I have a Java application that receives data over a socket using an InputStreamReader. It reports "Cp1252" from its getEncoding method:
/* java.net. */ Socket Sock = ...;
InputStreamReader is = new InputStreamReader(Sock.getInputStream());
System.out.println("Character encoding = " + is.getEncoding());
// Prints "Character encoding = Cp1252"
That doesn't necessarily match what the system reports as its code page. For example:
C:\>chcp
Active code page: 850
The application may receive byte 0x81, which in code page 850 represents the character ü. The program interprets that byte with code page 1252, which doesn't define any character at that value, so I get a question mark instead.
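The mismatch can be reproduced without a socket. The following is a minimal sketch, assuming the JDK in use ships the IBM850 charset (it belongs to the extended charsets, which most full JDK builds include); the class and method names are illustrative only.

```java
import java.nio.charset.Charset;

public class CodePageDemo {
    // Decodes a single byte value with the named charset.
    static String decode(int b, String charsetName) {
        return new String(new byte[] { (byte) b }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String oem = decode(0x81, "IBM850");        // code page 850
        String ansi = decode(0x81, "windows-1252"); // code page 1252
        // Print code points rather than the characters themselves, so the
        // result does not depend on the console's own encoding.
        System.out.printf("850:  U+%04X%n", (int) oem.charAt(0));
        System.out.printf("1252: U+%04X%n", (int) ansi.charAt(0));
    }
}
```

Under code page 850 the byte decodes to U+00FC (ü). Under code page 1252 the decoder substitutes a replacement character, which is what usually ends up displayed as a question mark.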
I was able to work around this problem for one customer who used code page 850 by adding another command-line option in the batch file that launches the application:
java.exe -Dfile.encoding=Cp850 ...
But not all my customers use code page 850, of course. How can I get Java to use a code page that's compatible with the underlying Windows system? My preference would be something I could just put in the batch file, leaving the Java code untouched:
ENC=...
java.exe -Dfile.encoding=%ENC% ...
Answered by McDowell
The default encoding used by cmd.exe is Cp850 (or whatever "OEM" CP is native to the OS); the system encoding is Cp1252 (or whatever "ANSI" CP is native to the OS). Gory details here. One way to discover the console encoding would be to do it via native code (see GetConsoleOutputCP for the current console encoding; see GetACP for the default "ANSI" encoding; etc.).
Altering the encoding via the -D switch is going to affect all your default encoding mechanisms, including redirected stdout/stdin/stderr. It is not an ideal solution.
I came up with this WSH script that can set the console to the system ANSI codepage, but haven't figured out how to programmatically switch to a TrueType font.
'file: setacp.vbs
'usage: cscript /Nologo setacp.vbs
Set objShell = CreateObject("WScript.Shell")
'replace ACP (ANSI) with OEMCP for default console CP
'CurrentControlSet points at whichever control set is active
cp = objShell.RegRead("HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet" & _
    "\Control\Nls\CodePage\ACP")
WScript.Echo "Switching console code page to " & cp
'chcp.com runs as a child process attached to the same console
objShell.Exec "chcp.com " & cp
(This is my first WSH script, so it may be flawed - I'm not familiar with registry read permissions.)
Using a TrueType font is another requirement for using ANSI/Unicode with cmd.exe. I'm going to look at a programmatic switch to a better font when time permits.
Answered by Yishai
With regard to the code snippet, the right answer is to use the InputStreamReader constructor that takes an explicit charset, so that the bytes are converted correctly. That way it won't matter what the default encoding on the system is: you know you are decoding with the encoding that actually corresponds to what you are receiving on the socket.
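A minimal sketch of that approach follows. It assumes the sender is known to use code page 850 and that the JDK provides the IBM850 charset; a ByteArrayInputStream stands in for the socket's input stream, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ExplicitCharsetReader {
    // Reads the whole stream, decoding with the given charset instead of
    // whatever the platform default happens to be.
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for Sock.getInputStream(): the word "für" encoded in code page 850.
        InputStream in = new ByteArrayInputStream(new byte[] { 'f', (byte) 0x81, 'r' });
        String text = readAll(in, Charset.forName("IBM850"));
        System.out.println(text.length() + " chars, middle char U+"
                + Integer.toHexString(text.charAt(1)).toUpperCase());
    }
}
```

The charset passed to the constructor has to match what the sender really uses; if that varies per customer, it still needs to come from configuration or from the protocol itself.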
You can then specify the encoding explicitly when you write out files, if you need to, rather than relying on the system encoding. Users may still have issues when they open those files on that system, but modern Windows systems support UTF-8, so you can write the file out in UTF-8 if necessary (internally, Java represents all Strings as 16-bit Unicode).
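As an illustration of writing with an explicit encoding, here is a minimal sketch that uses a temporary file; the class and method names are made up for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileWriter {
    // Writes text as UTF-8 regardless of the platform default encoding.
    static void writeUtf8(Path file, String text) {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes text to a temporary file and returns the raw bytes that landed on disk.
    static byte[] roundTrip(String text) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                writeUtf8(file, text);
                return Files.readAllBytes(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = roundTrip("f\u00FCr"); // "für"
        System.out.println(bytes.length + " bytes on disk");
    }
}
```

The ü takes two bytes in UTF-8 (0xC3 0xBC), so the three-character string occupies four bytes on disk.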
I would think this is the "right" solution in general, and the one most compatible with the largest range of underlying systems.
Answered by ferdley
Windows has the added complication of having two active codepages. In your example both 1252 and 850 are correct, but they depend on the way the program is being run. For GUI applications, Windows will use the ANSI code page, which for Western European languages will typically be 1252. However, the command line will report the OEM codepage which is 850 for the same locales.
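To see which encodings a particular JVM has picked up, a small diagnostic such as the following can help. This is only a sketch: which of these system properties are set depends on the JDK version and on how the program is launched, and any that are unset are reported as "null". The class name is illustrative.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {
    // Collects the JVM's default charset plus encoding-related system properties.
    static Map<String, String> report() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        String[] keys = {
            "file.encoding", "sun.jnu.encoding", "native.encoding",
            "stdout.encoding", "sun.stdout.encoding"
        };
        for (String key : keys) {
            // String.valueOf turns a missing property into the text "null".
            m.put(key, String.valueOf(System.getProperty(key)));
        }
        return m;
    }

    public static void main(String[] args) {
        report().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running it once from a console window and once with output redirected to a file, and comparing the results with what chcp reports, shows which of the two code pages the JVM is following in each case.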
Answered by GregA100k
If the code page reported by the chcp command is the value you need, you can use the following command to capture it:
C:\>for /F "Tokens=4" %I in ('chcp') Do Set CodePage=%I
This sets the variable CodePage to the code page value returned from chcp:
C:\>echo %CodePage%
437
You could use this value in your bat file by prefixing it with Cp:
C:\>echo Cp%CodePage%
Cp437
When you put this into a bat file, the %I values in the first command need to be replaced with %%I.
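Not every code page number has a matching "Cp" charset in Java, so the resulting name may be worth validating. If touching the Java code is acceptable, a sketch like the following could do that; the class and method names are hypothetical, and falling back to the JVM default is just one possible policy.

```java
import java.nio.charset.Charset;

public class CodePageCharset {
    // Maps a code page number (for example the value captured from chcp) to a
    // Java charset, falling back to the JVM default when there is no match.
    static Charset forCodePage(String codePage) {
        String name = "Cp" + codePage.trim();
        try {
            if (Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalArgumentException e) {
            // Raised for malformed charset names; fall through to the default.
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        String codePage = args.length > 0 ? args[0] : "1252";
        System.out.println(codePage + " -> " + forCodePage(codePage).name());
    }
}
```

The batch file could then pass %CodePage% as a program argument instead of overriding file.encoding for the whole JVM.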