Original question: http://stackoverflow.com/questions/1948044/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Printing Unicode from Scala interpreter
Asked by Martin Sturm
When using the scala interpreter (i.e. running the command 'scala' on the command line), I am not able to print Unicode characters correctly. Of course a-z, A-Z, etc. are printed correctly, but for example € or ƒ is printed as a ?.
print(8364.toChar)
results in ? instead of €. Probably I'm doing something wrong. My terminal supports UTF-8 characters, and even when I pipe the output to a separate file and open it in a text editor, ? is displayed.
This is all happening on Mac OS X (Snow Leopard, 10.6.2) with Scala 2.8 (nightly build) and Java 1.6.0_17.
Accepted answer by Martin Sturm
I found the cause of the problem, and a solution to make it work as it should.
As I already suspected after posting my question, reading Calum's answer, and running into encoding issues on the Mac in another project (which was in Java), the cause of the problem is the default encoding used by Mac OS X. When you start the scala interpreter, it uses the default encoding of the platform. On Mac OS X, this is MacRoman; on Windows, it is probably CP1252. You can check this by typing the following command in the scala interpreter:
scala> System.getProperty("file.encoding");
res3: java.lang.String = MacRoman
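To illustrate why a ? shows up, here is a minimal sketch of what happens when a character is encoded with a charset that has no mapping for it. ISO-8859-1 is used as a stand-in because every JVM is guaranteed to ship it; it is not the MacRoman charset from the question, and the object and helper names (EncodingCheck, hexBytes) are made up for this example.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object EncodingCheck {
  // Hypothetical helper: hex dump of the bytes a string encodes to in a given charset.
  def hexBytes(s: String, cs: Charset): String =
    s.getBytes(cs).map(b => f"${b & 0xFF}%02X").mkString(" ")

  def main(args: Array[String]): Unit = {
    // The charset the JVM falls back to when none is given explicitly.
    println("default charset: " + Charset.defaultCharset())

    val euro = 0x20AC.toChar.toString
    // UTF-8 can represent the euro sign as three bytes.
    println("UTF-8:      " + hexBytes(euro, StandardCharsets.UTF_8))
    // ISO-8859-1 (stand-in, not MacRoman) has no euro sign, so the encoder
    // substitutes its replacement byte 0x3F, which is the character '?'.
    println("ISO-8859-1: " + hexBytes(euro, StandardCharsets.ISO_8859_1))
  }
}
```

The substitution happens silently while encoding, which is why the ? also ends up in a file when the output is piped.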
According to the scala help text, it is possible to provide Java properties using the -D option. However, this did not work for me. I ended up setting the environment variable
JAVA_OPTS="-Dfile.encoding=UTF-8"
After starting scala again, the previous command gives the following result:
scala> System.getProperty("file.encoding")
res0: java.lang.String = UTF-8
Now, printing special characters works as expected:
print(0x20AC.toChar)
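If changing JAVA_OPTS is not an option, one possible alternative is to write through a PrintStream whose encoding is fixed to UTF-8, so the output no longer depends on file.encoding. This is only a sketch and not part of the original answer: the names Utf8Print and utf8Stream are made up, and it assumes the terminal itself interprets the bytes as UTF-8.

```scala
import java.io.{OutputStream, PrintStream}

object Utf8Print {
  // Hypothetical helper: wrap any stream in a PrintStream that always
  // encodes characters as UTF-8, whatever file.encoding says.
  def utf8Stream(out: OutputStream): PrintStream =
    new PrintStream(out, true, "UTF-8")

  def main(args: Array[String]): Unit = {
    val out = utf8Stream(System.out)
    out.println(0x20AC.toChar) // euro sign
    out.println(0x2603.toChar) // snowman
  }
}
```

Setting the encoding once through JAVA_OPTS, as described above, has the advantage that plain print and println keep working unchanged.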
So, it is not a bug in Scala, but an issue with default encodings. In my opinion, it would be better if UTF-8 were used by default on all platforms. While searching for whether this has been considered, I came across a discussion on the Scala mailing list about this issue. The first message proposes using UTF-8 by default on Mac OS X when file.encoding reports MacRoman, since UTF-8 is the default charset on Mac OS X (which leaves me wondering why file.encoding is set to MacRoman by default; perhaps it is inherited from Mac OS before version 10 was released?). I don't think this proposal will be part of Scala 2.8, since Martin Odersky wrote that it is probably best to keep things as they are in Java (i.e. honor the file.encoding property).
Answer by Calum
Ok, at least part, if not all, of your problem here is that 128 is not the Unicode codepoint for Euro. 128 (or 0x80 since hex seems to be the norm) is U+0080 <control>, i.e. it is not a printable character, so it's not surprising your terminal is having trouble printing it.
Euro's codepoint is 0x20AC (or in decimal 8364), and that appears to work for me (I'm on Linux, on a nightly of 2.8):
scala> print(0x20AC.toChar)
€
Another fun test is to print the Unicode snowman character:
scala> print(0x2603.toChar)
☃
128 as € is apparently an extended character from one of the Windows code pages.
I got the other character you mentioned to work too:
scala> 'ƒ'.toInt
res8: Int = 402
scala> 402.toChar
res9: Char = ƒ
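The distinction between the three numbers in this answer can be checked with a short sketch. The object and method names (CodePoints, describe) are made up for illustration; the only library call is Character.isISOControl from the Java standard library.

```scala
object CodePoints {
  // Hypothetical helper: format a code point as U+XXXX and say whether
  // Java classifies it as an ISO control character.
  def describe(code: Int): String = {
    val kind =
      if (Character.isISOControl(code)) "control character"
      else "not a control character"
    f"U+$code%04X ($kind)"
  }

  def main(args: Array[String]): Unit = {
    println(describe(128))    // the value that does not print
    println(describe(0x20AC)) // euro sign, decimal 8364
    println(describe(402))    // the second character from the question
  }
}
```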
Answer by Dedkov Vadim
On Windows, type the following in the command line (cmd):
1. set JAVA_OPTS="-Dfile.encoding=UTF-8"
2. chcp 65001
Code page 65001 in item 2 means UTF-8.
If you don't want to type "chcp 65001" every time, you can change or add a value in the Windows Registry like this:
- Run the command regedit
- Find the record [HKEY_CURRENT_USER\Software\Microsoft\Command Processor]
- New => String value
- Name = "AutoRun", Data = "chcp 65001" (without quotes)
(see https://superuser.com/a/482117/454417)
I use Windows 10 and Scala 2.11.8.

