Note: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39643366/

Scala convert string between two charsets

scala utf-8 character-encoding

Asked by Michel Hua

I have a malformed UTF-8 string that should read "Michèle Huà" but is output as "MichÃ¨le HuÃ".

According to this table, the problem is a mix-up between Windows-1252 and UTF-8: http://www.i18nqa.com/debug/utf8-debug.html

How do I do the conversion?

scala> scala.io.Source.fromBytes("MichÃ¨le HuÃ".getBytes(), "ISO-8859-1").mkString
res25: String = MichÃÂ¨le HuÃ

scala> scala.io.Source.fromBytes("MichÃ¨le HuÃ".getBytes(), "UTF-8").mkString
res26: String = MichÃ¨le HuÃ

scala> scala.io.Source.fromBytes("MichÃ¨le HuÃ".getBytes(), "Windows-1252").mkString
res27: String = MichÃƒÂ¨le HuÃƒ

Thank you

Answered by Rex Kerr

You don't actually have the complete string there, because one of its characters prints as a blank. "Michèle Huà", when encoded as UTF-8 but read as Windows-1252, is actually "MichÃ¨le HuÃ" plus one trailing character, 0xA0 (a non-breaking space, which typically pastes as 0x20, an ordinary space).
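
To make the missing character visible, here is a minimal sketch that reproduces the garbling and prints every code point. The names `MojibakeDemo` and `garble` are hypothetical, and the sketch assumes the JVM provides the windows-1252 charset (it is not one of the guaranteed standard charsets, though mainstream JDKs ship it).

```scala
import java.nio.charset.{Charset, StandardCharsets}

object MojibakeDemo {
  // Assumption: windows-1252 is available on this JVM; look it up by name.
  val windows1252: Charset = Charset.forName("windows-1252")

  // Encode correctly as UTF-8, then decode the bytes with the wrong charset.
  def garble(s: String): String =
    new String(s.getBytes(StandardCharsets.UTF_8), windows1252)

  def main(args: Array[String]): Unit = {
    val garbled = garble("Mich\u00E8le Hu\u00E0") // "Michèle Huà"
    // Print each code point so the invisible last character shows up.
    println(garbled.map(c => f"U+${c.toInt}%04X").mkString(" "))
  }
}
```

The last code point printed is U+00A0, the non-breaking space that tends to turn into an ordinary space when the text is copied and pasted.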

If you can include that character, you can convert successfully.

scala> val fixed = new String("MichÃ¨le HuÃ\u00A0".getBytes("Windows-1252"), "UTF-8")
fixed: String = Michèle Huà
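
The same conversion can be wrapped in a small helper. This is only a sketch: `MojibakeRepair` and `repair` are hypothetical names, and it assumes the input really is UTF-8 text that was decoded as Windows-1252 and that the JVM provides that charset. Unicode escapes are used so the example does not depend on the source file's encoding.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object MojibakeRepair {
  // Assumption: windows-1252 is available on this JVM; look it up by name.
  val windows1252: Charset = Charset.forName("windows-1252")

  // Recover the original bytes by re-encoding as Windows-1252,
  // then decode those bytes as the UTF-8 they were all along.
  def repair(garbled: String): String =
    new String(garbled.getBytes(windows1252), StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    // Garbled text with the trailing U+00A0 intact.
    println(repair("Mich\u00C3\u00A8le Hu\u00C3\u00A0"))
    // Trailing U+00A0 pasted as an ordinary space: the final UTF-8
    // sequence is incomplete, so the decoder substitutes U+FFFD.
    println(repair("Mich\u00C3\u00A8le Hu\u00C3 "))
  }
}
```

Characters that Windows-1252 cannot encode are replaced during `getBytes`, so the helper is not guaranteed to be lossless on arbitrary input.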