scala: Convert a string between two charsets in Scala
Original question: http://stackoverflow.com/questions/39643366/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Scala convert string between two charsets
Asked by Michel Hua
I have a malformed UTF-8 string that should read "Michèle Huà" but is output as "MichÃ¨le HuÃ".
According to this table, it is a mix-up between Windows-1252 and UTF-8: http://www.i18nqa.com/debug/utf8-debug.html
How do I do the conversion?
scala> scala.io.Source.fromBytes("MichÃ¨le HuÃ".getBytes(), "ISO-8859-1").mkString
res25: String = MichÃÂ¨le HuÃ
scala> scala.io.Source.fromBytes("MichÃ¨le HuÃ".getBytes(), "UTF-8").mkString
res26: String = MichÃ¨le HuÃ
scala> scala.io.Source.fromBytes("MichÃ¨le HuÃ".getBytes(), "Windows-1252").mkString
res27: String = MichÃƒÂ¨le HuÃƒ
Thank you
Answered by Rex Kerr
You don't actually have the complete string there, because one of its characters prints as a blank. "Michèle Huà" encoded as UTF-8 but read as Windows-1252 is actually "MichÃ¨le HuÃ ", where the last character is 0xA0, a non-breaking space (which typically pastes as 0x20, an ordinary space).
If you can include that character, you can convert successfully.
scala> val fixed = new String("MichÃ¨le HuÃ\u00A0".getBytes("Windows-1252"), "UTF-8")
fixed: String = Michèle Huà
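To see both directions of the mix-up in one place, here is a minimal sketch of the round trip. The object name `MojibakeRoundTrip` and the helpers `garble` and `repair` are made up for illustration and are not part of the original answer; the sketch only assumes the JVM's built-in `windows-1252` charset.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helper object, named for illustration only.
object MojibakeRoundTrip {
  private val windows1252: Charset = Charset.forName("windows-1252")

  // Reproduce the bug: encode as UTF-8, then wrongly decode as Windows-1252.
  def garble(s: String): String =
    new String(s.getBytes(StandardCharsets.UTF_8), windows1252)

  // Undo it: recover the original bytes via Windows-1252, then decode as UTF-8.
  def repair(s: String): String =
    new String(s.getBytes(windows1252), StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    val original = "Mich\u00e8le Hu\u00e0" // "Michèle Huà"
    val garbled  = garble(original)

    // The garbled string ends in U+00A0 (non-breaking space), not U+0020.
    println(garbled.last == '\u00A0')
    println(repair(garbled) == original)

    // If the final character was pasted as an ordinary space,
    // the byte sequence is no longer valid UTF-8 and the repair fails.
    val pasted = garbled.init + " "
    println(repair(pasted) == original)
  }
}
```

The repair only works when the garbled text still carries every original byte, which is why the trailing 0xA0 matters: once it has been turned into 0x20, the second byte of "à" is gone and cannot be recovered by re-decoding.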

