Java: How to ensure that Strings are in UTF-8?

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow CC BY-SA as well, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/23932070/

Date: 2020-08-14 09:10:15. Source: igfitidea

Tags: java, scala, utf-8, character-encoding

Question by YoBre

How do I convert the String "the survey?'s rules" to UTF-8 in Scala?

I tried these approaches, but they do not work:

scala> val text = "the survey?'s rules"
text: String = the survey?'s rules

scala> scala.io.Source.fromBytes(text.getBytes(), "UTF-8").mkString
res17: String = the survey?'s rules

scala> new String(text.getBytes(),"UTF8")
res21: String = the survey?'s rules
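Both attempts return the text unchanged: a JVM String already holds decoded characters, so encoding it to bytes and decoding those bytes back with a matching charset is just a round trip. Garbled output typically appears only when bytes are decoded with a different charset from the one used to encode them. A minimal sketch, assuming a hypothetical sample that contains U+2019 (right single quotation mark); the object and method names are illustrative:

```scala
import java.nio.charset.StandardCharsets

object RoundTrip {
  // Hypothetical sample: U+2019 (right single quotation mark) instead of an ASCII apostrophe
  val text: String = "the survey\u2019s rules"

  // Encode and decode with the same charset: the String comes back unchanged
  def sameCharset: String =
    new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8)

  // Encode as UTF-8 but decode as ISO-8859-1: each of the 3 UTF-8 bytes of U+2019
  // becomes a separate character, which is what "mojibake" looks like
  def wrongCharset: String =
    new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1)

  def main(args: Array[String]): Unit = {
    println(sameCharset == text)               // true
    println(wrongCharset.length - text.length) // 2
  }
}
```

In other words, there is nothing to "convert" inside a String; charsets only matter at the boundary where text becomes bytes or bytes become text.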

OK, I solved it this way. It is not a conversion, just reading the file with an explicit codec:

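import java.io.File
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}
import com.github.tototoshi.csv.CSVReader // assumed: the scala-csv library

// Decode the file as US-ASCII and silently drop any byte that is not valid ASCII
implicit val codec: Codec = Codec("US-ASCII").onMalformedInput(CodingErrorAction.IGNORE).onUnmappableCharacter(CodingErrorAction.IGNORE)

val src = Source.fromFile(new File(folderDestination + name + ".csv"))
val reader = CSVReader.open(src.reader())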
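Note that this codec does not produce UTF-8: it decodes the input as US-ASCII and discards every byte that is not valid ASCII, so non-ASCII characters are silently removed. A minimal sketch of that effect on in-memory bytes, using the same Codec settings; the sample string and the object and method names are hypothetical:

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CodingErrorAction, StandardCharsets}
import scala.io.Codec

object IgnoreCodecDemo {
  // Same settings as the workaround: US-ASCII, dropping anything it cannot decode
  val asciiIgnoring: Codec = Codec("US-ASCII")
    .onMalformedInput(CodingErrorAction.IGNORE)
    .onUnmappableCharacter(CodingErrorAction.IGNORE)

  // Codec.decoder builds a CharsetDecoder that honours the configured IGNORE actions
  def decode(bytes: Array[Byte], codec: Codec): String =
    codec.decoder.decode(ByteBuffer.wrap(bytes)).toString

  // Hypothetical sample containing U+2019, encoded as UTF-8 (3 non-ASCII bytes)
  val sample: String = "the survey\u2019s rules"

  def stripped: String = decode(sample.getBytes(StandardCharsets.UTF_8), asciiIgnoring)

  def main(args: Array[String]): Unit =
    println(stripped) // the surveys rules
}
```

This avoids garbled characters at the cost of losing them, which is only acceptable when the data is expected to be plain ASCII.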

Answer by Nitul

Just set the JVM's file.encoding system property to UTF-8 as follows:

-Dfile.encoding=UTF-8

This makes UTF-8 the JVM's default encoding.

With the scala launcher, that would be: scala -Dfile.encoding=UTF-8
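To confirm which default the JVM actually picked up, one can print Charset.defaultCharset() alongside the file.encoding property. A minimal sketch (the object name is hypothetical); launched with -Dfile.encoding=UTF-8 it should report UTF-8. The default charset is fixed when the JVM starts, so calling System.setProperty("file.encoding", ...) from running code is not a reliable substitute, and on JDK 18 and later UTF-8 is already the default (JEP 400):

```scala
import java.nio.charset.Charset

object DefaultCharsetCheck {
  // Charset used by no-argument getBytes() and new String(bytes); fixed at JVM startup
  def defaultCharsetName: String = Charset.defaultCharset().name()

  // The system property that -Dfile.encoding sets; wrapped in Option in case it is absent
  def fileEncoding: Option[String] = Option(System.getProperty("file.encoding"))

  def main(args: Array[String]): Unit = {
    println(s"default charset: $defaultCharsetName")
    println(s"file.encoding:   ${fileEncoding.getOrElse("<unset>")}")
  }
}
```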

Answer by Vladimir Matveev

Note that when you call text.getBytes() without arguments, you actually get an array of bytes representing the string in your platform's default encoding. On Windows, for example, this may be a single-byte encoding; on Linux it is often UTF-8 already.

To be correct, you need to specify the exact encoding in the getBytes() call. For Java 7 and later, do this:

import java.nio.charset.StandardCharsets

val bytes = text.getBytes(StandardCharsets.UTF_8)
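The charset has to be explicit because different charsets produce different bytes, and a charset that lacks a character cannot encode it at all. A minimal sketch with a hypothetical sample containing U+2019: UTF-8 encodes it as three bytes, while ISO-8859-1 has no such character, so getBytes(Charset) silently substitutes that charset's replacement byte, '?':

```scala
import java.nio.charset.StandardCharsets

object ExplicitEncoding {
  // Hypothetical sample containing U+2019 (right single quotation mark)
  val text: String = "survey\u2019s"

  // UTF-8 can represent U+2019; it takes 3 bytes (0xE2 0x80 0x99)
  def utf8Bytes: Array[Byte] = text.getBytes(StandardCharsets.UTF_8)

  // ISO-8859-1 cannot represent U+2019; getBytes(Charset) replaces it with '?'
  def latin1Bytes: Array[Byte] = text.getBytes(StandardCharsets.ISO_8859_1)

  def main(args: Array[String]): Unit = {
    println(utf8Bytes.length)                                     // 10
    println(new String(latin1Bytes, StandardCharsets.ISO_8859_1)) // survey?s
  }
}
```

The information loss happens at encoding time and cannot be undone afterwards, which is why the charset should be chosen where the bytes are produced.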

For Java 6, do this:

import java.nio.charset.Charset

val bytes = text.getBytes(Charset.forName("UTF-8"))

Then bytes will contain the UTF-8-encoded text.
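Going the other way, when you receive a byte array, "ensuring UTF-8" amounts to checking that the bytes decode cleanly as UTF-8. new String(bytes, StandardCharsets.UTF_8) never fails (it substitutes U+FFFD for malformed sequences), so validation needs a strict CharsetDecoder. A minimal sketch; the object and method names are hypothetical:

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CodingErrorAction, StandardCharsets}
import scala.util.Try

object Utf8Check {
  // Strict decoder: REPORT makes decode throw instead of substituting U+FFFD
  def decodeStrict(bytes: Array[Byte]): Try[String] = Try {
    StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
      .decode(ByteBuffer.wrap(bytes))
      .toString
  }

  def isValidUtf8(bytes: Array[Byte]): Boolean = decodeStrict(bytes).isSuccess

  def main(args: Array[String]): Unit = {
    println(isValidUtf8("survey\u2019s".getBytes(StandardCharsets.UTF_8))) // true
    println(isValidUtf8(Array(0x80.toByte)))                              // false: lone continuation byte
  }
}
```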