How to ensure that Strings are in UTF-8?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/23932070/
Asked by YoBre
How can I convert the string "the survey?'s rules" to UTF-8 in Scala?
I tried these approaches, but neither works:
scala> val text = "the survey?'s rules"
text: String = the survey?'s rules
scala> scala.io.Source.fromBytes(text.getBytes(), "UTF-8").mkString
res17: String = the survey?'s rules
scala> new String(text.getBytes(),"UTF8")
res21: String = the survey?'s rules
OK, I resolved it this way. It is not a conversion, just a different way of reading the file:
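import java.io.File
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}
// Decode as US-ASCII and silently drop malformed or unmappable input
implicit val codec = Codec("US-ASCII").onMalformedInput(CodingErrorAction.IGNORE).onUnmappableCharacter(CodingErrorAction.IGNORE)
val src = Source.fromFile(new File (folderDestination + name + ".csv"))
val src2 = Source.fromFile(new File (folderDestination + name + ".csv"))
val reader = CSVReader.open(src.reader())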
Answered by Nitul
Just set the JVM's file.encoding system property to UTF-8 as follows:
-Dfile.encoding=UTF-8
This makes UTF-8 the default encoding. With the scala launcher, that would be scala -Dfile.encoding=UTF-8.
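To check whether the flag took effect, one option is to print the default charset from inside the JVM. The following is a minimal sketch; the object name DefaultCharsetCheck is made up for illustration, and the values printed depend on your platform and JVM version.

```scala
import java.nio.charset.Charset

// Hypothetical helper, not part of the original answer.
object DefaultCharsetCheck {
  // Name of the charset the JVM falls back to when none is given explicitly
  def defaultCharsetName: String = Charset.defaultCharset().name()

  def main(args: Array[String]): Unit = {
    // May be null on JVMs that do not expose this property
    val fileEncoding = System.getProperty("file.encoding")
    println("file.encoding  = " + fileEncoding)
    println("defaultCharset = " + defaultCharsetName)
  }
}
```

If the flag was picked up, both lines should report UTF-8.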
Answered by Vladimir Matveev
Note that when you call text.getBytes() without arguments, you are in fact getting an array of bytes representing the string in your platform's default encoding. On Windows, for example, that could be some single-byte encoding; on Linux it may already be UTF-8.
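The sketch below illustrates how much the resulting bytes depend on the charset. It assumes a sample string containing U+2019 (right single quotation mark), chosen only as an example of a non-ASCII character; the object and member names are hypothetical. ISO-8859-1 stands in for "some single-byte encoding" because it is guaranteed to be available on every JVM.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical example, not part of the original answer.
object EncodingDependsOnCharset {
  // "survey" + U+2019 + "s": 8 characters, one of them non-ASCII
  val sample: String = "survey\u2019s"

  def utf8Bytes: Array[Byte] = sample.getBytes(StandardCharsets.UTF_8)
  def latin1Bytes: Array[Byte] = sample.getBytes(StandardCharsets.ISO_8859_1)

  def main(args: Array[String]): Unit = {
    println(utf8Bytes.length)   // 10: the quotation mark takes three bytes in UTF-8
    println(latin1Bytes.length) // 8: the quotation mark is unmappable and is replaced by one '?' byte
    println(new String(latin1Bytes, StandardCharsets.ISO_8859_1)) // survey?s
  }
}
```

A lossy step like this is one way a '?' can end up inside a string, though whether that is what happened to the text in the question is only a guess.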
To be correct, you need to specify the exact encoding in the getBytes() method call. For Java 7 and later, do this:
import java.nio.charset.StandardCharsets
val bytes = text.getBytes(StandardCharsets.UTF_8)
For Java 6, do this:
import java.nio.charset.Charset
val bytes = text.getBytes(Charset.forName("UTF-8"))
Then bytes will contain the UTF-8-encoded text.
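A JVM String is held as UTF-16 code units, so "converting a String to UTF-8" in practice means producing UTF-8 bytes and, when reading them back, decoding with the same charset. Here is a minimal round-trip sketch; the names Utf8RoundTrip, encode and decode are hypothetical, and the sample text assumes the character in the question was U+2019, which the original does not confirm.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of the original answer.
object Utf8RoundTrip {
  // Encode with an explicit charset instead of the platform default
  def encode(text: String): Array[Byte] = text.getBytes(StandardCharsets.UTF_8)

  // Decode with the same explicit charset
  def decode(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    val text = "the survey\u2019s rules" // assumed sample, 18 characters
    val bytes = encode(text)
    println(bytes.length)          // 20: 17 ASCII bytes plus 3 bytes for U+2019
    println(decode(bytes) == text) // true
  }
}
```

Because both directions name the charset, the result does not change with the platform or with the file.encoding setting.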