Original question: http://stackoverflow.com/questions/3161712/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How do I convert a string to UTF-8 in Android?
Asked by droidgren
I am using an HTML parser called Jsoup to load and parse HTML files. The problem is that the webpage I'm scraping is encoded in the ISO-8859-1 charset, while Android is using UTF-8 encoding(?). This results in some characters showing up as question marks.
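The symptom described above is consistent with ISO-8859-1 bytes being decoded as UTF-8 somewhere along the way. The following is a minimal sketch under that assumption; the class and method names are illustrative and do not come from the post, and StandardCharsets assumes Java 7+ (Android API level 19+).

```java
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {
    // Decodes the bytes as UTF-8. Byte sequences that are not valid UTF-8
    // are replaced with U+FFFD, which often renders as a question mark.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes the bytes as ISO-8859-1, where every byte maps to one character.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Stand-in for page content sent by a server in ISO-8859-1.
        byte[] pageBytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("decoded as UTF-8:      " + decodeAsUtf8(pageBytes));
        System.out.println("decoded as ISO-8859-1: " + decodeAsLatin1(pageBytes));
    }
}
```

If this is what is happening, the fix is to decode the bytes with the charset the server actually used, rather than to convert an already decoded String.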
So now I guess I should convert the string to UTF-8 format.
Now I have found a class called CharsetEncoder in the Android SDK, which I guess could help me. But I can't figure out how to use it in practice, so I wonder if I could get some help in the form of a practical example.
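For reference, a minimal sketch of how CharsetEncoder can turn a String into UTF-8 bytes. A Java String is held as UTF-16 internally, so "converting to UTF-8" here means producing a byte[]. The class name EncoderSketch and the method toUtf8 are hypothetical, and StandardCharsets assumes Java 7+ (Android API level 19+).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class EncoderSketch {
    // Encodes the text as UTF-8 using a CharsetEncoder.
    static byte[] toUtf8(String text) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        try {
            ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Raised for input that cannot be encoded, e.g. an unpaired surrogate.
            throw new IllegalArgumentException("Text cannot be encoded as UTF-8", e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8("A\u00ea\u00f1\u00fcC");
        for (byte b : bytes) {
            System.out.printf("0x%02x ", b & 0xFF);
        }
        System.out.println();
    }
}
```

For most cases, String.getBytes with an explicit charset does the same job with less code; the encoder is mainly useful when you need control over error handling or want to encode in chunks.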
UPDATE: Code to read data (Jsoup)
URL url = new URL("http://www.example.com");
Document doc = Jsoup.parse(url, 4000);
Accepted answer by Al Sutton
You can let Android do the work for you by reading the page into a byte[] and then using the Jsoup methods for parsing String objects.
When you create the string from the data read from the server, don't forget to specify the encoding by using the appropriate String constructor.
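The approach in this answer could be sketched roughly as follows. This is only an illustration: PageDecoder, readAll, and decode are hypothetical names, the in-memory stream stands in for the real server response, and the charset is assumed to be ISO-8859-1 as stated in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

class PageDecoder {
    // Reads the whole stream into a byte[] without interpreting the bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Builds the String with the String(byte[], Charset) constructor,
    // naming the charset the server used instead of the platform default.
    static String decode(byte[] pageBytes, String charsetName) {
        return new String(pageBytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the server response: "A", e-circumflex, n-tilde,
        // u-umlaut, "C" encoded as ISO-8859-1.
        byte[] latin1 = {(byte) 0x41, (byte) 0xEA, (byte) 0xF1, (byte) 0xFC, (byte) 0x43};
        InputStream in = new ByteArrayInputStream(latin1);

        String html = decode(readAll(in), "ISO-8859-1");
        System.out.println(html);
        // With Jsoup on the classpath, the decoded string could then be
        // passed to Jsoup.parse(html).
    }
}
```

In a real app the InputStream would come from the HTTP connection, and the charset name would ideally be taken from the response's Content-Type header rather than hard-coded.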
Answered by droidgren
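import java.io.UnsupportedEncodingException;

public class Utf8RoundTrip {
    public static void main(String[] args) {
        // Name of the default charset of the running VM.
        System.out.println(System.getProperty("file.encoding"));
        String original = "A" + "\u00ea" + "\u00f1" + "\u00fc" + "C";
        System.out.println("original = " + original);
        System.out.println();
        try {
            // Encode as UTF-8 and with the platform default charset.
            byte[] utf8Bytes = original.getBytes("UTF-8");
            byte[] defaultBytes = original.getBytes();
            // Decode the UTF-8 bytes back into a String.
            String roundTrip = new String(utf8Bytes, "UTF-8");
            System.out.println("roundTrip = " + roundTrip);
            System.out.println();
            printBytes(utf8Bytes, "utf8Bytes");
            System.out.println();
            printBytes(defaultBytes, "defaultBytes");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    } // main

    // printBytes was not included in the post; this is an assumed helper
    // that prints each byte as an unsigned hex value.
    private static void printBytes(byte[] bytes, String name) {
        for (int i = 0; i < bytes.length; i++) {
            System.out.println(name + "[" + i + "] = 0x" + Integer.toHexString(bytes[i] & 0xFF));
        }
    }
}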