Original question: http://stackoverflow.com/questions/2005205/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
UTF-8 Encoding in java, retrieving data from website
Asked by Martin
I'm trying to get data from a website that is encoded in UTF-8 and insert it into a database (MySQL). The database is also encoded in UTF-8.
This is the method I use to download data from a specific site.
// Downloads the content at the given URL and decodes it as UTF-8.
public String download(String url) throws java.io.IOException {
    java.io.InputStream s = null;
    java.io.InputStreamReader r = null;
    StringBuilder content = new StringBuilder();
    try {
        s = (java.io.InputStream) new java.net.URL(url).getContent();
        r = new java.io.InputStreamReader(s, "UTF-8");
        char[] buffer = new char[4 * 1024];
        int n = 0;
        // read() returns -1 at end of stream, which ends the loop.
        while (n >= 0) {
            n = r.read(buffer, 0, buffer.length);
            if (n > 0) {
                content.append(buffer, 0, n);
            }
        }
    } finally {
        if (r != null) r.close();
        if (s != null) s.close();
    }
    return content.toString();
}
If the encoding is set to 'UTF-8' (r = new java.io.InputStreamReader(s, "UTF-8"); ), the data inserted into the database seems to look OK, but when I try to display it, I get something like this: C?te d'Ivoire, instead of Côte d'Ivoire.
All my websites are encoded in UTF-8.
Please help.
If the encoding is set to 'windows-1252' (r = new java.io.InputStreamReader(s, "windows-1252"); ), everything works fine and I get Côte d'Ivoire on my website, but in Java this title looks like 'CÃ´te d'Ivoire', which breaks other things, such as links. What does this mean?
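To illustrate the two symptoms described in the question, here is a minimal sketch, assuming the title is Côte d'Ivoire. The class and method names are hypothetical. It shows that UTF-8 bytes decoded as windows-1252 turn one character into two, and that encoding into a charset that cannot represent the character is one way a '?' can appear.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question.
class MojibakeDemo {
    // Encodes text as UTF-8, then decodes those bytes with a different charset.
    static String decodeUtf8BytesAs(String text, Charset other) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, other);
    }

    // Encodes text into a charset and decodes it back with the same charset.
    // Characters the charset cannot represent are replaced (with '?' for US-ASCII).
    static String roundTripThrough(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String original = "C\u00F4te d'Ivoire"; // U+00F4 is the o with circumflex
        Charset cp1252 = Charset.forName("windows-1252");

        // The two UTF-8 bytes C3 B4 become two separate windows-1252 characters.
        String garbled = decodeUtf8BytesAs(original, cp1252);
        System.out.println(garbled.equals("C\u00C3\u00B4te d'Ivoire")); // prints true

        // US-ASCII has no mapping for U+00F4, so it is replaced by '?'.
        String lossy = roundTripThrough(original, StandardCharsets.US_ASCII);
        System.out.println(lossy); // prints C?te d'Ivoire
    }
}
```

Which step in a real pipeline performs the lossy conversion (database connection, page output, or something else) cannot be determined from this sketch alone.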
Answered by Tomas
I would consider using commons-io; it has a function that does what you want.
That is, replace your code with this:
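// Requires Apache Commons IO on the classpath (org.apache.commons.io.IOUtils).
public String download(String url) throws java.io.IOException {
    java.io.InputStream s = null;
    String content = null;
    try {
        s = (java.io.InputStream) new java.net.URL(url).getContent();
        // Read the whole stream and decode it as UTF-8.
        content = org.apache.commons.io.IOUtils.toString(s, "UTF-8");
    } finally {
        if (s != null) s.close();
    }
    return content;
}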
If that doesn't do it, start checking whether you can store the data to a file correctly, to eliminate the possibility that your database isn't set up correctly.
Answered by glmxndr
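One way to do that file check is sketched below: write the downloaded string to a file as UTF-8 and look at the raw bytes. The class and method names are hypothetical, and the sample string stands in for real downloaded content. If the string was decoded correctly, a character such as ô should show up as the two bytes c3 b4.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of the original answer.
class FileEncodingCheck {
    // Writes the text to the file as UTF-8 and returns the file's bytes as hex.
    static String writeAndDumpHex(String text, Path file) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        byte[] bytes = Files.readAllBytes(file);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) hex.append(' ');
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("download-check", ".txt");
        try {
            // Stand-in for the result of download(url).
            System.out.println(writeAndDumpHex("C\u00F4te", tmp)); // prints 43 c3 b4 74 65
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If the file bytes look right but the database or the web page still shows the wrong characters, the download step is probably not the culprit.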
Java
The problem seems to lie in the HttpServletResponse, if you have a servlet or JSP page. Make sure to set your HttpServletResponse encoding to UTF-8.
In a JSP page, or in the doGet or doPost of a servlet, before any content is sent to the response, just do:
response.setCharacterEncoding("UTF-8");
PHP
In PHP, try using the utf8_encode function after retrieving the data from the database.
Answered by Confusion
Is your database encoding set to UTF-8 for the server, client, and connection, and have the tables been created with that encoding? Check 'show variables' and 'show create table <one-of-the-tables>'.
Answered by BalusC
If the encoding is set to 'UTF-8' (r = new java.io.InputStreamReader(s, "UTF-8"); ), the data inserted into the database seems to look OK, but when I try to display it, I get something like this: C?te d'Ivoire, instead of Côte d'Ivoire.
Thus, the encoding during the display is wrong. How are you displaying it? As per the comments, it's a PHP page? If so, then you need to take two things into account:
1. Write them to the HTTP response output using the same encoding, thus UTF-8.
2. Set the content type to UTF-8 so that the web browser knows which encoding to use to display text.
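The two points above can be sketched in Java terms (the answer itself concerns a PHP page, so this is only an illustration with hypothetical names): the body bytes and the declared charset must come from the same encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not tied to any particular web framework.
class ResponseEncodingSketch {
    // Point 1: encode the body with the chosen charset.
    static byte[] encodeBody(String body, Charset cs) {
        return body.getBytes(cs);
    }

    // Point 2: declare the same charset in the Content-Type header value.
    static String contentType(Charset cs) {
        return "text/html; charset=" + cs.name();
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_8;
        byte[] body = encodeBody("C\u00F4te d'Ivoire", cs);
        System.out.println(contentType(cs)); // prints text/html; charset=UTF-8
        System.out.println(body.length);     // prints 15 (14 characters, one takes 2 bytes)
    }
}
```

Using a single Charset value for both steps keeps the bytes and the declaration from drifting apart.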
As per the comments, you have apparently already done 2. That leaves 1: in PHP you need to install mbstring and set mbstring.http_output to UTF-8 as well. I have found this cheatsheet very useful.

