How to get UTF-8 working in Java webapps?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/138948/
Asked by kosoant
I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ЦжФ for special cases.
My setup is the following:
- Development environment: Windows XP
- Production environment: Debian
Database used: MySQL 5.x
Users mainly use Firefox 2, but Opera 9.x, Firefox 3, IE7 and Google Chrome are also used to access the site.
How to achieve this?
Accepted answer by kosoant
Answering myself, as the FAQ of this site encourages it. This works for me:
Mostly the characters äöå are not problematic, as the default character set used by browsers and Tomcat/Java for webapps is latin1, i.e. ISO-8859-1, which "understands" those characters.
Getting UTF-8 working under Java + Tomcat + Linux/Windows + MySQL requires the following:
Configuring Tomcat's server.xml
The connector must be configured to use UTF-8 when decoding URL (GET request) parameters:
<Connector port="8080" maxHttpHeaderSize="8192"
    maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
    enableLookups="false" redirectPort="8443" acceptCount="100"
    connectionTimeout="20000" disableUploadTimeout="true"
    compression="on"
    compressionMinSize="128"
    noCompressionUserAgents="gozilla, traviata"
    compressableMimeType="text/html,text/xml,text/plain,text/css,text/javascript,application/x-javascript,application/javascript"
    URIEncoding="UTF-8"
/>
The key part in the above example is URIEncoding="UTF-8". This guarantees that Tomcat handles all incoming GET parameters as UTF-8 encoded. As a result, when the user writes the following into the address bar of the browser:
https://localhost:8443/ID/Users?action=search&name=*ж*
the character ж is handled as UTF-8 and is encoded (usually by the browser, before the request even reaches the server) as %D0%B6.
POST requests are not affected by this.
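To see what this percent-encoding looks like from the Java side, here is a minimal sketch that uses only the JDK's URLEncoder and URLDecoder. The class name UrlEncodingDemo and its helper methods are made up for illustration; Tomcat does its own decoding internally, so this only mimics the effect of the URIEncoding setting.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the original answer.
public class UrlEncodingDemo {

    // Percent-encode a query parameter value the way a UTF-8 browser would.
    public static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Decode a percent-encoded value using the given charset name.
    public static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Cyrillic zhe, written as an escape so the source file encoding does not matter
        String zhe = "\u0436";
        String encoded = encodeUtf8(zhe);
        System.out.println(encoded);
        // Decoding with the matching charset restores the character
        System.out.println(decode(encoded, "UTF-8").equals(zhe));
        // Decoding as ISO-8859-1 turns the two UTF-8 bytes into two unrelated characters
        System.out.println(decode(encoded, "ISO-8859-1").length());
    }
}
```

Decoding the same bytes with ISO-8859-1 does not fail; it silently produces the wrong text, which is why a mismatched connector setting tends to show up as garbled search results rather than as an error.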
CharsetFilter
Then it's time to force the Java webapp to handle all requests and responses as UTF-8 encoded. This requires that we define a character set filter like the following:
package fi.foo.filters;

import javax.servlet.*;
import java.io.IOException;

public class CharsetFilter implements Filter {

    private String encoding;

    public void init(FilterConfig config) throws ServletException {
        encoding = config.getInitParameter("requestEncoding");
        if (encoding == null) encoding = "UTF-8";
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
            throws IOException, ServletException {
        // Respect the client-specified character encoding
        // (see HTTP specification section 3.4.1)
        if (null == request.getCharacterEncoding()) {
            request.setCharacterEncoding(encoding);
        }
        // Set the default response content type and encoding
        response.setContentType("text/html; charset=UTF-8");
        response.setCharacterEncoding("UTF-8");
        next.doFilter(request, response);
    }

    public void destroy() {
    }
}
This filter makes sure that if the browser hasn't set the encoding used in the request, it is set to UTF-8.
The other thing this filter does is set the default response encoding, i.e. the encoding of the returned HTML (or whatever else is returned). The alternative is to set the response encoding etc. in each controller of the application.
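Why the request encoding matters can be illustrated without a servlet container. The sketch below (class and method names are hypothetical) simulates a container that decodes a UTF-8 encoded POST body with the ISO-8859-1 default, which is roughly what happens when request.setCharacterEncoding is never called. It also shows a commonly used repair trick that is not part of the original answer: re-encode the garbled string as ISO-8859-1 and decode the bytes as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it imitates charset handling, not the Servlet API itself.
public class RequestEncodingDemo {

    // What a container produces when it decodes UTF-8 bytes with the ISO-8859-1 default.
    public static String decodeAsLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // What the container produces once the request encoding is set to UTF-8.
    public static String decodeAsUtf8(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    // Repair a string that was wrongly decoded as ISO-8859-1.
    // This is lossless because ISO-8859-1 maps every byte to exactly one character.
    public static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A Finnish name followed by a Cyrillic letter, written with escapes
        String name = "P\u00e4ivi \u0436";
        // The bytes as a browser would send them from a UTF-8 page
        byte[] body = name.getBytes(StandardCharsets.UTF_8);
        String garbled = decodeAsLatin1(body);
        System.out.println(garbled.equals(name));
        System.out.println(decodeAsUtf8(body).equals(name));
        System.out.println(repair(garbled).equals(name));
    }
}
```

The repair step is a workaround for data that has already been decoded incorrectly; setting the request encoding up front, as the filter does, avoids the problem in the first place.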
This filter has to be added to the web.xml (the deployment descriptor) of the webapp:
<!--CharsetFilter start-->
<filter>
    <filter-name>CharsetFilter</filter-name>
    <filter-class>fi.foo.filters.CharsetFilter</filter-class>
    <init-param>
        <param-name>requestEncoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>CharsetFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
The instructions for making this filter can be found on the Tomcat wiki (http://wiki.apache.org/tomcat/Tomcat/UTF-8).
JSP page encoding
In your web.xml, add the following:
<jsp-config>
    <jsp-property-group>
        <url-pattern>*.jsp</url-pattern>
        <page-encoding>UTF-8</page-encoding>
    </jsp-property-group>
</jsp-config>
Alternatively, all JSP pages of the webapp would need to have the following at the top:
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>
If some kind of layout with different JSP fragments is used, then this is needed in all of them.
HTML meta tags
JSP page encoding tells the JVM to handle the characters in the JSP page in the correct encoding. Then it's time to tell the browser which encoding the HTML page is in.
This is done with the following at the top of each XHTML page produced by the webapp:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fi">
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />
...
JDBC connection
When using a database, the connection has to be defined to use UTF-8 encoding. This is done in context.xml, or wherever the JDBC connection is defined, as follows:
<Resource name="jdbc/AppDB"
    auth="Container"
    type="javax.sql.DataSource"
    maxActive="20" maxIdle="10" maxWait="10000"
    username="foo"
    password="bar"
    driverClassName="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost:3306/ID_development?useUnicode=true&amp;characterEncoding=UTF-8"
/>
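The driver's encoding settings can also be passed from Java code as connection properties instead of the URL query string, which sidesteps having to escape the & character in XML. This is a minimal sketch, assuming MySQL Connector/J's useUnicode and characterEncoding properties; the class name, the helper method and the credentials are placeholders, and the actual DriverManager.getConnection call is left as a comment because it needs the driver and a running database.

```java
import java.util.Properties;

// Hypothetical helper, not part of the original answer.
public class JdbcUtf8Properties {

    // Build connection properties that ask the MySQL driver to talk UTF-8 to the server.
    public static Properties utf8Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        Properties props = utf8Properties("foo", "bar");
        System.out.println(props.getProperty("characterEncoding"));
        // With the MySQL driver on the classpath and a database running, one would then call:
        // Connection conn = DriverManager.getConnection(
        //         "jdbc:mysql://localhost:3306/ID_development", props);
    }
}
```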
MySQL database and tables
The database must use UTF-8 encoding. This is achieved by creating the database with the following:
CREATE DATABASE `ID_development`
/*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_swedish_ci */;
Then all of the tables need to be in UTF-8 as well:
CREATE TABLE `Users` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(30) collate utf8_swedish_ci default NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci ROW_FORMAT=DYNAMIC;
The key part is CHARSET=utf8.
MySQL server configuration
The MySQL server has to be configured as well. Typically this is done on Windows by modifying the my.ini file and on Linux by configuring the my.cnf file. Those files should define that all clients connected to the server use utf8 as the default character set and that the default charset used by the server is also utf8.
[client]
port=3306
default-character-set=utf8
[mysql]
default-character-set=utf8
MySQL procedures and functions
These also need to have the character set defined. For example:
DELIMITER $$
DROP FUNCTION IF EXISTS `pathToNode` $$
CREATE FUNCTION `pathToNode` (ryhma_id INT) RETURNS TEXT CHARACTER SET utf8
READS SQL DATA
BEGIN
  DECLARE path VARCHAR(255) CHARACTER SET utf8;
  SET path = NULL;
  ...
  RETURN path;
END $$
DELIMITER ;
GET requests: latin1 and UTF-8
Once it is defined in Tomcat's server.xml that GET request parameters are encoded in UTF-8, the following GET requests are handled properly:
https://localhost:8443/ID/Users?action=search&name=Petteri
https://localhost:8443/ID/Users?action=search&name=ж
Because ASCII characters are encoded the same way in both latin1 and UTF-8, the string "Petteri" is handled correctly.
The Cyrillic character ж cannot be represented in latin1 at all. Because Tomcat is instructed to handle request parameters as UTF-8, it handles that character correctly as %D0%B6.
Once browsers are instructed to read the pages in UTF-8 encoding (with HTTP headers and the HTML meta tag), at least Firefox 2/3 and other browsers from this period all encode the character themselves as %D0%B6.
The end result is that all users with the name "Petteri" are found, and all users with the name "ж" are found as well.
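The overlap between the two encodings can be checked directly in Java. In this small sketch (the class and method names are invented for illustration), one helper compares the bytes a string produces in ISO-8859-1 and in UTF-8, and the other asks the ISO-8859-1 encoder whether it can represent the text at all.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
public class AsciiOverlapDemo {

    // True if the text produces identical bytes in ISO-8859-1 and UTF-8.
    public static boolean sameBytesInBothCharsets(String text) {
        return Arrays.equals(text.getBytes(StandardCharsets.ISO_8859_1),
                             text.getBytes(StandardCharsets.UTF_8));
    }

    // True if every character of the text exists in ISO-8859-1.
    public static boolean representableInLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Pure ASCII: identical bytes in both charsets
        System.out.println(sameBytesInBothCharsets("Petteri"));
        // Cyrillic zhe: not representable in latin1
        System.out.println(representableInLatin1("\u0436"));
        // Finnish a-umlaut: representable in latin1, but with different bytes than in UTF-8
        System.out.println(representableInLatin1("\u00e4"));
        System.out.println(sameBytesInBothCharsets("\u00e4"));
    }
}
```

The last case is the awkward one: the character exists in both charsets but is encoded differently, so the receiver has to know which charset the sender used.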
But what about äöå?
The HTTP specification defines that by default URLs are encoded as latin1. This results in Firefox 2, Firefox 3 etc. encoding the following
https://localhost:8443/ID/Users?action=search&name=*Päivi*
into the encoded version
https://localhost:8443/ID/Users?action=search&name=*P%E4ivi*
In latin1 the character ä is encoded as %E4, even though the page/request/everything is defined to use UTF-8. The UTF-8 encoded version of ä is %C3%A4.
The result of this is that it's quite impossible for the webapp to correctly handle the request parameters from GET requests, as some characters are encoded in latin1 and others in UTF-8. Notice: POST requests do work, as browsers encode all request parameters from forms completely in UTF-8 if the page is defined as being UTF-8.
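One possible mitigation, which is not from the original answer, is a heuristic: take the raw, still percent-encoded value (for example parsed out of request.getQueryString()), try a strict UTF-8 decode first, and fall back to ISO-8859-1 only when the bytes are not valid UTF-8. The class LenientParamDecoder below is a hypothetical sketch of that idea and assumes the input contains only ASCII characters and percent escapes. It can still guess wrong, because some latin1 byte sequences happen to be valid UTF-8 as well.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical heuristic decoder, not part of the original answer.
public class LenientParamDecoder {

    // Decode as UTF-8 when the bytes are valid UTF-8, otherwise as ISO-8859-1.
    public static String decode(String percentEncoded) {
        byte[] raw = rawBytes(percentEncoded);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the browser sent latin1
            return new String(raw, StandardCharsets.ISO_8859_1);
        }
    }

    // Undo the percent-encoding without interpreting the bytes yet.
    private static byte[] rawBytes(String percentEncoded) {
        try {
            // ISO-8859-1 maps each byte to one char, so this round trip recovers the raw bytes
            return URLDecoder.decode(percentEncoded, "ISO-8859-1")
                    .getBytes(StandardCharsets.ISO_8859_1);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String expected = "P\u00e4ivi";
        System.out.println(decode("P%E4ivi").equals(expected));    // latin1 form
        System.out.println(decode("P%C3%A4ivi").equals(expected)); // UTF-8 form
    }
}
```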
Stuff to read
A very big thank you to the writers of the following for giving the answers to my problem:
- http://tagunov.tripod.com/i18n/i18n.html
- http://wiki.apache.org/tomcat/Tomcat/UTF-8
- http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/
- http://dev.mysql.com/doc/refman/5.0/en/charset-syntax.html
- http://cagan327.blogspot.com/2006/05/utf-8-encoding-fix-tomcat-jsp-etc.html
- http://cagan327.blogspot.com/2006/05/utf-8-encoding-fix-for-mysql-tomcat.html
- http://jeppesn.dk/utf-8.html
- http://www.nabble.com/request-parameters-mishandle-utf-8-encoding-td18720039.html
- http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
- http://www.utf8-chartable.de/
Important Note
MySQL's utf8 character set supports only the Basic Multilingual Plane, using UTF-8 characters of up to 3 bytes. If you need to go outside of that (certain alphabets require more than 3 bytes of UTF-8), then you either need to use a flavor of VARBINARY column type or use the utf8mb4 character set (which requires MySQL 5.5.3 or later). Just be aware that using the utf8 character set in MySQL won't work 100% of the time.
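Whether a given string stays within that 3-byte limit can be checked on the Java side before it is written to the database. The following is only a sketch with invented names: one helper reports how many UTF-8 bytes a code point needs, and the other tests whether all code points of a string lie in the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
public class MysqlUtf8Check {

    // Number of bytes the given code point occupies in UTF-8.
    public static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every code point lies in the Basic Multilingual Plane,
    // i.e. needs at most 3 bytes in UTF-8.
    public static boolean fitsInThreeByteUtf8(String text) {
        return text.codePoints().allMatch(Character::isBmpCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0x0436));  // Cyrillic zhe
        System.out.println(utf8Length(0x20AC));  // euro sign
        System.out.println(utf8Length(0x1F600)); // an emoji outside the BMP
        System.out.println(fitsInThreeByteUtf8("P\u00e4ivi \u0436"));
        System.out.println(fitsInThreeByteUtf8("\uD83D\uDE00"));
    }
}
```

A check like this could be used to reject or strip unsupported characters when the schema cannot be moved to utf8mb4.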
Tomcat with Apache
One more thing: if you are using Apache + Tomcat + the mod_JK connector, then you also need to make the following changes:
- Add URIEncoding="UTF-8" to the 8009 connector in the Tomcat server.xml file; it is used by the mod_JK connector.
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" URIEncoding="UTF-8"/>
- Go to your Apache configuration folder, i.e. /etc/httpd/conf, and add AddDefaultCharset utf-8 to the httpd.conf file. Note: first check whether the directive already exists. If it does, update it to this line; otherwise you can add this line at the bottom.
Answer by stian
I think you summed it up quite well in your own answer.
In the process of UTF-8-ing(?) from end to end, you might also want to make sure Java itself is using UTF-8. Use -Dfile.encoding=utf-8 as a parameter to the JVM (can be configured in catalina.bat).
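A quick way to see which default the JVM actually picked up is to print Charset.defaultCharset(). The sketch below (names are illustrative only) also shows the alternative of naming the charset explicitly in every conversion, which makes the code independent of the JVM default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class DefaultCharsetDemo {

    // Encode explicitly as UTF-8, independent of the JVM default charset.
    public static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode explicitly as UTF-8, independent of the JVM default charset.
    public static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The JVM default charset, which the file.encoding setting influences
        System.out.println("Default charset: " + Charset.defaultCharset());
        String text = "P\u00e4ivi \u0436";
        // The explicit round trip gives the same result whatever the default is
        System.out.println(fromUtf8(toUtf8(text)).equals(text));
    }
}
```

Calls such as String.getBytes() or new String(bytes) without a charset argument are the ones that depend on the default.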
Answer by Mike Mountrakis
This is for Greek encoding in MySQL tables when we want to access them using Java:
Use the following connection setup in your JBoss connection pool (mysql-ds.xml):
<connection-url>jdbc:mysql://192.168.10.123:3308/mydatabase</connection-url>
<driver-class>com.mysql.jdbc.Driver</driver-class>
<user-name>nts</user-name>
<password>xaxaxa!</password>
<connection-property name="useUnicode">true</connection-property>
<connection-property name="characterEncoding">greek</connection-property>
If you don't want to put this in a JNDI connection pool, you can configure it as a JDBC URL, as the next line illustrates:
jdbc:mysql://192.168.10.123:3308/mydatabase?characterEncoding=greek
For me and Nick, so we never forget it and waste time anymore.....
Answer by Mike Mountrakis
In case you have specified it in the connection pool (mysql-ds.xml), in your Java code you can open the connection as follows:
DriverManager.registerDriver(new com.mysql.jdbc.Driver());
Connection conn = DriverManager.getConnection(
"jdbc:mysql://192.168.1.12:3308/mydb?characterEncoding=greek",
"Myuser", "mypass");
Answer by Jay
Nice detailed answer. I just wanted to add one more thing, which will definitely help others see UTF-8 encoding on URLs in action.
Follow the steps below to enable UTF-8 encoding on URLs in Firefox.
- Type "about:config" in the address bar.
- Use the filter input to search for the "network.standard-url.encode-query-utf8" property.
- The above property is false by default; set it to true.
- Restart the browser.
UTF-8 encoding on URLs works by default in IE6/7/8 and Chrome.
Answer by caarlos0
I had a similar problem, but with the file names of files I'm compressing with Apache Commons. I resolved it with this command:
convmv --notest -f cp1252 -t utf8 * -r
It works very well for me. Hope it helps someone ;)
Answer by bnguyen82
In my case of displaying Unicode characters from message bundles, I don't need to apply the "JSP page encoding" section to display Unicode on my JSP page. All I need is the "CharsetFilter" section.
Answer by Raedwald
To add to kosoant's answer: if you are using Spring, rather than writing your own Servlet filter, you can use the class org.springframework.web.filter.CharacterEncodingFilter they provide, configuring it like the following in your web.xml:
<filter>
    <filter-name>encoding-filter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>FALSE</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>encoding-filter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
Answer by David
One other point that hasn't been mentioned relates to Java Servlets working with Ajax. I have situations where a web page picks up UTF-8 text from the user and sends it to a JavaScript file, which includes it in a URI sent to the Servlet. The Servlet queries a database, captures the result and returns it as XML to the JavaScript file, which formats it and inserts the formatted response into the original web page.
In one web app I was following an early Ajax book's instructions for wrapping up the JavaScript in constructing the URI. The example in the book used the escape() method, which I discovered (the hard way) is wrong. For UTF-8 you must use encodeURIComponent().
Few people seem to roll their own Ajax these days, but I thought I might as well add this.
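The difference is easy to reproduce on the server side. JavaScript's escape() emits %uXXXX for characters above U+00FF and latin1-style escapes such as %E4 below that, while encodeURIComponent() emits percent-encoded UTF-8 bytes. The sketch below (the class and method names are made up) feeds all three forms to the JDK's URLDecoder with UTF-8, which is roughly what a servlet container configured for UTF-8 has to cope with.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of the original answer.
public class AjaxParamDemo {

    // Decode a percent-encoded value as UTF-8.
    // Returns null when the input is not valid percent-encoding at all.
    public static String decodeUtf8OrNull(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (IllegalArgumentException e) {
            return null; // e.g. the non-standard %uXXXX form produced by escape()
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // encodeURIComponent of Cyrillic zhe: decodes cleanly
        System.out.println("\u0436".equals(decodeUtf8OrNull("%D0%B6")));
        // escape() of the same character: rejected as malformed
        System.out.println(decodeUtf8OrNull("%u0436") == null);
        // escape() of a-umlaut: a lone latin1 byte that is not valid UTF-8,
        // so the result contains the replacement character U+FFFD
        System.out.println(decodeUtf8OrNull("%E4").indexOf('\uFFFD') >= 0);
    }
}
```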