Cannot get Servlet to process request content as UTF-8
Original question: http://stackoverflow.com/questions/1125758/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Rob Hruska
I'm converting a legacy app from ISO-8859-1 to UTF-8, and I've used a number of resources to determine what I need to set to get this to work. However, after several configuration, code, and environment changes, my Servlet (in Tomcat 5) doesn't seem to process submitted HTML form content as UTF-8.
Here's what I've set up for configuration.
- System properties
[user@server ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
- tomcat5 server.xml
<Connector protocol="HTTP/1.1" ... URIEncoding="UTF-8" useBodyEncodingForURI="true"/>
- JSP file
<%@ page language="java" pageEncoding="UTF-8" contentType="text/html;charset=UTF-8" %>
...
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
- Servlet filter
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) {
    if (request.getCharacterEncoding() == null) {
        request.setCharacterEncoding("UTF-8");
    }
    ...
With some debug logs I know the following:
System.getProperty("file.encoding"): "UTF-8"
java.nio.charset.Charset.defaultCharset(): "UTF-8"
new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding(): "UTF8"
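The three values above can be collected with a few lines of plain JDK code. The following is a minimal sketch; the class name EncodingDiagnostics and the describe helper are made up for illustration, and the output depends entirely on the JVM and its startup settings.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical helper class; the name is illustrative only.
final class EncodingDiagnostics {

    // Returns the JVM-wide encoding settings as a small multi-line report.
    static String describe() {
        // An OutputStreamWriter created without a charset uses the platform default.
        String writerEncoding =
                new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding();
        return "file.encoding=" + System.getProperty("file.encoding") + "\n"
                + "defaultCharset=" + Charset.defaultCharset() + "\n"
                + "writerEncoding=" + writerEncoding;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

These settings describe the JVM default only. They do not control how the container decodes a request body, which is why they can all say UTF-8 while form parameters still arrive garbled.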
However, when I submit my form with an input containing "Бить баклуши", I see the following (from my logs):
request.getParameter("myParameter") = D1D??2?4 D±D°DoD??3?0D
I know that the request content type was null, so it was explicitly set to "UTF-8" in my servlet filter. Also, I'm viewing my logs from a terminal, whose encoding I know is set to UTF-8 as well.
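The garbage in the log has the typical shape of UTF-8 bytes that were decoded as ISO-8859-1. Below is a minimal sketch of that effect using only the JDK. The class and method names are made up for illustration, and StandardCharsets requires Java 7 or later, which is newer than a typical Tomcat 5 setup.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class MojibakeDemo {

    // Simulates a container that receives UTF-8 bytes but decodes them as ISO-8859-1.
    static String decodeAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the wrong decoding: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Бить баклуши";
        String garbled = decodeAsLatin1(original);
        // 11 Cyrillic letters at 2 bytes each plus 1 space = 23 single-byte characters.
        System.out.println(garbled.length());
        System.out.println(repair(garbled).equals(original));
    }
}
```

Because ISO-8859-1 maps every byte to exactly one character, the round trip in repair is lossless. It is useful as a diagnostic to confirm what went wrong, not as a fix; the fix is to have the request decoded as UTF-8 in the first place.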
What am I missing here? What else do I need to set for the Servlet to correctly process my input as UTF-8? If more information will help, I'll be glad to add more debugging and update this question with it.
Edit:
- I'm not using Windows Terminal (I'm using PuTTY), so I'm pretty certain the problem is not what I'm viewing the logs with. This is seconded by the fact that when I send my response back to the browser with the submitted content and output it, it's the same garbage as above.
- The form's being submitted from IE8.
Solution:
My web.xml definition for my CharsetFilter was too far down (below my servlet configurations and other filters). I moved the filter definition to the very top of the web.xml document and everything worked correctly. See the accepted answer below.
Answered by akarnokd
Edit 4 (the final and corrected answer, as requested)
Your servlet filter gets applied too late.
A possible proper order in web.xml would be as follows:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/j2ee/dtds/web-app_2.3.dtd">
<web-app>
    <!--CharsetFilter start-->
    <filter>
        <filter-name>Charset Filter</filter-name>
        <filter-class>CharsetFilter</filter-class>
        <init-param>
            <param-name>requestEncoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
    </filter>
    <!-- The rest is omitted -->
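Why the position of the filter matters can be sketched with a toy model. FakeRequest below is a hypothetical stand-in and not the servlet API. It assumes, in line with the Servlet specification, that parameters are decoded on first access, that the decoded value is then cached, and that ISO-8859-1 is used when no encoding has been set.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical model of request parameter decoding; not a real servlet class.
final class FilterOrderDemo {

    static final class FakeRequest {
        private final String rawBody;            // percent-encoded form body from the browser
        private String encoding = "ISO-8859-1";  // assumed default when nothing is set
        private String cachedValue;              // decoded once, then reused

        FakeRequest(String rawBody) {
            this.rawBody = rawBody;
        }

        void setCharacterEncoding(String encoding) {
            this.encoding = encoding;
        }

        // Decodes the single parameter value on first access and caches the result.
        String getParameter() {
            if (cachedValue == null) {
                String value = rawBody.substring(rawBody.indexOf('=') + 1);
                try {
                    cachedValue = URLDecoder.decode(value, encoding);
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
                }
            }
            return cachedValue;
        }
    }

    public static void main(String[] args) {
        String body = "myParameter=%D0%91%D0%B8%D1%82%D1%8C"; // "Бить" as UTF-8 bytes

        // Charset filter runs first: the encoding is set before anything reads a parameter.
        FakeRequest early = new FakeRequest(body);
        early.setCharacterEncoding("UTF-8");
        System.out.println(early.getParameter().length()); // 4 characters

        // Something earlier in the chain reads a parameter, then the charset filter runs.
        FakeRequest late = new FakeRequest(body);
        late.getParameter();
        late.setCharacterEncoding("UTF-8"); // too late, the value is already cached
        System.out.println(late.getParameter().length()); // 8 characters of garbage
    }
}
```

In the model, as in a real container, setting the encoding after the first parameter read has no effect. Declaring the charset filter before every other filter keeps it ahead of any code that might touch the parameters.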
Answered by Amin
At first I thought the issue would be settled easily, but it took me two days to figure it out. Here are my findings; I hope they help.
1) You need the following directive in your JSP:
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
If you have many JSP pages, you can configure this once in web.xml instead, as explained in "How can I cleanly set the pageEncoding of all my JSPs?"
2) Before you read any parameter in your servlet, make sure you have already set the character encoding to UTF-8:
request.setCharacterEncoding("UTF-8");
I did this in my own filter (the first filter in the chain, before calling chain.doFilter).
3) Your database must support UTF-8, so make sure you have applied the change to your tables and columns. To check that it works, type in some words in Japanese and save them. If the table holds the content intact, you are fine.
4) The last and most important one is the connection string to your database. Even though my database and all my tables supported UTF-8, I could not save my content correctly until I added this extra setting. So be sure to add characterEncoding=UTF8 to your connection string, like this:
jdbc:mysql://127.0.0.1:3306/my_database?characterEncoding=UTF8
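If the connection string is assembled in code, the parameter can be appended defensively. This is only a sketch: the withUtf8 helper and the class name are hypothetical, the database name is a placeholder, and the example does not open a connection.

```java
// Hypothetical helper; not part of JDBC or of any driver.
final class JdbcUrlDemo {

    // Appends characterEncoding=UTF8 unless the URL already sets that parameter.
    static String withUtf8(String jdbcUrl) {
        if (jdbcUrl.contains("characterEncoding=")) {
            return jdbcUrl;
        }
        // Use '?' for the first parameter and '&' for any later one.
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "characterEncoding=UTF8";
    }

    public static void main(String[] args) {
        System.out.println(withUtf8("jdbc:mysql://127.0.0.1:3306/my_database"));
        System.out.println(withUtf8("jdbc:mysql://127.0.0.1:3306/my_database?useSSL=false"));
    }
}
```

The resulting string would then be passed to the usual connection call, for example DriverManager.getConnection, together with the credentials.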
For JSP pages with enctype="multipart/form-data" you need one extra step. When you read a FileItem with the getString method, change the call to getString("UTF-8"); that should do it.

