Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/546365/

UTF-8 text is garbled when form is posted as multipart/form-data

Tags: java, jakarta-ee

Question

I'm uploading a file to the server. The file upload HTML form has 2 fields:

  1. File name - An HTML text box where the user can give a name in any language.
  2. File upload - An HTML 'file' input where the user can specify a file from disk to upload.

When the form is submitted, the file contents are received properly. However, when the file name (point 1 above) is read, it is garbled. ASCII characters are displayed properly. When the name is given in some other language (German, French etc.), there are problems.

In the servlet method, the request's character encoding is set to UTF-8. I even tried a filter as described in the question "How can I make this code to submit a UTF-8 form textarea with jQuery/Ajax work?", but it doesn't seem to work. Only the file name seems to be garbled.

The MySQL table where the file name goes supports UTF-8. I gave random non-English characters & they are stored/displayed properly.

Using Fiddler, I monitored the request & all the POST data is passed correctly. I'm trying to identify how/where the data could get garbled. Any help will be greatly appreciated.

Accepted answer by Philip Helger

I had the same problem using Apache commons-fileupload. I could not find out what causes the problem, especially since I already set the UTF-8 encoding in the following places:

  1. HTML meta tag
  2. Form accept-charset attribute
  3. Tomcat filter on every request that sets the "UTF-8" encoding

My solution was to explicitly convert strings from ISO-8859-1 (or whatever the default encoding of your platform is) to UTF-8:

new String(s.getBytes("ISO-8859-1"), "UTF-8");

Hope that helps.

Edit: starting with Java 7 you can also use the following:

new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
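
To see why the conversion above works, here is a minimal, self-contained sketch. The class name MojibakeRepair and the sample value are made up for illustration. It assumes the value was wrongly decoded as ISO-8859-1 specifically; if some other charset was used for the wrong decoding, the round trip may lose bytes.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {

    /** Re-decodes a string that was wrongly decoded as ISO-8859-1 instead of UTF-8. */
    static String repair(String garbled) {
        // ISO-8859-1 maps every char in 0..255 back to the identical byte value,
        // so getBytes() recovers the original UTF-8 bytes without loss.
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe"; // hypothetical form value with two non-ASCII letters

        // Simulate the bug: UTF-8 bytes on the wire, decoded as ISO-8859-1.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(wire, StandardCharsets.ISO_8859_1);

        System.out.println(garbled.equals(original));         // false
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

Note that this repairs the symptom after the fact. Applying it to a string that was already decoded correctly would corrupt it, so it only makes sense where the wrong decoding is known to happen.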

Answer by Michael Glenn

The filter is key for IE. A few other things to check:

What are the page encoding and character set? Both should be UTF-8:

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

What is the character set in the meta tag?

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Does your MySQL connection string specify UTF-8? For example:

jdbc:mysql://127.0.0.1/dbname?requireSSL=false&useUnicode=true&characterEncoding=UTF-8

Answer by nbeyer

The filter and setting up Tomcat to support UTF-8 URIs only matter if you're passing the data via the URL's query string, as you would with an HTTP GET. If you're using a POST, with the query string in the HTTP message's body, what matters is the content type of the request, and it is up to the browser to set the content type's charset to UTF-8 and send the content with that encoding.

The only way to really do this is by telling the browser that you can only accept UTF-8 by setting the Accept-Charset header on every response to "UTF-8;q=1,ISO-8859-1;q=0.6". This will put UTF-8 as the best quality and the default charset, ISO-8859-1, as acceptable, but a lower quality.

When you say the file name is garbled, is it garbled in the HttpServletRequest.getParameter's return value?
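
To illustrate the point about the request's content type, the following sketch shows how a charset could be read from a Content-Type header value. RequestCharset.fromContentType is a hypothetical helper, not part of any servlet API. The ISO-8859-1 fallback is an assumption that mirrors the default servlet containers have traditionally used when the request declares no charset, which is often the case for form posts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class RequestCharset {

    /**
     * Hypothetical helper: returns the charset named in a Content-Type header value,
     * or ISO-8859-1 when the header carries no charset parameter.
     */
    static Charset fromContentType(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    // Strip optional quotes around the charset name.
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    return Charset.forName(name);
                }
            }
        }
        // Assumed fallback, mirroring the traditional container default.
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // The charset is declared explicitly.
        System.out.println(fromContentType("application/x-www-form-urlencoded; charset=UTF-8")); // UTF-8
        // A multipart post with only a boundary: nothing declares the charset.
        System.out.println(fromContentType("multipart/form-data; boundary=----abc"));             // ISO-8859-1
    }
}
```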

Answer by paulmurray

You do not use UTF-8 to encode text data for HTML forms. The HTML standard defines two encodings, and the relevant part of that standard is here. The "old" encoding, which handles ASCII, is application/x-www-form-urlencoded. The new one, which works properly, is multipart/form-data.

Specifically, the form declaration looks like this:

 <FORM action="http://server.com/cgi/handle"
       enctype="multipart/form-data"
       method="post">
   <P>
   What is your name? <INPUT type="text" name="submit-name"><BR>
   What files are you sending? <INPUT type="file" name="files"><BR>
   <INPUT type="submit" value="Send"> <INPUT type="reset">
 </FORM>

And I think that's all you have to worry about - the webserver should handle it. If you are writing something that directly reads the InputStream from the web client, then you will need to read RFC 2045 and RFC 2046.
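
For anyone who does read the raw stream, the sketch below shows what a single text field might look like on the wire and how the choice of charset changes the decoded value. The boundary string, field name, and class name are invented for illustration, and the parsing is deliberately naive (one part, no error handling); a real multipart parser has to do much more.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MultipartFieldDecoding {

    static final String BOUNDARY = "----hypotheticalBoundary"; // made-up boundary

    /** Builds the bytes a browser might send for one text field on a UTF-8 page. */
    static byte[] buildBody(String fieldName, String value) {
        String body = "--" + BOUNDARY + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName + "\"\r\n"
                + "\r\n"
                + value + "\r\n"
                + "--" + BOUNDARY + "--\r\n";
        return body.getBytes(StandardCharsets.UTF_8);
    }

    /** Extracts the first part's value, decoding its raw bytes with the given charset. */
    static String readFirstValue(byte[] body, Charset charset) {
        // ISO-8859-1 maps bytes 1:1 to chars, so char indexes equal byte offsets.
        // It is used here only to locate the delimiters.
        String raw = new String(body, StandardCharsets.ISO_8859_1);
        int start = raw.indexOf("\r\n\r\n") + 4;            // end of the part headers
        int end = raw.indexOf("\r\n--" + BOUNDARY, start);  // start of the closing boundary
        return new String(body, start, end - start, charset);
    }

    public static void main(String[] args) {
        byte[] body = buildBody("submit-name", "Ren\u00e9e");
        // The part carries no charset of its own, so the reader has to pick one.
        System.out.println(readFirstValue(body, StandardCharsets.UTF_8).equals("Ren\u00e9e"));      // true
        System.out.println(readFirstValue(body, StandardCharsets.ISO_8859_1).equals("Ren\u00e9e")); // false
    }
}
```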

Answer by Dan

I had the same problem and it turned out that in addition to specifying the encoding in the Filter

request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");

it is necessary to add "accept-charset" to the form

<form method="post" enctype="multipart/form-data" accept-charset="UTF-8">

and run the JVM with

-Dfile.encoding=UTF-8

The HTML meta tag is not necessary if you send it in the HTTP header using response.setCharacterEncoding().
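
The -Dfile.encoding setting matters because any code that converts bytes to a String without naming a charset falls back to the JVM default. The sketch below (class and method names are made up) checks whether a value survives that implicit decoding on the current JVM. As far as I know, Java 18 and later default to UTF-8 unless configured otherwise, so the flag mainly affects older runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetCheck {

    /** Returns true if UTF-8 bytes for the text survive decoding with the JVM default charset. */
    static boolean survivesDefaultDecoding(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // new String(byte[]) uses the JVM default charset, which -Dfile.encoding can set.
        return new String(utf8).equals(text);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Non-ASCII text survives: " + survivesDefaultDecoding("Gr\u00fc\u00dfe"));
    }
}
```

Running it once with and once without -Dfile.encoding=UTF-8 on the target server shows whether the flag changes anything there.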

Answer by nautilusvn

Just use the Apache Commons FileUpload library. Add URIEncoding="UTF-8" to Tomcat's connector, and use FileItem.getString("UTF-8") instead of FileItem.getString() with no charset specified.

Hope this helps.

Answer by Roger Keays

I got stuck with this problem and found that it was the order of the call to

request.setCharacterEncoding("UTF-8");

that was causing the problem. It has to be called before any call to request.getParameter(), so I made a special filter to use at the top of my filter chain.

http://www.ninthavenue.com.au/servletrequest-setcharactercoding-ignored
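
The ordering issue can be modelled with a toy class. ToyRequest below is NOT the servlet API; it is a made-up stand-in that assumes parameters are decoded lazily on first access and then cached, which is one plausible reason a late setCharacterEncoding() call has no effect.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Toy stand-in for a request object; it is NOT the servlet API. */
class ToyRequest {
    private final byte[] rawValue;
    private Charset encoding = StandardCharsets.ISO_8859_1; // assumed container-style default
    private String parsedValue;                             // filled on first read, then cached

    ToyRequest(byte[] rawValue) {
        this.rawValue = rawValue;
    }

    void setCharacterEncoding(Charset charset) {
        this.encoding = charset;
    }

    String getParameter() {
        if (parsedValue == null) {
            // Decoding happens once; later encoding changes are never seen.
            parsedValue = new String(rawValue, encoding);
        }
        return parsedValue;
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        ToyRequest tooLate = new ToyRequest(raw);
        tooLate.getParameter();                                // something reads a parameter first
        tooLate.setCharacterEncoding(StandardCharsets.UTF_8);  // too late, value is cached
        System.out.println(tooLate.getParameter().equals(original)); // false

        ToyRequest inTime = new ToyRequest(raw);
        inTime.setCharacterEncoding(StandardCharsets.UTF_8);   // set before the first read
        System.out.println(inTime.getParameter().equals(original));  // true
    }
}
```

In a real application the early read may be hidden inside another filter or a framework, which is why putting the encoding filter first in the chain helps.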

Answer by Weles

I am using PrimeFaces with GlassFish and SQL Server.

In my case I created a WebFilter in the back end that intercepts every request and sets its encoding to UTF-8, like this:

package br.com.teste.filter;

import java.io.IOException;

import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;

@WebFilter(servletNames={"Faces Servlet"})
public class Filter implements javax.servlet.Filter {

    @Override
    public void destroy() {
        // Nothing to clean up.
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        // Set the encoding before anything reads the request parameters.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // No initialization needed.
    }

}

In the view (.xhtml) I need to set the charset in the form's enctype attribute to UTF-8, as @Kevin Rahe suggested:

    <h:form id="frmt" enctype="multipart/form-data;charset=UTF-8" >
         <!-- your code here -->
    </h:form>  

Answer by Rognvald Eaversen

In case someone stumbles upon this problem when working on a Grails (or pure Spring) web application, here is the post that helped me:

http://forum.spring.io/forum/spring-projects/web/2491-solved-character-encoding-and-multipart-forms

To set the default encoding to UTF-8 (instead of ISO-8859-1) for multipart requests, I added the following code in resources.groovy (Spring DSL):

multipartResolver(ContentLengthAwareCommonsMultipartResolver) {
    defaultEncoding = 'UTF-8'
}

Answer by György Novák

I'm using org.apache.commons.fileupload.servlet.ServletFileUpload.ServletFileUpload(FileItemFactory) and specifying the encoding when reading out the parameter value:

List<FileItem> items = new ServletFileUpload(new DiskFileItemFactory()).parseRequest(request);

for (FileItem item : items) {
    String fieldName = item.getFieldName();

    if (item.isFormField()) {
        String fieldValue = item.getString("UTF-8"); // <-- HERE