Java HTML: Form doesn't send input in UTF-8 format

Question

Asked by Yassin Hajaj

I've gone through every question about UTF-8 encoding in HTML, and nothing seems to make it work as expected.

I added the meta tag: nothing changed.
I added the accept-charset attribute to the form: nothing changed.

JSP File

<%@ page pageEncoding="UTF-8" %>
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<title>Editer les sous-titres</title>
</head>
<body>
    <form method="post" action="/Subtitlor/edit" accept-charset="UTF-8"> 
        <h3><c:out value="${ nameOfFile }"/></h3>
        <input type="hidden" name="nameOfFile" id="nameOfFile" value="${ nameOfFile }"/>
        <c:if test="${ !saved }">
            <input value="Enregistrer le travail" type="submit" style="position:fixed; top: 10px; right: 10px;" />
        </c:if>
        <a href="/Subtitlor/" style="position:fixed; top: 50px; right: 10px;">Retour à la page d'accueil</a>
        <c:if test="${ saved }">
            <div style="position:fixed; top: 90px; right: 10px;">
                <c:out value="Travail enregistré dans la base de donnée"/>
            </div>
        </c:if>
        <table border="1">
            <c:if test="${ !saved }">
                <thead>
                    <tr>
                        <th style="font-weight:bold">Original Line</th>
                        <th style="font-weight:bold">Translation</th>
                        <th style="font-weight:bold">Already translated</th>
                    </tr>
                </thead>
            </c:if>
            <c:forEach items="${ subtitles }" var="line" varStatus="status">
                <tr>
                    <td style="text-align:right;"><c:out value="${ line }" /></td>
                    <td><input type="text" name="line${ status.index }" id="line${ status.index }" size="35" /></td>
                    <td style="text-align:right"><c:out value="${ lines[status.index].content }"/></td>
                </tr>
            </c:forEach>
        </table>
    </form>
</body>
</html>

Servlet

// In the servlet (the form is submitted with POST): print the first two translations
for (int i = 0; i < 2; i++) {
    System.out.println(request.getParameter("line" + i));
}

Output

Et ton pÃ¨re et sa soeur
Il ne sera jamais parti.

Answer 1

Accepted answer by BalusC

I added the meta tag: nothing changed.

It indeed has no effect when the page is served over HTTP rather than, for example, from the local disk file system (i.e. when the page's URL is http://... instead of file://...). Over HTTP, the charset in the HTTP response header is used instead. You've already set it as below:

<%@page pageEncoding="UTF-8"%>

This not only writes out the HTTP response using UTF-8, but also sets the charset attribute in the Content-Type response header.

The web browser uses this header to interpret the response and to encode any HTML form parameters.
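
A minimal sketch of how a client could read that header, using only the JDK: the hypothetical helper below extracts the charset parameter from a Content-Type value such as text/html;charset=UTF-8. It is not taken from any browser; the fallback is supplied by the caller, and ISO-8859-1 is used in the example only because it was the historical HTTP/1.1 default for text types.

```java
// Hypothetical helper: extracts the charset parameter from a Content-Type header value.
final class ContentTypeCharset {

    static String charsetOf(String contentType, String fallback) {
        // Parameters follow the media type and are separated by ';'
        for (String part : contentType.split(";")) {
            String param = part.trim();
            // Parameter names are case-insensitive, so match "charset=" ignoring case
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = param.substring(8).trim();
                // The value may be a quoted string, e.g. charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return fallback; // no charset parameter present
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=UTF-8", "ISO-8859-1")); // UTF-8
        System.out.println(charsetOf("text/html", "ISO-8859-1"));               // ISO-8859-1
    }
}
```

A real client would also check that the name is supported (for example with java.nio.charset.Charset.isSupported) before relying on it.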

I added the accept-charset attribute in form: nothing changed.

It only has an effect in Microsoft Internet Explorer, and even there it does it wrongly. Never use it. All real web browsers instead use the charset attribute specified in the Content-Type header of the response. Even MSIE does it the right way as long as you do not specify the accept-charset attribute. As said before, you have already set it properly via pageEncoding.

Get rid of both the meta tag and the accept-charset attribute. They have no useful effect; in the long term they will only confuse you, and they can even make things worse when the end user uses MSIE. Just stick to pageEncoding. Instead of repeating pageEncoding in every JSP page, you can also set it globally in web.xml as below:

<jsp-config>
    <jsp-property-group>
        <url-pattern>*.jsp</url-pattern>
        <page-encoding>UTF-8</page-encoding>
    </jsp-property-group>
</jsp-config>

As said, this tells the JSP engine to write the HTTP response output using UTF-8 and to declare it in the HTTP response header too. The web browser will use the same charset to encode the HTTP request parameters before sending them back to the server.

Your only missing step is to tell the server that it must use UTF-8 to decode the HTTP request parameters before returning them from getParameterXxx() calls. How to do that globally depends on the HTTP request method. Given that you're using the POST method, this is relatively easy to achieve with the servlet filter class below, which automatically hooks into all requests:

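import java.io.IOException;

// On Jakarta EE 9+ (e.g. Tomcat 10+), import these from jakarta.servlet instead.
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;

@WebFilter("/*") // applies to every request (Servlet 3.0+)
public class CharacterEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig config) throws ServletException {
        // NOOP.
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        // Must run before anything calls getParameter() or getReader() on this request.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // NOOP.
    }
}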

That's all. In Servlet 3.0+ (Tomcat 7 and newer) you don't need any additional web.xml configuration.

Just keep in mind that it is very important to call the setCharacterEncoding() method before the POST request parameters are obtained for the first time through any of the getParameterXxx() methods. This is because they are parsed only once, on first access, and then cached in server memory.

So, for example, the sequence below is wrong:

String foo = request.getParameter("foo"); // Wrong encoding.
// ...
request.setCharacterEncoding("UTF-8"); // Attempt to set it.
String bar = request.getParameter("bar"); // STILL wrong encoding!
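
To see why that order fails, here is a toy model of the parse-once behavior described above. It is not actual container code: ParseOnceRequest is a hypothetical class, the ISO-8859-1 starting encoding is an assumption (it is the usual servlet default when the client sends no charset), and URLDecoder.decode(String, Charset) requires Java 10+.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy model of parse-once request parameters; not actual servlet container code.
final class ParseOnceRequest {
    private final String rawBody;                           // e.g. "foo=p%C3%A8re&bar=p%C3%A8re"
    private Charset encoding = StandardCharsets.ISO_8859_1; // assumed container default
    private Map<String, String> params;                     // null until the first access

    ParseOnceRequest(String rawBody) {
        this.rawBody = rawBody;
    }

    void setCharacterEncoding(String name) {
        if (params == null) { // silently ignored once the body has been parsed
            encoding = Charset.forName(name);
        }
    }

    String getParameter(String name) {
        if (params == null) { // first access: parse with whatever encoding is set right now
            params = new HashMap<>();
            for (String pair : rawBody.split("&")) {
                String[] kv = pair.split("=", 2);
                String value = kv.length > 1 ? URLDecoder.decode(kv[1], encoding) : "";
                params.put(URLDecoder.decode(kv[0], encoding), value);
            }
        }
        return params.get(name); // later calls reuse the cached map
    }

    public static void main(String[] args) {
        ParseOnceRequest request = new ParseOnceRequest("foo=p%C3%A8re&bar=p%C3%A8re");
        String foo = request.getParameter("foo"); // decoded as ISO-8859-1
        request.setCharacterEncoding("UTF-8");   // too late: no effect
        String bar = request.getParameter("bar"); // still decoded as ISO-8859-1
        System.out.println(foo.equals(bar));      // true: both hold the same mojibake
    }
}
```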

Doing the setCharacterEncoding() job in a servlet filter guarantees that it runs in time (at least, before any servlet).

If you'd like to instruct the server to decode GET (not POST) request parameters using UTF-8 too (the parameters you see after the ? character in the URL), then you basically need to configure that on the server side. It's not possible to configure it via the Servlet API. If you're using, for example, Tomcat as the server, then it's a matter of adding the URIEncoding="UTF-8" attribute to the <Connector> element in Tomcat's own /conf/server.xml.
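
The effect of that server-side setting can be illustrated in plain Java: the same percent-encoded query-string bytes give mojibake when decoded as ISO-8859-1 and the intended text when decoded as UTF-8. This sketch only imitates the decoding step (decodeValue is a hypothetical helper); it does not configure or call Tomcat, and it needs Java 10+.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: decodes one query-string value with a chosen charset,
// standing in for the server's URIEncoding setting.
final class QueryStringDecoding {

    static String decodeValue(String query, String name, Charset uriEncoding) {
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv[0].equals(name)) { // simple names only; the name itself is not decoded
                return kv.length > 1 ? URLDecoder.decode(kv[1], uriEncoding) : "";
            }
        }
        return null; // parameter not present
    }

    public static void main(String[] args) {
        String query = "foo=p%C3%A8re"; // "pere" with e-grave, percent-encoded as UTF-8
        System.out.println(decodeValue(query, "foo", StandardCharsets.ISO_8859_1)); // mojibake
        System.out.println(decodeValue(query, "foo", StandardCharsets.UTF_8));      // intended text
    }
}
```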

If you're still seeing Mojibake in the console output of System.out.println() calls, then chances are that stdout itself is not configured to use UTF-8. How to fix that depends on whatever is responsible for interpreting and presenting stdout. If you're using, for example, Eclipse as your IDE, then it's a matter of setting Window > Preferences > General > Workspace > Text File Encoding to UTF-8.
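
On the Java side, one way to rule out the JVM's default output encoding is to write through an explicitly UTF-8 PrintStream, as in the sketch below (Utf8Console and utf8 are made-up names; the PrintStream(OutputStream, boolean, Charset) constructor needs Java 10+). It only controls the bytes Java emits; the console or IDE view still has to decode them as UTF-8.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

final class Utf8Console {

    // Wraps an OutputStream in an auto-flushing PrintStream that always writes UTF-8,
    // regardless of the platform default encoding.
    static PrintStream utf8(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PrintStream out = utf8(new FileOutputStream(FileDescriptor.out));
        // Displays correctly only if the console itself also decodes UTF-8.
        out.println("Et ton p\u00E8re et sa soeur");
    }
}
```

If you want every System.out.println() call to go through it, System.setOut(Utf8Console.utf8(new FileOutputStream(FileDescriptor.out))) at startup is one option.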

See also:

Unicode - How to get the characters right?

Answer 2

Answer by IndTechVJ

You can use ISO charset names in the charset and pageEncoding definitions in your JSP code.

For example, charset="ISO-8859-1" and pageEncoding="ISO-8859-1".
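
One trade-off worth checking before switching to ISO-8859-1: it covers French letters such as è, but not every character. The sketch below (Latin1Coverage is a hypothetical name) uses CharsetEncoder.canEncode to test whether a string fits; note that œ, as in "sœur", is not part of ISO-8859-1, and neither is Chinese text.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Coverage {

    // True if every character of the text can be represented in ISO-8859-1.
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("p\u00E8re"));    // e-grave: true
        System.out.println(fitsLatin1("s\u0153ur"));    // oe ligature: false
        System.out.println(fitsLatin1("\u4E2D\u6587")); // Chinese: false
    }
}
```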

Answer 3

Answer by SubOptimal

Based on your posted output, it seems that the parameter is sent as UTF-8 and that the UTF-8 bytes of the string are later interpreted as ISO-8859-1.

The following snippet demonstrates the behavior you observed:

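import java.nio.charset.StandardCharsets;

public class EGraveDemo {
    public static void main(String[] args) {
        String eGrave = "\u00E8"; // the letter è
        System.out.printf("letter UTF8      : %s%n", eGrave);
        // In UTF-8 this letter is encoded as the two bytes 0xC3 0xA8
        byte[] bytes = eGrave.getBytes(StandardCharsets.UTF_8);
        System.out.printf("UTF-8 hex        : %X %X%n", bytes[0], bytes[1]);
        // Decoding those bytes as ISO-8859-1 yields two characters instead of one
        System.out.printf("letter ISO-8859-1: %s%n",
                new String(bytes, StandardCharsets.ISO_8859_1));
    }
}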

Output

letter UTF8      : è
UTF-8 hex        : C3 A8
letter ISO-8859-1: Ã¨

To me, the form seems to send correctly UTF-8-encoded data, but that data is later not treated as UTF-8.
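
If the decoding step itself cannot be fixed, a commonly used stopgap is to undo a single wrong ISO-8859-1 decode. ISO-8859-1 maps every byte value 0 to 255 to exactly one character, so re-encoding with it recovers the original bytes, which can then be decoded as UTF-8. This is a sketch of that workaround (MojibakeRepair is a hypothetical name); it treats the symptom, and setting the request encoding before the first getParameter() call remains the proper fix.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {

    // Reverses one wrong ISO-8859-1 decode of UTF-8 bytes:
    // recover the original bytes, then decode them as UTF-8.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "p" + U+00C3 + U+00A8 + "re" becomes "pere" with e-grave again
        System.out.println(repair("p\u00C3\u00A8re"));
    }
}
```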

Edit: some other points to try:

Output the character encoding your request has:

System.out.println(request.getCharacterEncoding());

Force the use of UTF-8 when retrieving the parameter (untested, only an idea):

request.setCharacterEncoding("UTF-8"); // must be called before the first getParameter()
String line0 = request.getParameter("line0");

Answer 4

Answer by hagrawal

Warm up

Let me start with a universal fact we all know: a computer understands nothing but bits, 0s and 1s.

When you submit an HTML form over HTTP and the values travel over the wire to reach the destination server, what actually gets passed along is a whole lot of bits, 0s and 1s.

Before sending data to the server, the HTTP client (a browser, curl, etc.) encodes it using some encoding scheme and expects the server to decode it using the same scheme, so that the server knows exactly what the client has sent.
Before sending the response back to the client, the server encodes it using some encoding scheme and expects the client to decode it using the same scheme, so that the client knows exactly what the server has sent.

An analogy: I send you a letter and tell you whether it is written in English, French or Dutch, so that you get exactly the message I intended. When replying, you likewise tell me which language to read your reply in.

The important takeaway is that data is encoded when it leaves the client and decoded on the server side, and vice versa. If you do not specify anything, the content is encoded as application/x-www-form-urlencoded before it leaves the client for the server.
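
For illustration, this sketch builds a body in the application/x-www-form-urlencoded format: name=value pairs joined by &, each part percent-encoded with the chosen charset, and spaces turned into +. FormBody is a hypothetical name, URLEncoder.encode(String, Charset) requires Java 10+, and real browsers may differ from URLEncoder on a few edge-case characters.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class FormBody {

    // Builds an application/x-www-form-urlencoded body from the given fields,
    // percent-encoding names and values with the chosen charset.
    static String encode(Map<String, String> fields, Charset charset) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), charset) + "="
                    + URLEncoder.encode(field.getValue(), charset));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("line0", "Et ton p\u00E8re et sa soeur");
        fields.put("line1", "Il ne sera jamais parti.");
        // line0=Et+ton+p%C3%A8re+et+sa+soeur&line1=Il+ne+sera+jamais+parti.
        System.out.println(encode(fields, StandardCharsets.UTF_8));
    }
}
```

With ISO-8859-1 instead, the same è becomes %E8 rather than %C3%A8, so the server has to know which charset the client used.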

Core concept

Reading the warm-up is important. There are a couple of things you need to make sure of to get the expected results:

Having the correct encoding set before sending data from the client to the server.
Having the correct decoding and encoding set on the server side to read the request and write the response back to the client (this was the reason why you were not getting the expected results).
Ensuring that the same encoding scheme is used everywhere. You must not encode with ISO-8859-1 on the client and decode with UTF-8 on the server, or things get mixed up (in my analogy, I write to you in English and you read it as French).
Having the correct encoding set for your log viewer, if you verify the results through logs in the Windows command line, the Eclipse log viewer, etc. (this contributed to your issue, but it was not the primary reason, because in the first place the data read from the request object was not decoded correctly; the encoding of the Windows cmd or the Eclipse log viewer matters as well).

Having correct encoding set before sending data from client to server

Several ways to ensure this have been discussed, but I'd say use the HTTP Accept-Charset request-header field. Judging by the code snippet you provided, you are already using it, and using it correctly, so you are good on that front.

Some people will say not to use it, or that it is not implemented, but I very humbly disagree with them. Accept-Charset is part of the HTTP 1.1 specification, and browsers implementing HTTP 1.1 implement it as well. They may also argue that you should use the "charset" attribute of the Accept request-header field instead, but

it really is not present there; check the definition of the Accept request-header field.

I am giving you data and facts, not just words, but if you are still not satisfied, run the following tests in different browsers:

Set accept-charset="ISO-8859-1" in your HTML form and submit the form (via POST or GET) with Chinese or accented French characters to the server.
On the server, decode the data using the UTF-8 scheme.
Now repeat the same test with the client and server encodings swapped.

You will see that in neither case do the expected characters show up on the server. But if you use the same encoding scheme on both sides, you will see the expected characters. So browsers do implement accept-charset, and it does take effect.
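
The test above can also be simulated without a browser. In this sketch (EncodingMismatch and roundTrip are hypothetical names; Java 10+ is assumed), URLEncoder stands in for the form encoding with the client charset and URLDecoder for the server-side decoding. A mismatch yields either a U+FFFD replacement character or mojibake, while matching charsets give the original text back.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatch {

    // Percent-encodes a value with the client's charset (as the form would),
    // then decodes it with the server's charset.
    static String roundTrip(String value, Charset client, Charset server) {
        String onTheWire = URLEncoder.encode(value, client);
        return URLDecoder.decode(onTheWire, server);
    }

    public static void main(String[] args) {
        String pere = "p\u00E8re"; // "pere" with e-grave
        Charset[] charsets = {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8};
        for (Charset client : charsets) {
            for (Charset server : charsets) {
                System.out.printf("client %s / server %s -> %s%n",
                        client, server, roundTrip(pere, client, server));
            }
        }
    }
}
```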

Having correct decoding and encoding set at server side to read request and write response back to client

There are a great many ways to achieve this (sometimes extra configuration is required for a specific scenario, but the options below solve 95% of cases and hold for your case as well). For example:

Use a character encoding filter to set the encoding on the request and the response.
Use setCharacterEncoding on the request and the response.
Configure the web or application server for the correct character encoding using -Dfile.encoding=utf8, etc.
Etc.

My favorite is the first one, the "Character Encoding Filter", and it will solve your problem as well, for the following reasons:

All your encoding-handling logic is in one place.
You control everything through configuration: change it in one place and everyone is happy.
You need not worry that some other code may read the request stream or flush the response stream before you get to set the character encoding.

1. Character encoding filter

You can do the following to implement your own character encoding filter. If you are using a framework such as Spring, you need not write your own class; just configure it in web.xml.

The core logic below is very similar to what Spring does, minus the many dependencies and bean-aware features Spring adds.

web.xml (configuration)

<filter>
    <filter-name>EncodingFilter</filter-name>
    <filter-class>
        com.sks.hagrawal.EncodingFilter
    </filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>

<filter-mapping>
    <filter-name>EncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

EncodingFilter (character encoding implementation class)

package com.sks.hagrawal;

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class EncodingFilter implements Filter {
    private String encoding = "UTF-8";
    private boolean forceEncoding = false;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain filterChain) throws IOException, ServletException {
        request.setCharacterEncoding(encoding);
        if (forceEncoding) { // If forceEncoding is set, set the response stream encoding as well
            response.setCharacterEncoding(encoding);
        }
        filterChain.doFilter(request, response);
    }

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // Read the optional init-params declared in web.xml
        String encodingParam = filterConfig.getInitParameter("encoding");
        String forceEncodingParam = filterConfig.getInitParameter("forceEncoding");
        if (encodingParam != null) {
            encoding = encodingParam;
        }
        if (forceEncodingParam != null) {
            forceEncoding = Boolean.parseBoolean(forceEncodingParam);
        }
    }

    @Override
    public void destroy() {
        // Nothing to clean up.
    }
}

2. ServletRequest.setCharacterEncoding()

This is essentially the same code as in the character encoding filter, but instead of running it in a filter, you run it in your servlet or controller class.

The idea, again, is to call request.setCharacterEncoding("UTF-8"); to set the encoding of the HTTP request stream before you start reading it.

Try the code below: if you are not using some sort of filter to set the encoding on the request object, the first log line will print null and the second will print "UTF-8".

System.out.println("CharacterEncoding = " + request.getCharacterEncoding());
request.setCharacterEncoding("UTF-8");
System.out.println("CharacterEncoding = " + request.getCharacterEncoding());

Below is an important excerpt from the setCharacterEncoding Javadoc. Also note that you must provide a valid encoding name, or you will get an UnsupportedEncodingException.

Overrides the name of the character encoding used in the body of this request. This method must be called prior to reading request parameters or reading input using getReader(). Otherwise, it has no effect.
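
Because an invalid name makes setCharacterEncoding() throw UnsupportedEncodingException on every request, one option is to validate the filter's encoding init-param once, for example in init(). Below is a hypothetical helper for that check using only java.nio.charset.Charset; falling back to a default is a design assumption, and failing fast in init() would be an equally reasonable choice.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

final class EncodingParam {

    // Returns the requested encoding if this JVM supports it, otherwise the fallback.
    static String resolve(String requested, String fallback) {
        if (requested == null || requested.trim().isEmpty()) {
            return fallback; // init-param not configured
        }
        String name = requested.trim();
        try {
            return Charset.isSupported(name) ? name : fallback;
        } catch (IllegalCharsetNameException e) {
            return fallback; // syntactically invalid name, e.g. containing spaces
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("UTF-8", "UTF-8"));           // UTF-8
        System.out.println(resolve("no-such-charset", "UTF-8")); // UTF-8 (fallback)
    }
}
```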

Wherever needed, I have tried my best to point you to official links or accepted, bounty-awarded StackOverflow answers, so that you can trust the information.

Answer 5

Answer by Grim

There is a bug in Tomcat that may have trapped you: the first filter defines the encoding the request is based on.

No filter or servlet after that first filter can change the encoding of the request anymore.

I do not think this bug will be fixed in the future, because current applications may rely on this encoding behavior.

Answer 6

Answer by nicowtt

You can try writing this in your .jsp:

<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
         pageEncoding="UTF-8"%>

That resolved the problem for me.
