Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/2657515/

Detect the URI encoding automatically in Tomcat

java tomcat character-encoding servlet-filters url-encoding

Asked by Roland Illig

I have an instance of Apache Tomcat 6.x running, and I want it to interpret the character set of incoming URLs a little more intelligently than the default behavior does. In particular, I want to achieve the following mapping:

So%DFe => Soße
So%C3%9Fe => Soße
So%DF%C3%9F => (error)

The behavior I want could be described as "try to decode the byte stream as UTF-8, and if that doesn't work, assume ISO-8859-1".
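The rule quoted above can be sketched in plain Java, outside of any servlet container. This is only an illustration: the class and method names are hypothetical, and it assumes Java 7 or later for `StandardCharsets`. Note that ISO-8859-1 accepts every byte sequence, so this simple fallback does not reject the third case of the mapping; it decodes it to Latin-1 characters instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat or of the original post.
final class FallbackDecodeSketch {

  /** Turns %XX escapes into raw bytes; plain ASCII characters become single bytes. */
  static byte[] percentDecode(String s) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '%') {
        if (i + 2 >= s.length()) {
          throw new IllegalArgumentException("Truncated escape at index " + i);
        }
        int hi = Character.digit(s.charAt(i + 1), 16);
        int lo = Character.digit(s.charAt(i + 2), 16);
        if (hi < 0 || lo < 0) {
          throw new IllegalArgumentException("Invalid escape at index " + i);
        }
        out.write((hi << 4) | lo);
        i += 2; // skip the two hex digits
      } else if (c < 0x80) {
        out.write(c);
      } else {
        throw new IllegalArgumentException("Unescaped non-ASCII character at index " + i);
      }
    }
    return out.toByteArray();
  }

  /** Strict UTF-8 first; if the bytes are not valid UTF-8, fall back to ISO-8859-1. */
  static String decode(String escaped) {
    byte[] bytes = percentDecode(escaped);
    try {
      return StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes))
          .toString();
    } catch (CharacterCodingException e) {
      // ISO-8859-1 maps every byte to exactly one character, so this never fails.
      return new String(bytes, StandardCharsets.ISO_8859_1);
    }
  }

  public static void main(String[] args) {
    System.out.println(decode("So%DFe"));     // decoded via the ISO-8859-1 fallback
    System.out.println(decode("So%C3%9Fe"));  // decoded as UTF-8
  }
}
```

Both calls in `main` return the same string, `Soße`, which is the mapping asked for in the first two cases.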

Simply using the URIEncoding configuration doesn't work in that case. So how can I configure Tomcat to decode the request the way I want?

I might have to write a Filter that takes the request (especially the query string) and re-encodes it into the parameters. Would that be the natural way?

Answered by Roland Illig

The complicated way to achieve my goal was indeed to write my own javax.servlet.Filter and to embed it into the filter chain. This solution complies with the Apache Tomcat suggestion provided in Tomcat Wiki - Character Encoding Issues.

Update (2010-07-31): The first version of this filter interpreted the query string itself, which was a bad idea. It didn't handle POST requests correctly and had problems when combined with other servlet filters, such as those for URL rewriting. This version instead wraps the originally provided parameters and recodes them. To make it work correctly, the URIEncoding (for example in Tomcat) must be configured to be ISO-8859-1.
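The reason the container has to use ISO-8859-1 is that this charset maps every byte to exactly one character, so the original bytes can be recovered from the already decoded parameter string. A minimal sketch of that round trip follows; the class and method names are hypothetical, and unlike the filter it uses the lenient `String` constructor rather than a strict decoder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the byte-preserving property of ISO-8859-1.
final class Latin1RoundTripSketch {

  /**
   * Takes a string that the container decoded as ISO-8859-1, recovers the
   * original bytes, and reinterprets them as UTF-8.
   */
  static String recodeAsUtf8(String latin1Decoded) {
    // Each character in the range U+0000..U+00FF becomes the byte with the same value.
    byte[] original = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
    // Lenient decoding: malformed input becomes U+FFFD instead of raising an error.
    return new String(original, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // "So%C3%9Fe" decoded as ISO-8859-1 yields the characters S, o, U+00C3, U+009F, e.
    String fromContainer = "So\u00C3\u009Fe";
    System.out.println(recodeAsUtf8(fromContainer).equals("So\u00DFe")); // prints "true"
  }
}
```

With any other URIEncoding the container may already have replaced undecodable bytes, and the original byte values could no longer be recovered this way.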

package de.roland_illig.webapps.webapp1;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import javax.servlet.http.HttpServletResponse;

/**
 * Automatically determines the encoding of the request parameters. It assumes
 * that the parameters of the original request are encoded by a 1:1 mapping from
 * bytes to characters.
 * <p>
 * If the request parameters cannot be decoded by any of the given encodings,
 * the filter chain is not processed further, but a status code of 400 with a
 * helpful error message is returned instead.
 * <p>
 * The filter can be configured using the following parameters:
 * <ul>
 * <li>{@code encodings}: The comma-separated list of encodings (see
 * {@link Charset#forName(String)}) that are tried in order. The first one that
 * can decode the complete query string is taken.
 * <p>
 * Default value: {@code UTF-8}
 * <p>
 * Example: {@code UTF-8,EUC-KR,ISO-8859-15}.
 * <li>{@code inputEncodingParameterName}: When this parameter is defined and a
 * query parameter of that name is provided by the client, and that parameter's
 * value contains only non-escaped characters and the server knows an encoding
 * of that name, then it is used exclusively, overriding the {@code encodings}
 * parameter for this request.
 * <p>
 * Default value: {@code null}
 * <p>
 * Example: {@code ie} (as used by Google).
 * </ul>
 */
public class EncodingFilter implements Filter {

  // The backslash must be doubled inside a Java string literal.
  private static final Pattern PAT_COMMA = Pattern.compile(",\\s*");

  private String inputEncodingParameterName = null;
  private final List<Charset> encodings = new ArrayList<Charset>();

  @Override
  @SuppressWarnings("unchecked")
  public void init(FilterConfig config) throws ServletException {
    String encodingsStr = "UTF-8";

    Enumeration<String> en = config.getInitParameterNames();
    while (en.hasMoreElements()) {
      final String name = en.nextElement();
      final String value = config.getInitParameter(name);
      if (name.equals("encodings")) {
        encodingsStr = value;
      } else if (name.equals("inputEncodingParameterName")) {
        inputEncodingParameterName = value;
      } else {
        throw new IllegalArgumentException("Unknown parameter: " + name);
      }
    }

    for (String encoding : PAT_COMMA.split(encodingsStr)) {
      Charset charset = Charset.forName(encoding);
      encodings.add(charset);
    }
  }

  @SuppressWarnings("unchecked")
  @Override
  public void doFilter(ServletRequest sreq, ServletResponse sres, FilterChain fc) throws IOException, ServletException {
    final HttpServletRequest req = (HttpServletRequest) sreq;
    final HttpServletResponse res = (HttpServletResponse) sres;

    final Map<String, String[]> params;
    try {
      params = Util.decodeParameters(req.getParameterMap(), encodings, inputEncodingParameterName);
    } catch (IOException e) {
      res.sendError(400, e.getMessage());
      return;
    }

    HttpServletRequest wrapper = new ParametersWrapper(req, params);
    fc.doFilter(wrapper, res);
  }

  @Override
  public void destroy() {
    // nothing to do
  }

  static abstract class Util {

    static CharsetDecoder strictDecoder(Charset cs) {
      CharsetDecoder dec = cs.newDecoder();
      dec.onMalformedInput(CodingErrorAction.REPORT);
      dec.onUnmappableCharacter(CodingErrorAction.REPORT);
      return dec;
    }

    static int[] toCodePoints(String str) {
      final int len = str.length();
      int[] codePoints = new int[len];
      int i = 0, j = 0;
      while (i < len) {
        int cp = Character.codePointAt(str, i);
        codePoints[j++] = cp;
        i += Character.charCount(cp);
      }
      // Trim to the number of code points actually stored (j), not the char count.
      return j == len ? codePoints : Arrays.copyOf(codePoints, j);
    }

    public static String recode(String encoded, CharsetDecoder decoder) throws IOException {
      byte[] bytes = new byte[encoded.length()];
      int bytescount = 0;

      for (int i = 0; i < encoded.length(); i++) {
        char c = encoded.charAt(i);
        if (!(c <= '\u00FF'))
          throw new IOException("Invalid character: #" + (int) c);
        bytes[bytescount++] = (byte) c;
      }

      CharBuffer cbuf = decoder.decode(ByteBuffer.wrap(bytes, 0, bytescount));
      String result = cbuf.toString();
      return result;
    }

    static String ensureDefinedUnicode(String s) throws IOException {
      for (int cp : toCodePoints(s)) {
        if (!Character.isDefined(cp))
          throw new IOException("Undefined unicode code point: " + cp);
      }
      return s;
    }

    static Map<String, String[]> decodeParameters(Map<String, String[]> originalParams, List<Charset> charsets, String ieName) throws IOException {
      Map<String, String[]> params = new LinkedHashMap<String, String[]>();

      Charset ie = null;
      {
        String[] values = originalParams.get(ieName);
        if (values != null) {
          for (String value : values) {
            if (!value.isEmpty() && value.indexOf('%') == -1) {
              try {
                if (ie != null)
                  throw new IOException("Duplicate value for input encoding parameter: " + ie + " and " + value + ".");
                ie = Charset.forName(value);
              } catch (IllegalCharsetNameException e) {
                throw new IOException("Illegal input encoding name: " + value);
              } catch (UnsupportedCharsetException e) {
                throw new IOException("Unsupported input encoding: " + value);
              }
            }
          }
        }
      }

      Charset[] css = (ie != null) ? new Charset[] { ie } : charsets.toArray(new Charset[charsets.size()]);
      for (Charset charset : css) {
        try {
          params.clear();
          CharsetDecoder decoder = strictDecoder(charset);
          for (Map.Entry<String, String[]> entry : originalParams.entrySet()) {
            final String encodedName = entry.getKey();
            final String name = ensureDefinedUnicode(Util.recode(encodedName, decoder));
            for (final String encodedValue : entry.getValue()) {
              final String value = ensureDefinedUnicode(Util.recode(encodedValue, decoder));
              String[] oldValues = params.get(name);
              String[] newValues = (oldValues == null) ? new String[1] : Arrays.copyOf(oldValues, oldValues.length + 1);
              newValues[newValues.length - 1] = value;
              params.put(name, newValues);
            }
          }
          return params;
        } catch (IOException e) {
          continue;
        }
      }

      List<String> kvs = new ArrayList<String>();
      for (Map.Entry<String, String[]> entry : originalParams.entrySet()) {
        final String key = entry.getKey();
        for (final String value : entry.getValue()) {
          kvs.add(key + "=" + value);
        }
      }
      throw new IOException("Could not decode the parameters: " + kvs.toString());
    }
  }

  @SuppressWarnings("unchecked")
  static class ParametersWrapper extends HttpServletRequestWrapper {

    private final Map<String, String[]> params;

    public ParametersWrapper(HttpServletRequest request, Map<String, String[]> params) {
      super(request);
      this.params = params;
    }

    @Override
    public String getParameter(String name) {
      String[] values = params.get(name);
      return (values != null && values.length != 0) ? values[0] : null;
    }

    @Override
    public Map getParameterMap() {
      return Collections.unmodifiableMap(params);
    }

    @Override
    public Enumeration getParameterNames() {
      return Collections.enumeration(params.keySet());
    }

    @Override
    public String[] getParameterValues(String name) {
      return params.get(name);
    }
  }
}

While the code size is reasonably small, there are some implementation details that one can get wrong, so I would have expected that Tomcat already delivers a similar filter.

To activate this filter, I have added the following to my web.xml:

<filter>
  <filter-name>EncodingFilter</filter-name>
  <filter-class>de.roland_illig.webapps.webapp1.EncodingFilter</filter-class>
  <init-param>
    <param-name>encodings</param-name>
    <param-value>US-ASCII, UTF-8, EUC-KR, ISO-8859-15, ISO-8859-1</param-value>
  </init-param>
  <init-param>
    <param-name>inputEncodingParameterName</param-name>
    <param-value>ie</param-value>
  </init-param>
</filter>

<filter-mapping>
  <filter-name>EncodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
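The order of the `encodings` list matters because the first charset that decodes all parameters without error wins. The sketch below illustrates that selection with three charsets that every Java platform provides. It is a simplified stand-in, not the filter's code: the class and method names are hypothetical, it works on raw bytes rather than on a parameter map, and it assumes Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "first charset that decodes cleanly wins".
final class FirstMatchingCharsetSketch {

  /** Returns the first candidate that strictly decodes the bytes, or null if none does. */
  static Charset firstMatching(byte[] bytes, List<Charset> candidates) {
    for (Charset cs : candidates) {
      try {
        cs.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(bytes));
        return cs;
      } catch (CharacterCodingException e) {
        // This charset cannot represent the input; try the next one.
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<Charset> order = Arrays.asList(
        StandardCharsets.US_ASCII, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

    byte[] ascii = {'S', 'o', 'e'};
    byte[] utf8 = {'S', 'o', (byte) 0xC3, (byte) 0x9F, 'e'};
    byte[] latin1 = {'S', 'o', (byte) 0xDF, 'e'};

    System.out.println(firstMatching(ascii, order));  // US-ASCII
    System.out.println(firstMatching(utf8, order));   // UTF-8
    System.out.println(firstMatching(latin1, order)); // ISO-8859-1
  }
}
```

Since ISO-8859-1 accepts any byte sequence, any charset listed after it would never be tried, which is presumably why it comes last in the configuration above.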

Answered by dmatej

We already did something similar to Roland's solution on SGES2.1.1 (I think it uses Catalina, the same as some old Tomcat versions), but it had some problems:

  1. It duplicates what the application server already does.
  2. It must also take care of internal JSP requests, such as included pages with parameters.
  3. It must parse the query string.
  4. It must do all of this again every time setRequest is called, but at a later point, because of 2.
  5. It is too heavy a workaround.

Today, after reading many blogs and a lot of advice, I deleted the whole class and did only one simple thing: I parsed the charset from the Content-Type header in the wrapper's constructor and set it on the wrapped instance.

It works; all 988 of our tests passed.

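import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

// The enclosing class declaration is assumed; the original post shows only its members.
public class CisHttpRequestWrapper extends HttpServletRequestWrapper {

  // Captures the charset value of a Content-Type header, with or without quotes.
  // Backslashes must be doubled inside a Java string literal.
  private static final Pattern CHARSET_PATTERN
      = Pattern.compile("(?i)\\bcharset=\\s*\"?([^\\s;\"]*)");
  // Used when the client does not declare a charset.
  private static final String CHARSET_DEFAULT = "ISO-8859-2";

  public CisHttpRequestWrapper(final HttpServletRequest request) {
    super(request);
    // Respect an encoding that the container or an earlier filter has already set.
    if (request.getCharacterEncoding() != null) {
      return;
    }
    final String charset = parseCharset(request);
    try {
      setCharacterEncoding(charset);
    } catch (final UnsupportedEncodingException e) {
      throw new IllegalStateException("Unknown charset: " + charset, e);
    }
  }

  private String parseCharset(final HttpServletRequest request) {
    final String contentType = request.getHeader("Content-Type");
    if (contentType == null || contentType.isEmpty()) {
      return CHARSET_DEFAULT;
    }
    final Matcher m = CHARSET_PATTERN.matcher(contentType);
    if (!m.find()) {
      return CHARSET_DEFAULT;
    }
    // Locale.ROOT keeps the upper-casing independent of the server's default locale.
    return m.group(1).trim().toUpperCase(Locale.ROOT);
  }
}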
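To see what the charset pattern extracts, the parsing step can be tried on its own, without the servlet API. The following sketch reuses the same regular expression; the class name and the `fallback` parameter are hypothetical and only serve the illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone version of the Content-Type parsing step.
final class ContentTypeCharsetSketch {

  private static final Pattern CHARSET_PATTERN =
      Pattern.compile("(?i)\\bcharset=\\s*\"?([^\\s;\"]*)");

  /** Returns the upper-cased charset name from the header value, or the fallback. */
  static String parseCharset(String contentType, String fallback) {
    if (contentType == null || contentType.isEmpty()) {
      return fallback;
    }
    Matcher m = CHARSET_PATTERN.matcher(contentType);
    if (!m.find()) {
      return fallback;
    }
    return m.group(1).trim().toUpperCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    String fallback = "ISO-8859-2";
    // returns "UTF-8"
    System.out.println(parseCharset("text/html; charset=utf-8", fallback));
    // returns "WINDOWS-1250" (the surrounding quotes are stripped)
    System.out.println(parseCharset(
        "application/x-www-form-urlencoded; charset=\"windows-1250\"", fallback));
    // returns "ISO-8859-2" because no charset is declared
    System.out.println(parseCharset("text/plain", fallback));
  }
}
```

This approach relies on the client declaring a charset in the Content-Type header, so it does not by itself address undeclared encodings in the query string of a GET request.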