Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/488448/

Time: 2020-10-29 12:37:20  Source: igfitidea

jsp utf encoding

java jsp encoding utf

Asked by nicolamontecchio

I'm having a hard time figuring out how to handle this problem:

I'm developing a web tool for an Italian university, and I have to display words with accents (such as è, ù, ...); sometimes I get these words from a PostgreSql table (UTF8-encoded), but mostly I have to read long passages from a file. These files are encoded as utf-8 xml, and display fine in Smultron or any utf-8 editor (they were created by using python to parse old files that used entities such as &egrave; instead of "è").

I wrote a java class which extracts the relevant segments from the xml file, which works like this:

String s = parseText(filename, position)

if I write the returned String to a file, everything looks fine; the problem is that if I do

out.write(s)

in the jsp page, I get strange characters. By the way, I use

String s = getWordFromPostgresql(...)

out.write(s)

in the very same jsp and it displays OK.

Any hint?

Thanks Nicola



@krosenvold

Thanks for your response, however that directive is already in the page, but it doesn't work (actually it "works" but only for the strings I get from the database). I think there's something about reading from the files, but I can't understand ... they work in "java" but not in "jsp" (can't think about a better explanation ...)

here's a basic example extracted from the actual code: the method that reads from the files returns a Map, from a Mark (an object representing a position in the text) to a String (containing the text):

this is in the .jsp page (with the utf-directive cited in the posts above)

    // ...
    Map<Mark, String> map = TestoMarkParser.parseMarks(...);
    out.write(map.get(m));

and this is the result:

"Fu per√≤ cos√¨ in uso il Genere Enharmonico, che quelli quali vi si esercitavano,"

if I put the same code in a java class, and substitute out.write with System.out.println, the result is this:

"Fu però così in uso il Genere Enharmonico, che quelli quali vi si esercitavano,"



I've been doing some analysis with a hex editor, here it is:

original string: "fu però così "

ò in xml file: C3 B2

ò as rendered by out.write() in the jsp file: E2 88 9A E2 89 A4

ò as written to file via:

FileWriter w = new FileWriter(new File("out.txt"));
w.write(s);     // s is the parsed string
w.close();

C3 B2

printing the values of each character as an int

0: 70 = F
1: 117 = u
2: 32 =  
3: 112 = p
4: 101 = e
5: 114 = r
6: 8730 = ? 
7: 8804 = ? 
8: 32 =  
9: 99 = c
10: 111 = o
11: 115 = s
12: 8730 = ?
13: 168 = ?
14: 10 =

Answered by krosenvold

In the jsp page directive you should try setting your content-type to utf-8, which will set the pageEncoding to utf-8 also.

<%@page contentType="text/html;charset=UTF-8"%>

UTF-8 is not the default content type in jsp, and there are all sorts of interesting problems that arise from this. The problem is that the underlying stream is interpreted as an ISO-8859-1 stream by default. If you write some unicode bytes to this stream, they will be interpreted as ISO-8859-1. I find that setting the encoding to utf-8 is the best solution.
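
To make the misinterpretation concrete, here is a minimal sketch (not taken from the original answer) that encodes a string as UTF-8 and then decodes the same bytes as ISO-8859-1. The class and method names are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo
{
    // Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String misdecode(String text)
    {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args)
    {
        // "per\u00f2" is "però"; the single accented letter becomes two chars.
        String garbled = misdecode("per\u00f2");
        for (int i = 0; i < garbled.length(); i++)
        {
            System.out.println(i + ": " + (int) garbled.charAt(i));
        }
    }
}
```

Each byte of the two-byte UTF-8 sequence C3 B2 turns into its own character (195 and 178), which is the general shape of the problem: one accented letter shows up as two unrelated symbols.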

Edit: Furthermore, a string variable in java should always be unicode. So you should always be able to say

System.out.println(myString) 

and see the proper characters appear in the console window of your web-server (or just stop in the debugger and examine it). I suspect that you'll be seeing incorrect characters when you do this, which leads me to believe you have an encoding problem when constructing the string.

Answered by cellepo

I have some international jsp's [which have "special" international (with respect to English) characters].

Inserting this [and only this, i.e. without an additional contentType directive (adding one caused a duplicate contentType error)] at the top of them got them to save and render correctly:

<%@page pageEncoding="UTF-8"%>

This reference [http://www.inter-locale.com/codeset1.jsp] helped me discover that.

Answered by mismanc

I also had the same problem: everything was "utf-8", yet I still saw senseless characters. The problem was in the jsp; the following must be at the head of the page:

 <%request.setCharacterEncoding("utf-8");%>

and everything will be ok.

Answered by kdgregory

String s = parseText(filename, position)

Where is this method defined? I'm guessing that it's your own method, which opens the file and extracts a particular chunk of the data. Somewhere in this process it's getting converted from bytes to characters, probably using the default encoding for your JVM.

If the default encoding of your running JVM doesn't match the actual encoding in the file, you're going to get incorrect characters in your string. Added to that, if you're reading content that is encoded in a multi-byte form (such as UTF-8), your "position" may point into the middle of a multi-byte encoding.
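
One way to avoid depending on the JVM default is to name the charset explicitly when opening the file. The following is a minimal sketch under that assumption; the class name, method name, and temporary file are hypothetical and only serve the illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetRead
{
    // Read the first line of a file, decoding its bytes as UTF-8
    // regardless of the JVM default charset.
    static String readFirstLineUtf8(Path file) throws IOException
    {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8))
        {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException
    {
        // Hypothetical sample data: "fu però così" written as UTF-8 bytes.
        Path tmp = Files.createTempFile("utf8-demo", ".txt");
        Files.write(tmp, "fu per\u00f2 cos\u00ec".getBytes(StandardCharsets.UTF_8));
        String line = readFirstLineUtf8(tmp);
        System.out.println((int) line.charAt(6)); // 242, i.e. U+00F2
        Files.delete(tmp);
    }
}
```

Because the reader decodes whole characters, indexing into the resulting String also sidesteps the risk of a byte position landing in the middle of a multi-byte sequence.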

If the source files are in well-formed XML, you'll be much better off using a real parser (such as the one built into the JDK) to parse them, since the parser will provide the correct translation of bytes to characters. Then use an XPath expression to retrieve the values.
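
As a rough sketch of that approach, the example below parses XML bytes with the JDK's built-in DOM parser and pulls out one value with XPath. The element names (text, seg) and the id attribute are assumptions made up for this example; the real files' structure is not shown in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class XPathExtract
{
    // Parse XML bytes (the parser honours the declared encoding) and
    // evaluate an XPath expression to its string value.
    static String extract(byte[] xmlBytes, String expression) throws Exception
    {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception
    {
        // Hypothetical document layout, used only for illustration.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<text><seg id=\"1\">Fu per\u00f2 cos\u00ec in uso</seg></text>";
        String value = extract(xml.getBytes(StandardCharsets.UTF_8), "/text/seg[@id='1']");
        System.out.println(value.equals("Fu per\u00f2 cos\u00ec in uso")); // true
    }
}
```

Handing the parser a byte stream rather than a Reader lets it decode according to the XML declaration, so the JVM default charset never enters the picture.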

If you haven't used an XML parser in the past, here are two documents that I wrote on parsing and XPath.



Edit: one thing that you may find helpful is to print out the actual character values in the string, using something like the following:

public static void main(String[] argv) throws Exception
{
    String s = "testing\u20ac";
    for (int ii = 0 ; ii < s.length() ; ii++)
    {
        System.out.println(ii + ": " + (int)s.charAt(ii) + " = " + s.charAt(ii));
    }
}

You should probably also print out your default character set, so that you know how any particular sequence of bytes is translated to characters:

// requires: import java.nio.charset.Charset;
public static void main(String[] argv) throws Exception
{
    System.out.println(Charset.defaultCharset());
}

And finally, you should examine the served page as raw bytes, to see exactly what's being returned to the client.



Edit #2: the character ò is Unicode value 00F2, which would be UTF-8 encoded as C3 B2. These two codes don't correspond to the symbols that you showed in your earlier answer.
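
These byte values can be checked directly. The sketch below (class and method names are hypothetical) prints the UTF-8 encoding of ò and of the two characters with values 8730 and 8804 from the question's character dump, which are U+221A and U+2264.

```java
import java.nio.charset.StandardCharsets;

class Utf8Hex
{
    // Return the UTF-8 encoding of a string as space-separated uppercase hex bytes.
    static String utf8Hex(String text)
    {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8))
        {
            if (hex.length() > 0)
            {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(utf8Hex("\u00f2"));        // C3 B2
        System.out.println(utf8Hex("\u221a\u2264"));  // E2 88 9A E2 89 A4
    }
}
```

The second line matches the bytes the jsp emitted, which suggests the String already held the two wrong characters before out.write() encoded them correctly as UTF-8.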

For more on Unicode characters, see the code charts at Unicode.org.
