Java: special and accented characters
Original question: http://stackoverflow.com/questions/3096618/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Special and accented characters
Asked by klonq
I am doing some work for a French client and so need to deal with accented characters, but I'm running into a lot of difficulty. I am hoping the solution is simple and that somebody can point it out to me.
The string: La Forêt pour Témoin is converted to: La For? pour T?oin
Note the missing character following each accented character: the t following the ê and the m following the é.
I have tried using StringEscapeUtils, which was successful at escaping some characters, such as ?. I have also built my own escape function, which produces the same results (? will work, ê will not).
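private String escapeChars(String string) {
    // Encode every UTF-16 code unit as a decimal numeric character reference.
    // A StringBuilder avoids creating a new String on each iteration.
    StringBuilder result = new StringBuilder();
    for (char c : string.toCharArray()) {
        result.append("&#").append((int) c).append(';');
    }
    return result.toString();
}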
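For comparison, here is a minimal sketch of an escape function that only touches non-ASCII characters and walks the string by code point rather than by char. The class and method names are hypothetical, and it is not a full HTML escaper (it leaves &, < and > alone). It also cannot help if the string is already damaged by the time it reaches this method, which is what the later edits suggest is happening.

```java
final class HtmlEscapeSketch {
    // Escapes only non-ASCII code points as decimal numeric character references,
    // leaving plain ASCII readable. Iterating by code point (not char) keeps
    // characters outside the Basic Multilingual Plane in one piece.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        System.out.println(escapeNonAscii("La For\u00EAt pour T\u00E9moin"));
        // prints: La For&#234;t pour T&#233;moin
    }
}
```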
The project is running in Eclipse using the App Engine plugin. I cannot narrow down whether the problem is caused by Java, App Engine, or SQLite.
Any help is appreciated.
EDIT: I have found that strings are malformed when simply displaying the request parameter from a form (i.e., request.getParameter("string") already has malformed content).
I have tried the meta tag suggested by Daniel with no success. I think you are on the right track though. The header data of the HTML document follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
When accented characters are hard-coded into a JSP they are displayed as intended.
EDIT: I have also added <?xml version="1.0" encoding="UTF-8"?> to the very start of the page.
I am very close to a solution. I have found that if I change the encoding of the page from within the browser, form data is passed to the server properly. I cannot figure out how to make the browser auto-detect the page encoding.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
RESOLVED: I couldn't work out how to make the browser auto-detect the UTF-8 encoding which Java defaults to, so I have forced the character encoding to ISO-8859-1 using request.setCharacterEncoding("ISO-8859-1").
Answered by BalusC
EDIT: I have found that strings are malformed when simply displaying the request parameter from a form (i.e., request.getParameter("string") already has malformed content).
This can have three causes:
- It's a GET request and the server isn't configured to use UTF-8 to parse the request URI. It's unclear which server you're using, so here's a Tomcat-targeted answer as an example: set the URIEncoding attribute of the HTTP Connector in /conf/server.xml to UTF-8.
- It's a POST request and the servlet container isn't using UTF-8 to decode the request body. You can ensure that it does by calling request.setCharacterEncoding("UTF-8") beforehand, that is, before the first request parameter is read.
- The console which you're writing the parameter to doesn't support UTF-8. It's unclear which console you're talking about, so here's an Eclipse-targeted answer as an example: in Window > Preferences > General > Workspace > Text File Encoding, set it to UTF-8.
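To see why the charset has to match on both sides, here is a rough JDK-only simulation. It is a sketch, not the servlet container's actual code path: URLEncoder and URLDecoder stand in for the browser and the server, the class and method names are hypothetical, and the Charset overloads used here assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class FormEncodingDemo {
    // Stand-in for the browser: percent-encodes a form value in the page's charset.
    static String browserEncode(String value, Charset pageCharset) {
        return URLEncoder.encode(value, pageCharset);
    }

    // Stand-in for the server: decodes the submitted value in the charset it assumes.
    static String serverDecode(String encoded, Charset assumedCharset) {
        return URLDecoder.decode(encoded, assumedCharset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        String original = "La For\u00EAt pour T\u00E9moin";

        String sent = browserEncode(original, StandardCharsets.ISO_8859_1);
        System.out.println(sent); // La+For%EAt+pour+T%E9moin

        // Same charset on both sides: the value survives the round trip.
        System.out.println(serverDecode(sent, StandardCharsets.ISO_8859_1).equals(original)); // true

        // Different charsets: the accented characters are damaged.
        System.out.println(serverDecode(sent, StandardCharsets.UTF_8).equals(original)); // false
    }
}
```

The same reasoning applies in the other direction: a value encoded as UTF-8 and decoded as ISO-8859-1 is also damaged, just with different symptoms.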
Answered by Jon Skeet
Okay, so the first problem is you need to find out where the data is being lost.
- Add appropriate logging of the Unicode characters (ideally in hex) so you can see whether you can write to SQLite and retrieve the data correctly.
- Hard-code some data so you can see whether it's coming back correctly.
- Make sure that anywhere you have a text-to-binary conversion, you specify an appropriate encoding (e.g. UTF-8).
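The logging and explicit-encoding suggestions above could be sketched roughly as follows. The class and method names are hypothetical helpers, not part of any library. The idea is to dump code points and bytes in hex at each stage (form input, before the database write, after the read) and compare them.

```java
import java.nio.charset.StandardCharsets;

final class EncodingDebug {
    // Renders each code point as U+XXXX so damaged characters are easy to spot in logs.
    static String codePointsHex(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Text-to-binary conversion with an explicit charset instead of the platform default.
    static String utf8BytesHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hard-coded test data, written with Unicode escapes so the source
        // file's own encoding cannot interfere.
        String sample = "\u00EAt";
        System.out.println(codePointsHex(sample)); // U+00EA U+0074
        System.out.println(utf8BytesHex(sample));  // C3 AA 74
    }
}
```

If the hex dump of a request parameter already shows U+FFFD or U+003F where U+00EA was expected, the data was lost before it reached the database.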
You haven't really said where things are going wrong, but I'd expect that if you sort out the character encoding, the rest should fall into place. Maybe SQLite has problems, but I doubt it...
Answered by Daniel Trebbien
You need to make sure that the HTML that is sent back to the browser has a charset. You should both send back Content-Type: text/html; charset=UTF-8 as an HTTP response header and include, as the first child element of the head tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Or, if you are using XHTML:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Though just having the meta tag will often fix the problem.
Also, make sure that your HTML is valid by using the W3C Markup Validation Service.
See also: FAQ: Weird characters and question marks appear instead of accented characters
Answered by ktingle
Is it possible the string is intact, but you are attempting to print these characters with an en-US localization?

