Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/6692302/
JavaScript charset problem
Asked by Cata
I want to read a file from my server with JavaScript and display its content in an HTML page. The file is in the ANSI charset and contains Romanian characters. I want to display those characters as they are :D, not as assorted black symbols.
So I think my problem is the charset.. I have a get request that takes the content of the file, like this:
// LA MOD String Version. A tiny ajax library, by DanDavis.
// Synchronous request: GET U, or PUT V to U when V is given; returns the response text.
function IO(U, V) {
  var X = !window.XMLHttpRequest ? new ActiveXObject('Microsoft.XMLHTTP') : new XMLHttpRequest();
  X.open(V ? 'PUT' : 'GET', U, false);
  // This sets a header on the request, not on the response.
  X.setRequestHeader('Content-Type', 'Charset=UTF-8');
  X.send(V ? V : '');
  return X.responseText;
}
As far as I know, Romanian characters are included in the UTF-8 charset, so I set the charset of the request header to UTF-8. The file is in UTF-8 format, and I have the meta tag that tells the browser that the page has UTF-8 content:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
So if I request the file from the server directly, the browser shows me the Romanian characters, but if I display the content through this script, I see only symbols instead of characters. So what am I doing wrong?
Thank you!
PS: I want this to work in Firefox at least, not necessarily in all browsers.
Answered by Tomalak
While my initial assumption was the same as T.J. Crowder's, a quick chat established that the OP uses a hosting service and cannot easily change the Content-Type headers.
The files were sent as text/plain or text/html without any charset parameter, hence the browser interprets them as UTF-8 (which is the default for XMLHttpRequest when no charset is declared).
So saving the files in UTF-8 (instead of ANSI/Windows-1252) did the trick.
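To illustrate why the ANSI-saved file showed up as symbols, here is a minimal sketch, assuming an environment that provides the standard TextDecoder API (modern browsers or a recent Node.js). The byte arrays are made-up samples, not taken from the OP's file: they spell "cât" the way Windows-1252 and UTF-8 would each store it.

```javascript
// Hypothetical sample: the word "cât" as raw bytes in two encodings.
const savedAsWindows1252 = new Uint8Array([0x63, 0xE2, 0x74]);  // â is the single byte 0xE2
const savedAsUtf8 = new Uint8Array([0x63, 0xC3, 0xA2, 0x74]);   // â is the two bytes 0xC3 0xA2

// With no charset declared, the response text gets decoded as UTF-8.
const utf8 = new TextDecoder('utf-8');

const garbled = utf8.decode(savedAsWindows1252);
const correct = utf8.decode(savedAsUtf8);

console.log(garbled); // 0xE2 is not a valid UTF-8 sequence here, so it becomes U+FFFD
console.log(correct); // decodes cleanly
```

The replacement character U+FFFD is what browsers typically render as a black symbol with a question mark, which matches the symptom described in the question.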
Answered by T.J. Crowder
You need to ensure that the HTTP response returning the file data has the correct charset identified on it. You have to do that server-side; I don't think you can force it from the client. (When you set the content type in the request header, you're setting the content type of the request, not the response.) So for instance, the response header from the server would be along the lines of:
Content-Type: text/plain; charset=windows-1252
...if by "ANSI" you mean the Windows-1252 charset. That should tell the browser what it needs to do to decode the response text correctly before handing it to the JavaScript layer.
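For cases where you do control the server, the header above could be produced roughly as follows. This is only a sketch assuming a Node.js server; FILE_CHARSET, contentType, serveText and the file name in the wiring comment are hypothetical names chosen for illustration, and the OP's hosting setup may look nothing like this.

```javascript
// Assumption: this value must match the encoding the file was actually saved in.
const FILE_CHARSET = 'windows-1252';

// Build the Content-Type value that tells the browser how to decode the body.
function contentType(mimeType, charset) {
  return mimeType + '; charset=' + charset;
}

// Works with Node's http.ServerResponse: send the raw bytes and label their encoding.
function serveText(res, bodyBytes) {
  res.writeHead(200, { 'Content-Type': contentType('text/plain', FILE_CHARSET) });
  res.end(bodyBytes);
}

// Possible wiring (not run here; 'file.txt' is a placeholder):
// require('http').createServer(function (req, res) {
//   serveText(res, require('fs').readFileSync('file.txt'));
// }).listen(8080);
```

The point of the design is that the server sends the file's bytes untouched and only declares their encoding, so the browser does the decoding before responseText reaches the script.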
One problem, though: As far as I can tell, Windows-1252 doesn't have the full Romanian alphabet. So if you're seeing characters like Ș, ș, Ț, ț, etc., that suggests the source text is not in Windows-1252. Now, perhaps it's okay to drop the diacriticals on those in Romanian (I wouldn't know), and so if your source text just uses S and T instead of Ș and Ț, etc., it could still be in Windows-1252. Or it may be ISO-8859 or ISO-8859-2 (both of which drop some diacriticals) or possibly ISO-8859-16 (which has full Romanian support).
So the first thing to do is determine what character set the source text is actually in.
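One cheap first check is whether the file's bytes are well-formed UTF-8 at all. The sketch below assumes the standard TextDecoder API with its fatal option; isValidUtf8 is a hypothetical helper name, and the sample bytes are made up for illustration.

```javascript
// Returns true if the bytes form well-formed UTF-8, false otherwise.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw a TypeError instead of inserting U+FFFD.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical samples: "ș" (U+0219) as UTF-8, and as the single byte
// that an encoding such as ISO-8859-16 would use for it.
console.log(isValidUtf8(new Uint8Array([0xC8, 0x99]))); // true
console.log(isValidUtf8(new Uint8Array([0xBA])));       // false
```

In a browser, the raw bytes could be obtained with something like fetch(url).then(r => r.arrayBuffer()). Note the limits of this check: a failure shows the text is not UTF-8 but does not say which single-byte charset it is, and a pure-ASCII file passes regardless of how it was saved.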