Note: this page reproduces a Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/3677093/
What charset to use to store Russian text into JavaScript files as an array
Asked by crosenblum
I am creating a ColdFusion page that takes language translation data stored in a table in my database and makes static JS files for each language pairing of English to ___, etc.
I am now starting to work on Russian. I was able to get the other languages to work fine.
However, when it saves the file, all the text looks like question marks. Even when I run my translation app, the text for just that language shows up as ?????
I have tried writing it via cffile as UTF-8 or ISO-8859-1, but neither gets it to display properly.
Any suggestions?
Answered by Robin
Have you tried ISO-8859-5? I believe it's the encoding that "should" be used for Russian.
Answered by ryber
I can't personally reproduce this problem at all. Is the ColdFusion template that is making the call itself saved as UTF-8? (With or without a BOM; that makes no difference for Russian.) In any case, UTF-8 is absolutely what you should be using. Make sure you use a UTF-8 compliant editor, which most editors on Mac are. On Windows you could use SciTE or GVim.
Answered by Vincent Buck
By all means do use UTF-8 over any other encoding type. You need to make sure that:
- your cfm templates were written to disk with UTF-8 encoding (notepad++ handles that nicely, and so does Eclipse or the new ColdFusion Builder)
- your database was created with the proper codepage for nvarchar (and varchar) datatypes
- your database connection handles UTF-8
How to go about the last two items depends on your database back-end. ColdFusion is quite agnostic in that regard, as it will happily use any JDBC driver that you may need.
When working in a multi-character set environment, character set conversion issues can occur and it can be difficult to determine where the conversion issue occurred.
There are two categories into which conversion issues can be placed. The first involves sending data in the wrong format to the client API. Although this cannot happen with Unicode APIs, it is possible with all other client APIs and results in garbage data.
The second category of issue involves a character that does not have an equivalent in the final character set, or in one of the intermediate character sets. In this case, a substitution character is used. This is called lossy conversion and can happen with any client API. You can avoid lossy conversions by configuring the database to use UTF-8 for the database character set.
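To make the lossy-conversion case concrete, here is a minimal sketch of how Cyrillic text turns into question marks when it is pushed through ISO-8859-1. The helpers `encodeLatin1Lossy` and `decodeLatin1` are hypothetical names written for this illustration; they are not part of ColdFusion or of any database driver, and a real converter may pick a different substitution character.

```javascript
// Hypothetical helper: encode text to ISO-8859-1 the way a lossy converter
// would, substituting '?' (0x3F) for every character the charset lacks.
function encodeLatin1Lossy(text) {
  const bytes = [];
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    // ISO-8859-1 maps code points 0x00-0xFF one-to-one onto single bytes.
    bytes.push(codePoint <= 0xff ? codePoint : 0x3f);
  }
  return Uint8Array.from(bytes);
}

// Hypothetical helper: decode ISO-8859-1 bytes (each byte is its own code point).
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

const privet = '\u041f\u0440\u0438\u0432\u0435\u0442'; // "Привет"
const lossyResult = decodeLatin1(encodeLatin1Lossy(privet));
console.log(lossyResult); // every Cyrillic letter has been replaced by '?'

// UTF-8 can represent every character, so the same round trip loses nothing.
const privetUtf8 = new TextEncoder().encode(privet);
const losslessResult = new TextDecoder('utf-8').decode(privetUtf8);
console.log(losslessResult === privet);
```

Once the substitution has happened the original letters are gone, so the fix has to be applied at whichever step does the conversion, not afterwards.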
The advantage of UTF-8 over any other encoding is that you can handle any number of languages in the same database / client.
Answered by bobince
The correct encoding to use in a .js file is whatever encoding the parent page is in. Whilst there are methods to serve JavaScript using a different encoding to the page including it, they don't work on all browsers.
So make sure your web page is being saved and served in an encoding that contains the Russian characters, and then save the .js file using the same encoding. That will be either:
- ISO-8859-5. A single-byte encoding with Cyrillic in the high bytes, similar in purpose to Windows code page 1251 (the two assign Cyrillic letters to different byte values, so they are not interchangeable). cp1251 will be the default encoding when you save in a text editor from a Russian install of Windows;
- or UTF-8. A multi-byte encoding that can represent every Unicode character. All modern websites should be using UTF-8.
(ISO-8859-1 is Western European and does not include any Cyrillic. It is similar to code page 1252, the default on a Western Windows install. It's of no use to you.)
So, best is to save both the cf template and the js file as UTF-8, and add <cfprocessingdirective pageencoding="utf-8"> if CF doesn't pick it up automatically.
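The thread does this step in ColdFusion with cffile. As a language-neutral illustration of the same idea, here is a rough JavaScript sketch that builds the contents of one translation file and encodes it explicitly as UTF-8. The function name `buildTranslationSource` and the sample entry are assumptions made up for this example, and the language code is assumed to be safe to use inside a variable name.

```javascript
// Hypothetical generator: turn one language's translations into JS source.
// Assumes langCode is a plain identifier fragment such as "ru".
function buildTranslationSource(langCode, entries) {
  // JSON.stringify quotes the strings safely and leaves Cyrillic unescaped.
  return 'var translation_' + langCode + ' = ' +
    JSON.stringify(entries, null, 2) + ';\n';
}

const ruSource = buildTranslationSource('ru', {
  launchMyCalendar: '\u0417\u0430\u043f\u0443\u0441\u043a', // "Запуск"
});

// Encode explicitly as UTF-8; these are the bytes that should reach the disk.
const ruFileBytes = new TextEncoder().encode(ruSource);

// Sanity check before serving: decoding as UTF-8 must reproduce the source.
// { fatal: true } makes the decoder throw on invalid byte sequences.
const ruDecoded = new TextDecoder('utf-8', { fatal: true }).decode(ruFileBytes);
console.log(ruDecoded === ruSource);

// Each Cyrillic letter takes two bytes in UTF-8, so the byte count is larger
// than the character count.
console.log(ruFileBytes.length, ruSource.length);
```

Whatever tool writes the file, the point is the same: the bytes written must be UTF-8, and the page that includes the script must be served as UTF-8 too.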
If you can't control the encoding of the page that includes the script (for example because it's a third party), then you can't use any non-ASCII characters directly. You would have to use JavaScript string literal escapes instead:
var translation_ru = {
    // "Запуск Мой календарь", written as \uXXXX escapes so the file stays pure ASCII
    launchMyCalendar: '\u0417\u0430\u043f\u0443\u0441\u043a \u041c\u043e\u0439 \u043a\u0430\u043b\u0435\u043d\u0434\u0430\u0440\u044c'
};
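An escaped literal like the one above does not have to be typed by hand. Here is a minimal sketch of a helper that produces one. The name `toAsciiJsLiteral` is made up for this example. It relies on `JSON.stringify` to handle quotes, backslashes and control characters, then escapes every remaining non-ASCII UTF-16 code unit.

```javascript
// Hypothetical helper: produce a double-quoted, ASCII-only JavaScript string
// literal for any text.
function toAsciiJsLiteral(text) {
  return JSON.stringify(text).replace(/[\u0080-\uffff]/g, (unit) =>
    // Pad to four hex digits, as the \uXXXX escape syntax requires.
    '\\u' + unit.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

// "Запуск Мой календарь", the same text as in the example above.
const calendarLabel =
  '\u0417\u0430\u043f\u0443\u0441\u043a \u041c\u043e\u0439 ' +
  '\u043a\u0430\u043b\u0435\u043d\u0434\u0430\u0440\u044c';

const calendarLiteral = toAsciiJsLiteral(calendarLabel);
console.log(calendarLiteral);

// The literal is plain ASCII, so it survives any ASCII-compatible encoding,
// and parsing it gives back the original text.
console.log(JSON.parse(calendarLiteral) === calendarLabel);
```

The generated file is larger and harder to read than the UTF-8 version, so this is mainly worth doing when the encoding of the including page really is out of your hands.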
when it saves to file it is "·D??áú ?Tù úD??Y?Dàì" so the charset is wrong
Looks like you've saved as cp1251 (i.e. the default codepage on a Russian machine) and then copied the file to a Western server where the default codepage is cp1252.
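For anyone who wants to see what that kind of mix-up does in general, here is a small simulation. Both helpers are hypothetical and deliberately limited: `encodeCp1251Basic` only handles ASCII plus the basic Russian letters U+0410 to U+044F (which cp1251 stores at bytes 0xC0 to 0xFF), and `decodeAsCp1252` is only valid for ASCII and bytes 0xA0 to 0xFF, where cp1252 agrees with ISO-8859-1. It shows the mechanism and is not meant to reproduce the exact garbage quoted above.

```javascript
// Hypothetical helper: encode ASCII and basic Russian letters as windows-1251.
// cp1251 places U+0410-U+044F at bytes 0xC0-0xFF (code point minus 0x350).
function encodeCp1251Basic(text) {
  return Uint8Array.from(Array.from(text), (ch) => {
    const codePoint = ch.codePointAt(0);
    if (codePoint < 0x80) return codePoint;
    if (codePoint >= 0x410 && codePoint <= 0x44f) return codePoint - 0x350;
    throw new RangeError('unsupported character: ' + ch);
  });
}

// Hypothetical helper: read bytes as windows-1252. For ASCII and 0xA0-0xFF,
// cp1252 matches ISO-8859-1, so each byte maps to the same code point.
function decodeAsCp1252(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

const zapusk = '\u0417\u0430\u043f\u0443\u0441\u043a'; // "Запуск"
const misread = decodeAsCp1252(encodeCp1251Basic(zapusk));
console.log(misread); // accented Latin letters instead of Cyrillic
```

Unlike the question-mark case, this kind of damage is reversible as long as the bytes themselves were never altered: reading the same file as cp1251 gives the Russian text back.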
I also just found out that my text editor of choice, TextPad, doesn't support Unicode.
Yes, that was my reason for no longer using it too. EmEditor (commercial) and Notepad++ (open-source) are good replacements.