Should I change from UTF-8 to UTF-16 to accommodate Chinese characters in my HTML?
Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/3864842/
Asked by Aaron Salazar
I am using ASP.NET MVC, MS SQL and IIS. A few of my users have used Chinese characters in their profile info. However, when I display this information, it shows up as æ?å¼·è¯ even though the characters are stored correctly in my database. Currently the encoding for my HTML pages is set to UTF-8. Should I change it to UTF-16? I understand that a few problems can come from this, but what are my choices?
Thank you,
Aaron
Answered by Yuji
UTF-8 and UTF-16 encode exactly the same set of characters. It's not that UTF-16 covers Chinese characters and UTF-8 doesn't. UTF-16 uses one 16-bit unit for most characters, including the common Chinese ones, and a pair of 16-bit units for the rest; UTF-8 uses 1, 2, 3, or at most 4 bytes depending on the character, so an ASCII character still takes 1 byte. Start with this Wikipedia article to get the idea behind it.
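The size difference between the two encodings can be checked directly. The following is a minimal Python sketch, not part of the original answer; the sample characters and the helper name `byte_lengths` are illustrative assumptions. It shows that both encodings can represent every sample, and that only the byte count differs.

```python
# Compare how many bytes each encoding needs for a single character.
# The samples are illustrative: ASCII, Latin-1, a common CJK character,
# and a rare CJK character outside the Basic Multilingual Plane (U+20000).
samples = ["A", "\u00e9", "\u5f37", "\U00020000"]


def byte_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    # "utf-16-le" is used so that no 2-byte byte-order mark is prepended.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in samples:
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

A common Chinese character such as U+5F37 takes 3 bytes in UTF-8 and 2 bytes in UTF-16, and both round-trip without loss, so the choice between them affects size, not coverage.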
So there's little chance that switching to UTF-16 will help you at all, and some chance that it will make things worse, as discussed in the SO question you linked above. The problem is somewhere else in your setup: some part of it does not correctly handle characters outside ASCII or Latin-1. Make sure every part of your setup works in UTF-8.
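The garbled output in the question is what UTF-8 bytes look like when some step reads them as a single-byte encoding. Here is a minimal Python sketch of that failure, assuming the wrong decoder is Latin-1 (the asker's actual setup is not known, and the function names are hypothetical):

```python
def misdecode(text: str) -> str:
    """Encode as UTF-8, then decode with the wrong codec (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the mistake: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "\u5f37"            # the Chinese character U+5F37
garbled = misdecode(original)  # three Latin-1 characters: U+00E5 U+00BC U+00B7
print(garbled)
print(repair(garbled) == original)
```

The three characters this produces for U+5F37 match the middle of the garbled string in the question, which suggests the stored data is fine and one layer between the database and the browser is using the wrong encoding.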
Answered by jjrv
Any UTF encoding can represent the same Unicode characters, so switching to UTF-16 wouldn't help. There's an encoding issue somewhere, and with UTF-16 you would only end up with a different wrong HTML representation. Of course, if you have some library that simply encodes non-ASCII characters as entities and does support wide characters, the switch may solve your problem. There are, however, characters that need two wide characters, and these would still be shown wrong, although users might rarely notice. The best option is to make whatever produces the HTML interpret your UTF-8 correctly.
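Both points in this answer can be illustrated with a short Python sketch. It is an assumption-laden example rather than what any particular ASP.NET library does, and the helper names `to_ascii_html` and `utf16_code_units` are hypothetical. The first helper writes non-ASCII characters as numeric character references, and the second counts how many 16-bit units a character needs in UTF-16.

```python
def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    Note: this does not escape markup characters such as < or &.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def utf16_code_units(ch: str) -> int:
    """Count the 16-bit code units needed for one character in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


common = "\u5f37"        # common Chinese character, inside the BMP
rare = "\U00020000"      # rare CJK character, outside the BMP

print(to_ascii_html(common), utf16_code_units(common))
print(to_ascii_html(rare), utf16_code_units(rare))
```

Output written as character references is pure ASCII, so it displays correctly whatever encoding the page declares. Code that treats one 16-bit unit as one character would split the rare character into two halves, which is the case the answer warns about.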