HTML encoding of Japanese text

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/12648655/

Date: 2020-08-29 03:05:32  Source: igfitidea

html character-encoding

Asked by usr-local-ΕΨΗΕΛΩΝ

I'm making a static HTML page that displays courtesy text in multiple languages. I noticed that if I paste ウェブサイトのメンテナンスの下で into Expression Blend, the text appears unchanged in the code. I think that's bad for compatibility and that it should be replaced by proper HTML entities.

I have tried http://www.opinionatedgeek.com/DotNet/Tools/HTMLEncode/encode.aspx but it returns the same Japanese text.

  1. Is it correct, from the point of view of browser compatibility, to paste that Japanese text right into the source code of an HTML page?
  2. If not, what is the correct HTML encoding of that text? Or, better, is there a tool, preferably online and free, that I can use to convert non-ASCII characters to HTML entities?

Accepted answer by o.v.

I think it's bad for compatibility and should be replaced by proper HTML entities.

Quite the opposite, actually: your preference should be not to use HTML entities, but rather to correctly declare the document encoding as UTF-8 and use the actual characters. There are quite a few compelling reasons to do so, but the real question is: why not use it, since it's a well-established and widely supported standard?

Some of those points have been summarised previously:

UTF-8 encodings are easier to read and edit for those who understand what the character means and know how to type it.

UTF-8 encodings are just as unintelligible as HTML entity encodings for those who don't understand them, but they have the advantage of rendering as special characters rather than hard to understand decimal or hex encodings.

[For example] Wikipedia... actually go through articles and convert character entities to their corresponding real characters for the sake of user-friendliness and searchability.

Answer by gogsrox

As long as you mark your web page as UTF-8, either in the HTTP headers or in the meta tags, having foreign characters in your web pages should be a non-issue. Alternatively, you could encode/decode these strings using the encodeURI/decodeURI functions in JavaScript; note that these produce URL percent-encoding rather than HTML entities.

encodeURI('ウェブサイトのメンテナンスの下で')
// returns "%E3%82%A6%E3%82%A7%E3%83%96%E3%82%B5%E3%82%A4%E3%83%88%E3%81%AE%E3%83%A1%E3%83%B3%E3%83%86%E3%83%8A%E3%83%B3%E3%82%B9%E3%81%AE%E4%B8%8B%E3%81%A7"

decodeURI("%E3%82%A6%E3%82%A7%E3%83%96%E3%82%B5%E3%82%A4%E3%83%88%E3%81%AE%E3%83%A1%E3%83%B3%E3%83%86%E3%83%8A%E3%83%B3%E3%82%B9%E3%81%AE%E4%B8%8B%E3%81%A7")
// returns "ウェブサイトのメンテナンスの下で"
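
To make clear what encodeURI is doing, here is a small sketch that rebuilds its output by hand for a single character. It assumes an environment where TextEncoder is available as a global (modern browsers and recent Node.js); the variable names are only illustrative.

```javascript
// Each non-ASCII character is first encoded as UTF-8 bytes,
// then every byte is written as "%" followed by two uppercase hex digits.
const sample = "ウ";
const sampleBytes = Array.from(new TextEncoder().encode(sample));
const percentEncoded = sampleBytes
  .map((b) => "%" + b.toString(16).toUpperCase().padStart(2, "0"))
  .join("");

// The manual result matches the built-in function.
console.log(percentEncoded === encodeURI(sample));
```

The result is meant for URLs. A browser would show the percent signs literally if this string were pasted into the body of an HTML page.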

If you are looking for a tool to convert a bunch of static strings to and from Unicode characters, you could simply call the encodeURI/decodeURI functions from a browser's developer console (Firebug for Mozilla Firefox). Hope this helps!

Answer by deceze

HTML entities are only useful if you need to represent a character that cannot be represented in the encoding your document is saved in. For example, ASCII has no specification for how to represent "€". If you want to use that character in an ASCII-encoded HTML document, you have to encode it as &euro; or not use it at all.
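
For the case where a document really must stay in an ASCII-only encoding, the conversion can be done mechanically with numeric character references, which work for any character, whether or not it has a named entity. The following is a minimal sketch, not a reference implementation; the function name toNumericEntities is hypothetical.

```javascript
// Hypothetical helper: replaces every non-ASCII code point with a
// hexadecimal numeric character reference such as &#x20AC;.
// ASCII characters (including <, > and &) are left untouched.
function toNumericEntities(text) {
  let out = "";
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 0x7f
      ? "&#x" + codePoint.toString(16).toUpperCase() + ";"
      : ch;
  }
  return out;
}

console.log(toNumericEntities("ウェブサイトのメンテナンスの下で"));
```

The output is harder to read and edit than the original text, which is the main argument against using it when UTF-8 is an option.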

If you are using a character encoding for your document that can represent all the characters you need though, like UTF-8, there's no need for HTML entities. You simply need to make sure the browser knows what encoding the document is in so it can interpret it correctly. This is really the preferable method, since it simply keeps the source code readable. It really makes no sense to want to work with HTML entities if you can simply work with the actual characters.
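
As a rough illustration of keeping the real characters and telling the browser the encoding, the sketch below builds a minimal page that declares UTF-8 in a meta tag and then encodes it to UTF-8 bytes. The helper name buildPage and the page title are hypothetical, and it assumes TextEncoder and TextDecoder are available as globals (modern browsers and recent Node.js).

```javascript
// Hypothetical helper: wraps plain text in a minimal HTML page that
// declares its encoding as UTF-8. It assumes bodyText contains no
// HTML special characters (<, >, &), since nothing is escaped here.
function buildPage(bodyText) {
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    '<meta charset="utf-8">',
    "<title>Maintenance</title>",
    "</head>",
    "<body><p>" + bodyText + "</p></body>",
    "</html>",
  ].join("\n");
}

const page = buildPage("ウェブサイトのメンテナンスの下で");

// The bytes that would be saved to disk or sent over HTTP.
const pageBytes = new TextEncoder().encode(page);

// Decoding those bytes as UTF-8 gives back the same text.
const decodedPage = new TextDecoder("utf-8").decode(pageBytes);
console.log(decodedPage === page);
```

The declared encoding has to match the encoding the file is actually saved in. The meta tag alone does not convert anything.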

See http://kunststube.net/frontback for some more information.
