Disclaimer: This page is derived from a popular StackOverflow question and its answers and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/12114477/

Time: 2020-08-23 07:08:09  Source: igfitidea

How do I correctly insert unicode in an HTML title using JavaScript?

Tags: javascript, html, unicode

Asked by BenG

I'm seeing some weird behavior when I set the title of an HTML page using JavaScript. If I put HTML character references directly into the title element, the Unicode characters render correctly, for instance:

<title>&#21543;&#20986;</title>

But if I try to use HTML character references via JavaScript, something seems to convert each & to &amp;, which breaks the references, so the title is rendered as the literal coded string:

function execTitleChange() {
  // document.title takes plain text, so the title shows the literal "&#21543;&#20986;"
  document.title = "&#21543;&#20986;";
}

(I should note that this is partly speculation: when I inspect the DOM in Firebug after running this JavaScript function, I see &amp; instead of &.)

If I use \u encoded Unicode characters when setting the value from JavaScript then everything works correctly again:

function execTitleChange() {
  // The JavaScript parser decodes the \u escapes, so the title becomes 吧出
  document.title = "\u5427\u51fa";
}

The fact that \u-encoded characters work makes sense to me, since I think that's how JavaScript represents Unicode characters, but I'm stumped as to why the behavior differs when I use HTML character references.

Answer by Pointy

JavaScript string constants are parsed by the JavaScript parser. Text inside HTML tags is parsed by the HTML parser. The two languages (and, by extension, their parsers) are different, and in particular they have different ways of representing characters by character code.
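
A small illustration of the JavaScript side, runnable in Node.js or a browser console (no DOM needed): it compares the two string literals from the question. The JavaScript parser turns each \u escape into one character, but it leaves &#21543; alone, because that syntax means nothing to it.

```javascript
// The JavaScript parser decodes \u escapes when it reads the literal.
const fromEscapes = "\u5427\u51fa";
// HTML numeric character references are NOT decoded by the JavaScript parser;
// this literal is just the characters &, #, digits and ; as written.
const fromRefs = "&#21543;&#20986;";

console.log(fromEscapes, fromEscapes.length); // 吧出 2
console.log(fromRefs, fromRefs.length);       // &#21543;&#20986; 16
console.log(fromEscapes === "吧出");          // true
```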

Thus, what you've discovered is the way reality actually is :-) Use the \u escape notation in JavaScript, and use HTML numeric character references (&#nnnn;) in HTML/XML.
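
If the text already arrives containing numeric references (for example from a data source you don't control), one option is to convert them yourself before assigning to a text-only property such as document.title. The helper below is a hypothetical sketch, not a built-in: decodeNumericRefs handles only decimal (&#nnnn;) and hexadecimal (&#xhhhh;) references and assumes String.fromCodePoint (ES2015) is available. Named entities such as &amp; are deliberately left alone.

```javascript
// Hypothetical helper (not a built-in): converts HTML numeric character
// references such as "&#21543;" or "&#x5427;" into the characters they name.
// Named entities such as "&amp;" are left untouched.
function decodeNumericRefs(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, code) => {
    const isHex = code[0] === "x" || code[0] === "X";
    const codePoint = isHex ? parseInt(code.slice(1), 16) : parseInt(code, 10);
    // Leave references to invalid code points (above U+10FFFF) as they were.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericRefs("&#21543;&#20986;")); // 吧出
console.log(decodeNumericRefs("&#x5427;&#x51FA;")); // 吧出
```

In a browser this would be used as document.title = decodeNumericRefs("&#21543;&#20986;"). Writing the \u escapes (or the characters themselves) in the source remains the simpler choice when you control the string.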

Edit: the situation can get even more confusing when you're creating or inserting HTML from JavaScript. When you use .innerHTML to update the DOM from JavaScript, you are basically handing HTML source code to the HTML parser for interpretation. For that reason, you can use either JavaScript \u escapes or HTML entities, and things will work (apart from painful issues such as character-encoding mismatches).
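
A sketch of why both spellings work with .innerHTML: the \u escapes are resolved by the JavaScript parser before the HTML parser ever sees the string, while the references are passed through and resolved by the HTML parser. The DOM part assumes a browser and uses a detached div created only for illustration; the typeof guard lets the rest run in Node.js.

```javascript
// The \u escapes are decoded by the JavaScript parser, so the HTML parser
// already receives the characters 吧出.
const markupWithChars = "<b>\u5427\u51fa</b>";
// These references survive JavaScript parsing untouched; the HTML parser
// decodes them when the string is assigned to innerHTML.
const markupWithRefs = "<b>&#21543;&#20986;</b>";

console.log(markupWithChars); // <b>吧出</b>
console.log(markupWithRefs);  // <b>&#21543;&#20986;</b>

// In a browser, both assignments render 吧出.
if (typeof document !== "undefined") {
  const el = document.createElement("div");
  el.innerHTML = markupWithRefs;
  console.log(el.textContent === "吧出"); // true
  el.innerHTML = markupWithChars;
  console.log(el.textContent === "吧出"); // true
}
```

By contrast, setting el.textContent (like setting document.title) treats the string as plain text, so references assigned that way are displayed literally.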

Finally, note that JavaScript also provides the String.fromCharCode() function to construct strings from numeric character codes.
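
For example, the decimal numbers from the question's HTML references and the hexadecimal numbers from its \u escapes produce the same string. String.fromCharCode works with UTF-16 code units, so for characters above U+FFFF the ES2015 String.fromCodePoint is the more convenient call; the emoji at the end is just an arbitrary example of such a character.

```javascript
// Decimal values, as in the HTML references &#21543;&#20986;
const fromDecimal = String.fromCharCode(21543, 20986);
// Hexadecimal values, as in the \u escapes \u5427\u51fa
const fromHex = String.fromCharCode(0x5427, 0x51fa);

console.log(fromDecimal);                    // 吧出
console.log(fromDecimal === fromHex);        // true
console.log(fromDecimal === "\u5427\u51fa"); // true

// Outside the Basic Multilingual Plane, fromCodePoint takes the full code
// point, whereas fromCharCode would need both UTF-16 surrogate halves.
console.log(String.fromCodePoint(0x1f600) === String.fromCharCode(0xd83d, 0xde00)); // true
```

In a browser, document.title = String.fromCharCode(21543, 20986) sets the same title as the \u version in the question.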

Answer by Jukka K. Korpela

The best way to work with Unicode characters in JavaScript is to use the characters themselves, using an editor or other tool that can store them in UTF-8 encoding. You will avoid a lot of confusion. Naturally, you need to properly declare the character encoding of your .js or .html file.
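
To make "store them in UTF-8" concrete: each of the two characters in the question takes three bytes in a UTF-8 file. The sketch uses TextEncoder, a standard API that always encodes to UTF-8 and is available as a global in browsers and in Node.js 11+, to show the bytes an editor would write for 吧出.

```javascript
// TextEncoder always produces UTF-8.
const bytes = new TextEncoder().encode("吧出");

// Format each byte as two uppercase hex digits.
const hex = Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0"));

console.log(bytes.length);  // 6
console.log(hex.join(" ")); // E5 90 A7 E5 87 BA

// Decoding the same bytes as UTF-8 gives back the original characters.
console.log(new TextDecoder("utf-8").decode(bytes)); // 吧出
```

If the browser reads these bytes with some other encoding, they turn into garbage characters, which is why the encoding must be declared (for example with <meta charset="utf-8"> in the HTML, or a charset parameter in the HTTP Content-Type header).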

The construct &#21543; has no special meaning in JavaScript; it is just eight ASCII characters. But if your JavaScript code is embedded in an HTML document, it may be processed by HTML rules before it reaches the JavaScript interpreter (for example, inside an event-handler attribute such as onclick, or inside a script element in XHTML). And the rules vary by HTML version. Yet another reason to avoid such constructs.
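
The snippet below illustrates both halves of that point. The first part runs anywhere and confirms the construct is eight plain ASCII characters. The second part assumes a browser: it parses a button with an onclick attribute into a detached div (the markup is made up for illustration) and shows that the HTML parser has already decoded the reference in the attribute value, so the handler's JavaScript code never sees &#21543;.

```javascript
// In a JavaScript string literal, "&#21543;" is just eight ASCII characters.
const construct = "&#21543;";
console.log(construct.length); // 8
console.log([...construct].every((ch) => ch.charCodeAt(0) < 128)); // true

// In a browser, the same text inside an HTML attribute is decoded by the
// HTML parser first. (Guarded so the snippet also runs in Node.js.)
if (typeof document !== "undefined") {
  const wrapper = document.createElement("div");
  wrapper.innerHTML = '<button onclick="document.title = \'&#21543;\'">x</button>';
  console.log(wrapper.firstChild.getAttribute("onclick")); // document.title = '吧'
}
```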

So just write

document.title = "吧出";

(Of course, there are very few situations where you should set the content of the title element, which matters to search engines and for many other purposes, from JavaScript instead of in the HTML. But that's beside the point.)