JavaScript: What is the right way to decode a string that contains special HTML entities?

Question

Asked by Dan Tao

Say I get some JSON back from a service request that looks like this:

{
    "message": "We&#39;re unable to complete your request at this time."
}

I'm not sure why that apostrophe is encoded like that (&#39;); all I know is that I want to decode it.

Here's one approach using jQuery that popped into my head:

function decodeHtml(html) {
    return $('<div>').html(html).text();
}

That seems (very) hacky, though. What's a better way? Is there a "right" way?

Answer 1

Answered by Rob W

This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.

function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

Example: http://jsfiddle.net/k65s3/

Input:

Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Output:

Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Answer 2

Answered by Mathias Bynens

Don't use the DOM to do this. Using the DOM to decode HTML entities (as suggested in the currently accepted answer) leads to differences in cross-browser results.

For a robust and deterministic solution that decodes character references according to the algorithm in the HTML Standard, use the he library. From its README:

he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and, contrary to many other JavaScript solutions, he handles astral Unicode symbols just fine. An online demo is available.

Here's how you'd use it:

he.decode("We&#39;re unable to complete your request at this time.");
// → "We're unable to complete your request at this time."

Disclaimer: I'm the author of the he library.

See this Stack Overflow answer for some more info.

Answer 3

Answered by Alxandr

If you don't want to use HTML or the DOM, you could use a regex. I haven't tested this, but something along the lines of:

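function parseHtmlEntities(str) {
    // Matches decimal numeric references only, e.g. &#39; (no hex, no named entities).
    return str.replace(/&#([0-9]+);/g, function(match, numStr) {
        var num = parseInt(numStr, 10); // read num as a base-10 number
        // fromCodePoint throws above U+10FFFF, so leave such references untouched.
        return num <= 0x10FFFF ? String.fromCodePoint(num) : match;
    });
}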

[Edit]

Note: this would only work for numeric html-entities, and not stuff like &oring;.
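
To illustrate the limitation in the note above, here is a minimal sketch that also accepts hexadecimal references and a handful of named ones. The names `decodeEntitiesSketch` and `NAMED_ENTITIES` are made up for this example, and the lookup table is a tiny illustrative subset, not the full list from the HTML Standard. Unknown names are left untouched rather than guessed at.

```javascript
// Hypothetical helper: decodes decimal, hex, and a few named entities.
// NAMED_ENTITIES is an illustrative subset, not the complete HTML list.
const NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00A0' };

function decodeEntitiesSketch(str) {
  return str.replace(/&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/g, function (match, body) {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Values above U+10FFFF are left as-is instead of throwing.
      return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
    }
    // Named reference: decode only if it is in the small table above.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body) ? NAMED_ENTITIES[body] : match;
  });
}

console.log(decodeEntitiesSketch('We&#39;re &lt;b&gt; &#x26; &unknown;'));
// → We're <b> & &unknown;
```

This still does not follow the full parsing algorithm (for example, references without a trailing semicolon), so treat it as a starting point only.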

[Edit 2]

Fixed the function (some typos), test here: http://jsfiddle.net/Be2Bd/1/

Answer 4

Answered by Jason Williams

jQuery will encode and decode for you.

function htmlDecode(value) {
  return $("<textarea/>").html(value).text();
}

function htmlEncode(value) {
  return $('<textarea/>').text(value).html();
}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
   $("#encoded")
  .text(htmlEncode("<img src onerror='alert(0)'>"));
   $("#decoded")
  .text(htmlDecode("&lt;img src onerror='alert(0)'&gt;"));
});
</script>

<span>htmlEncode() result:</span><br/>
<div id="encoded"></div>
<br/>
<span>htmlDecode() result:</span><br/>
<div id="decoded"></div>
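
The jQuery helpers above need a DOM. For the encoding direction, a DOM-free version can be sketched with a plain replace. `escapeHtmlSketch` and `ESCAPES` are names invented for this example; unlike the textarea approach, this sketch also escapes quotes, which matters if the result ends up inside an attribute value.

```javascript
// Hypothetical DOM-free counterpart to htmlEncode(); the names are illustrative.
const ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

function escapeHtmlSketch(value) {
  // Single pass over the string, so an existing "&" is escaped exactly once.
  return String(value).replace(/[&<>"']/g, function (ch) { return ESCAPES[ch]; });
}

console.log(escapeHtmlSketch("<img src onerror='alert(0)'>"));
// → &lt;img src onerror=&#39;alert(0)&#39;&gt;
```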

Answer 5

Answered by hypers

There's a JS function to deal with &#xxxx styled entities:
function at GitHub

// Decode decimal numeric HTML entities (e.g. &#39640;) into characters
var decodeHtmlEntity = function(str) {
  return str.replace(/&#(\d+);/g, function(match, dec) {
    return String.fromCharCode(dec);
  });
};

var encodeHtmlEntity = function(str) {
  var buf = [];
  for (var i=str.length-1;i>=0;i--) {
    buf.unshift(['&#', str[i].charCodeAt(), ';'].join(''));
  }
  return buf.join('');
};

var entity = '&#39640;&#32423;&#31243;&#24207;&#35774;&#35745;';
var str = '高级程序设计';
console.log(decodeHtmlEntity(entity) === str);
console.log(encodeHtmlEntity(str) === entity);
// output:
// true
// true
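
One caveat with the functions above: `String.fromCharCode` keeps only the low 16 bits of its argument and `charCodeAt` returns UTF-16 code units, so symbols outside the Basic Multilingual Plane do not round-trip. A code-point-aware variant might look like the sketch below; the `Cp`-suffixed names are made up for this example.

```javascript
// Code-point-aware variants of the functions above (names are illustrative).
function decodeHtmlEntityCp(str) {
  return str.replace(/&#(\d+);/g, function (match, dec) {
    const codePoint = Number(dec);
    // Leave values above U+10FFFF untouched; fromCodePoint would throw on them.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

function encodeHtmlEntityCp(str) {
  // Array.from iterates by code point, so a surrogate pair stays together.
  return Array.from(str, function (ch) {
    return '&#' + ch.codePointAt(0) + ';';
  }).join('');
}

const astral = '\u{1D306}'; // U+1D306, a symbol outside the Basic Multilingual Plane
console.log(encodeHtmlEntityCp(astral));                 // → &#119558;
console.log(decodeHtmlEntityCp('&#119558;') === astral); // → true
```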

Answer 6

Answered by tldr

_.unescape does what you're looking for

https://lodash.com/docs/#unescape
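
Note that, per the lodash documentation, `_.unescape` only converts `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&#39;`. That covers the string in the question, but not other entities. The sketch below is a rough dependency-free approximation of that behaviour; `unescapeSketch` and `UNESCAPES` are names invented here, not part of lodash.

```javascript
// Hypothetical stand-in covering the same five entities as _.unescape.
const UNESCAPES = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };

function unescapeSketch(str) {
  // Single pass: "&amp;lt;" becomes "&lt;" and is not decoded a second time.
  return str.replace(/&(?:amp|lt|gt|quot|#39);/g, function (entity) { return UNESCAPES[entity]; });
}

console.log(unescapeSketch('We&#39;re unable to complete your request at this time.'));
// → We're unable to complete your request at this time.
console.log(unescapeSketch('&nbsp;&#8217;'));
// → &nbsp;&#8217; (left untouched)
```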

Answer 7

Answered by kodmanyagha

This is such a good answer. You can use it with Angular like this:

moduleDefinitions.filter('sanitize', ['$sce', function($sce) {
    return function(htmlCode) {
        var txt = document.createElement("textarea");
        txt.innerHTML = htmlCode;
        return $sce.trustAsHtml(txt.value);
    };
}]);
