Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/17678694/


Replace HTML entities (e.g. &#8217;) with character equivalents when parsing an XML feed

Tags: javascript, xml-parsing, escaping, titanium, special-characters

Asked by user2363025

When parsing an XML feed, I am getting text from the content tag, like this:

The Government has awarded funding for a major refurbishment project to go ahead at St Eunan&#8217;s College. This is in addition to last month&#8217;s announcement that grant for its prefabs to be replaced with permanent accomodation. The latest grant will allow for major refurbishment to a section of the school to allow for new accommodation for classes &#8211; the project will also involve roof repairs, the installation of a dust extraction system, new science room fittings and installation of firm alarms. Donegal Deputy Joe McHugh says credit must go to the school&#8217;s board of management

Is there any way to easily replace these special characters (i.e., HTML entities for apostrophes and the like) with their character equivalents?

EDIT:

Ti.API.info("is this real------------"+win.dataToPass)


returns: (line breaks added for clarity)


[INFO][TiAPI   ( 5437)]  Is this real------------------Police in Strabane are
warning home owners and car owners in the town to be vigilant following a recent
spate of break-ins. There has been a number of thefts from gardens and vehicles
in the Jefferson Court and Carricklynn Avenue area of the town. The PSNI have
said that residents have reported seeing a dark haired male in and around the
area in the early hours of the morning. Local Cllr Karina Carlin has been
monitoring the situation &#8211; she says the problem seems to be getting
worse&#8230;&#8230;.


My external.js file, which merely displays the text above, is below:


var win= Titanium.UI.currentWindow;

Ti.API.info("Is this real------------------"+ win.dataToPass);

var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {//modified from underscore.string and string.js
    return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
        var match;

        if ( entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ( match = entityCode.match(/^#(\d+)$/)) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
}

var newText= unescapeHTML(win.datatoPass);


var label= Titanium.UI.createLabel({
    color: "black",
    //text: win.dataToPass,//this works!
    text:newText,//this is causing an error
    font: "Helvetica",
    fontSize: 50,
    width: "auto",
    height: "auto",
    textAlign: "center"
})

win.add(label);

Answered by Josiah Hester

There are many libraries you can include in Titanium (Underscore.string, string.js) that will make this happen, but if you only want the unescape-HTML function, just try this code, adapted from the above libraries:

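// Named entities handled here; any other named entity is left as-is.
var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) { // modified from underscore.string and string.js
    // Match "&...;" and pass the text between "&" and ";" to the callback.
    return str.replace(/&([^;]+);/g, function(entity, entityCode) {
        var match;

        if (entityCode in escapeChars) {
            // Named entity, e.g. &amp;
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            // Hexadecimal reference, e.g. &#x2019;
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            // Decimal reference, e.g. &#8217;
            return String.fromCharCode(parseInt(match[1], 10));
        } else {
            // Unknown entity: return it unchanged.
            return entity;
        }
    });
}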

This replaces those special characters with their human-readable equivalents and returns the modified string. Just put this somewhere in your code and you're good to go. I have used this myself in Titanium and it's quite handy.
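
A slightly more defensive variant of the function above, offered only as a sketch. The names `decodeEntities`, `codePointToString`, and `NAMED_ENTITIES`, as well as the narrower regex, are my own assumptions and not part of the original answer. It matches only well-formed references (`&name;`, `&#123;`, `&#x1F;`), looks names up with `hasOwnProperty` so that input such as `&toString;` is left alone, and builds surrogate pairs by hand for code points above U+FFFF, which a single-argument `String.fromCharCode` call cannot represent.

```javascript
// Hypothetical helper names; this is an illustrative sketch, not the answer's code.
var NAMED_ENTITIES = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

// Convert a Unicode code point to a string without relying on
// String.fromCodePoint, which older JavaScript engines may lack.
function codePointToString(cp) {
    if (cp > 0x10FFFF) {
        return null; // outside the Unicode range
    }
    if (cp <= 0xFFFF) {
        return String.fromCharCode(cp);
    }
    // Above U+FFFF: encode as a UTF-16 surrogate pair.
    cp -= 0x10000;
    return String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
}

function decodeEntities(str) {
    if (typeof str !== 'string') {
        return str; // pass non-string values through unchanged
    }
    // Only well-formed references are matched: &name; &#123; &#x1F;
    return str.replace(/&(#[xX][\da-fA-F]+|#\d+|[a-zA-Z][\da-zA-Z]*);/g, function (entity, code) {
        if (code.charAt(0) === '#') {
            var isHex = code.charAt(1) === 'x' || code.charAt(1) === 'X';
            var cp = isHex ? parseInt(code.slice(2), 16) : parseInt(code.slice(1), 10);
            var decoded = codePointToString(cp);
            return decoded === null ? entity : decoded;
        }
        if (Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, code)) {
            return NAMED_ENTITIES[code];
        }
        return entity; // unknown name: leave untouched
    });
}
```

Because the regex requires the reference to start right after the `&`, a stray ampersand in ordinary text (as in `a & b &lt; c;`) no longer swallows the entity that follows it.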

Answered by Dino Liu

I encountered the same issue, and @Josiah Hester's solution works for me. I have added a condition so that only string values are processed.

this.unescapeHTML = function(str) {
    var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };
    if (typeof(str) !== 'string') {
        return str;
    } else {
        return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
            var match;
            if (entityCode in escapeChars) {
                return escapeChars[entityCode];
            } else if (match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
                return String.fromCharCode(parseInt(match[1], 16));
            } else if (match = entityCode.match(/^#(\d+)$/)) {
                return String.fromCharCode(~~match[1]);
            } else {
                return entity;
            }
        });
    }
};
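
To illustrate why the type check matters, here is a small sketch. The `safeUnescape` wrapper and the sample `data` object are hypothetical. JavaScript property names are case-sensitive, so reading a misspelled property such as `datatoPass` instead of `dataToPass` yields `undefined`, and calling `.replace` on `undefined` throws a `TypeError`. With the guard in place, the value is returned unchanged instead.

```javascript
var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

// Unguarded version, following the earlier answer.
function unescapeHTML(str) {
    return str.replace(/&([^;]+);/g, function (entity, entityCode) {
        var match;
        if (entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            return String.fromCharCode(parseInt(match[1], 10));
        }
        return entity;
    });
}

// Hypothetical wrapper that applies the string-only guard.
function safeUnescape(value) {
    return typeof value === 'string' ? unescapeHTML(value) : value;
}

// Hypothetical sample data standing in for the window's payload.
var data = { dataToPass: 'getting worse&#8230;' };

// Misspelled property name: the lookup gives undefined, so the
// unguarded function throws when it calls undefined.replace(...).
var threw = false;
try {
    unescapeHTML(data.datatoPass);
} catch (e) {
    threw = e instanceof TypeError;
}

var guardedMissing = safeUnescape(data.datatoPass); // passed through as undefined
var guardedText = safeUnescape(data.dataToPass);    // entity decoded to an ellipsis
```

The guard hides the symptom rather than the cause, so it is still worth checking that the property name is spelled the way it was assigned.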

Answered by Joshua Briefman

Below are two references for these special characters. Unfortunately, if you simply filter them out, you may lose important information that you actually want to keep. My advice is to use the symbol reference table to create an array, then search your string for each of the codes and replace each code with its corresponding character.

For example:

A-Z are represented by: &#65; to &#90;



Filtering out this information may significantly alter the data you want to read.
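
The table-driven approach described above could look roughly like this. The `entityTable` entries and the `replaceFromTable` name are illustrative assumptions; a real table would be filled in from one of the references below.

```javascript
// Hypothetical lookup table: each pair maps an entity code to its character.
var entityTable = [
    ['&#8217;', '\u2019'], // right single quotation mark
    ['&#8211;', '\u2013'], // en dash
    ['&#8230;', '\u2026'], // horizontal ellipsis
    ['&#65;', 'A'],
    ['&#90;', 'Z'],
    ['&amp;', '&']         // kept last so "&amp;#65;" is not decoded twice
];

function replaceFromTable(text) {
    for (var i = 0; i < entityTable.length; i++) {
        // split/join replaces every occurrence without building a regex.
        text = text.split(entityTable[i][0]).join(entityTable[i][1]);
    }
    return text;
}
```

A fixed table only covers the codes you list, so anything not in the table stays in the text as-is rather than being dropped.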





HTML symbol entity references:


http://www.webmonkey.com/2010/02/special_characters/


http://www.w3schools.com/tags/ref_symbols.asp