How to unescape XML in Java

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/2833956/

Date: 2020-08-13 13:24:30

Tags: java, xml, escaping

Asked by Bas Hendriks

I need to unescape an XML string containing escaped XML tags:

&lt;
&gt;
&amp;
etc...

I found some libraries that can do this, but I'd rather use a single method.

Can someone help?

cheers, Bas Hendriks

Accepted answer by Bozho

StringEscapeUtils.unescapeXml(xml)

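(from Apache Commons Lang; since Commons Lang 3.6 this class is deprecated in favor of org.apache.commons.text.StringEscapeUtils from Apache Commons Text)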
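
If adding a dependency is not an option, one JDK-only alternative (a different technique from the accepted answer, not part of it) is to let the built-in XML parser do the decoding. The sketch below wraps the text in a dummy root element and reads back its text content; the class name ParserUnescape and the <root> wrapper are illustrative assumptions. It only works when the input is well-formed XML character data: a raw < or &, or an entity other than the five predefined ones (such as &nbsp;), makes the parser fail, and the default parser may also print that error to standard error. The parser also normalizes line endings, so \r\n comes back as \n.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class: decodes entities by letting the JDK's DOM parser read the text.
public final class ParserUnescape {

    public static String unescape(String escaped) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Wrap the escaped text in a dummy root element so it forms a complete document
            String wrapped = "<root>" + escaped + "</root>";
            Document doc = builder.parse(new InputSource(new StringReader(wrapped)));
            // The parser has already replaced &lt; &gt; &amp; &apos; &quot; and &#...; references
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Raw '<' or '&', or an undeclared entity such as &nbsp;, ends up here
            throw new IllegalArgumentException("Not well-formed XML character data: " + escaped, e);
        }
    }
}
```

For example, ParserUnescape.unescape("&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;") yields <b>Tom & Jerry</b>.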

Answer by texclayton

Here's a simple method to unescape XML. It handles the predefined XML entities and decimal numerical entities (&#nnnn;). Modifying it to handle hex entities (&#xhhhh;) should be simple.

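import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

//Compiled once and reused; group(1) is "#" for numeric entities, group(2) is the entity body.
//The body may not contain ';', '&' or whitespace, so a stray '&' cannot swallow a following entity.
private static final Pattern XML_ENTITY_REGEX = Pattern.compile( "&(#?)([^;&\\s]+);" );

public static String unescapeXML( final String xml )
{
    //Before Java 9, Matcher.appendReplacement/appendTail only accept a StringBuffer
    StringBuffer unescapedOutput = new StringBuffer( xml.length() );

    Matcher m = XML_ENTITY_REGEX.matcher( xml );
    Map<String,String> builtinEntities = null;
    String entity;
    String hashmark;
    String ent;
    while ( m.find() ) {
        ent = m.group(2);
        hashmark = m.group(1);
        if ( !hashmark.isEmpty() ) {
            //decimal numeric entity such as &#169; - left unchanged if it cannot be decoded
            entity = m.group();
            try {
                int code = Integer.parseInt( ent );
                if ( Character.isValidCodePoint( code ) ) {
                    //Character.toChars also handles code points above U+FFFF (surrogate pairs)
                    entity = new String( Character.toChars( code ) );
                }
            } catch ( NumberFormatException e ) {
                //not a decimal number (e.g. a hex entity such as &#x41;) - keep it as-is
            }
        } else {
            //must be a non-numerical entity
            if ( builtinEntities == null ) {
                builtinEntities = buildBuiltinXMLEntityMap();
            }
            entity = builtinEntities.get( ent );
            if ( entity == null ) {
                //not a known entity - ignore it
                entity = "&" + ent + ';';
            }
        }
        //quoteReplacement stops '$' and '\' in the output (e.g. from &#36;) being read as group references
        m.appendReplacement( unescapedOutput, Matcher.quoteReplacement( entity ) );
    }
    m.appendTail( unescapedOutput );

    return unescapedOutput.toString();
}

//The five entities predefined by XML
private static Map<String,String> buildBuiltinXMLEntityMap()
{
    Map<String,String> entities = new HashMap<String,String>(10);
    entities.put( "lt", "<" );
    entities.put( "gt", ">" );
    entities.put( "amp", "&" );
    entities.put( "apos", "'" );
    entities.put( "quot", "\"" );
    return entities;
}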
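
The answer leaves hex entities (&#xhhhh;) unhandled. One way to add them, sketched under the assumption of Java 9 or later (for Map.of and the StringBuilder overload of appendReplacement), is to capture an optional #x prefix and choose the radix from it. The class name HexAwareUnescape is hypothetical, and the regex is a variant of the answer's, not the answer's own.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: regex-based unescaping with decimal and hex numeric entities.
public final class HexAwareUnescape {

    // The five entities predefined by XML
    private static final Map<String, String> BUILTIN = Map.of(
            "lt", "<", "gt", ">", "amp", "&", "apos", "'", "quot", "\"");

    // group(1): "#x"/"#X" for hex, "#" for decimal, null for a named entity; group(2): the body
    private static final Pattern ENTITY = Pattern.compile("&(#[xX]|#)?([^;&\\s]+);");

    public static String unescape(String xml) {
        Matcher m = ENTITY.matcher(xml);
        StringBuilder out = new StringBuilder(xml.length());
        while (m.find()) {
            String prefix = m.group(1);
            String body = m.group(2);
            String replacement = m.group(); // default: copy unrecognized references unchanged
            if (prefix == null) {
                replacement = BUILTIN.getOrDefault(body, replacement);
            } else {
                int radix = prefix.length() == 2 ? 16 : 10; // "#x" means hex, "#" means decimal
                try {
                    int code = Integer.parseInt(body, radix);
                    if (Character.isValidCodePoint(code)) {
                        replacement = new String(Character.toChars(code));
                    }
                } catch (NumberFormatException e) {
                    // malformed number: keep the original text
                }
            }
            // quoteReplacement stops '$' and '\' from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Unrecognized or malformed references are copied through unchanged, and the text is decoded in a single pass, so &amp;lt; becomes &lt; rather than <.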

Answer by msangel

If you are working with JSP, you can use su:unescapeXml from openutils-elfunctions.

Answer by Balazs Zsoldos

Here is one I wrote in ten minutes. It uses no regular expressions, only a simple loop over the characters. I don't think it can be made much faster.

public static String unescape(final String text) {
    StringBuilder result = new StringBuilder(text.length());
    int i = 0;
    int n = text.length();
    while (i < n) {
        char charAt = text.charAt(i);
        if (charAt != '&') {
            result.append(charAt);
            i++;
        } else {
            if (text.startsWith("&amp;", i)) {
                result.append('&');
                i += 5;
            } else if (text.startsWith("&apos;", i)) {
                result.append('\'');
                i += 6;
            } else if (text.startsWith("&quot;", i)) {
                result.append('"');
                i += 6;
            } else if (text.startsWith("&lt;", i)) {
                result.append('<');
                i += 4;
            } else if (text.startsWith("&gt;", i)) {
                result.append('>');
                i += 4;
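            } else {
                //not one of the five predefined entities (e.g. &#65; or a bare '&'): keep the '&' instead of dropping it
                result.append('&');
                i++;
            }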
        }
    }
    return result.toString();
}