Generating XML document in PHP (escape characters)

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/3957360/

Date: 2020-08-25 11:33:47  Source: igfitidea

php, xml

Asked by Tomas Jancik

I'm generating an XML document from a PHP script and I need to escape the XML special characters. I know the list of characters that should be escaped; but what is the correct way to do it?

Should the characters be escaped just with backslash (\') or what is the proper way? Is there any built-in PHP function that can handle this for me?

Accepted answer by Ionuț G. Stan

Use the DOM classes to generate your whole XML document. They handle the encoding and decoding details that we don't even want to care about.

Edit: This was criticized by @Tchalvak:

The DOM object creates a full XML document; it doesn't easily lend itself to just encoding a string on its own.

That is wrong: DOMDocument can properly output just a fragment rather than the whole document:

$doc->saveXML($fragment);

which gives:

Test &amp; <b> and encode </b> :)
Test &amp;amp; &lt;b&gt; and encode &lt;/b&gt; :)

as in:

$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();

// adding XML verbatim:
$xml = "Test &amp; <b> and encode </b> :)\n";
$fragment->appendXML($xml);

// adding text:
$text = $xml;
$fragment->appendChild($doc->createTextNode($text));

// output the result
echo $doc->saveXML($fragment);

See Demo
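
To illustrate the "generate your whole XML document" advice, here is a minimal sketch that builds a complete document with DOMDocument instead of a fragment. The element and attribute names (messages, message, title) are made up for this example and are not part of the original answer.

```php
<?php
// Hypothetical element and attribute names, chosen only for illustration.
$doc = new DOMDocument('1.0', 'UTF-8');

$root = $doc->createElement('messages');
$doc->appendChild($root);

$message = $doc->createElement('message');
// Attribute values are escaped when the document is serialized.
$message->setAttribute('title', 'Tom & Jerry');
// Text nodes are escaped as well, so no manual escaping is needed here.
$message->appendChild($doc->createTextNode('Test & <b> and encode </b> :)'));
$root->appendChild($message);

// Serialize the whole document, including the XML declaration.
$output = $doc->saveXML();
echo $output;
```

The raw strings are passed in unescaped; the escaping happens once, at serialization time.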

Answer by Tomas Jancik

I created a simple function that escapes using the five "predefined entities" that exist in XML:

function xml_entities($string) {
    return strtr(
        $string, 
        array(
            "<" => "&lt;",
            ">" => "&gt;",
            '"' => "&quot;",
            "'" => "&apos;",
            "&" => "&amp;",
        )
    );
}

Usage example:

$text = "Test &amp; <b> and encode </b> :)";
echo xml_entities($text);

Output:

Test &amp;amp; &lt;b&gt; and encode &lt;/b&gt; :)

A similar effect can be achieved with str_replace, but it is fragile because of double replacements: str_replace applies the pairs one after another, so "&" has to be replaced first (untested, not recommended):

function xml_entities($string) {
    return str_replace(
        array("&",     "<",    ">",    '"',      "'"),
        array("&amp;", "&lt;", "&gt;", "&quot;", "&apos;"), 
        $string
    );
}
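
The fragility mentioned above can be seen in a small sketch. It only compares the order of the replacement pairs; the variable names are made up for the example.

```php
<?php
$text = 'a < b & c';

// "&" first: every character is escaped exactly once.
$ampFirst = str_replace(
    array('&', '<', '>'),
    array('&amp;', '&lt;', '&gt;'),
    $text
);

// "&" last: the "&" of the freshly inserted "&lt;" gets escaped again.
$ampLast = str_replace(
    array('<', '>', '&'),
    array('&lt;', '&gt;', '&amp;'),
    $text
);

// strtr() with an array never rescans text it has already replaced,
// so the order of the pairs does not matter.
$viaStrtr = strtr($text, array('<' => '&lt;', '>' => '&gt;', '&' => '&amp;'));

echo $ampFirst, "\n";  // a &lt; b &amp; c
echo $ampLast, "\n";   // a &amp;lt; b &amp; c
echo $viaStrtr, "\n";  // a &lt; b &amp; c
```

This is why the strtr() version is the safer of the two: its result does not depend on how the array happens to be ordered.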

Answer by MarcDefiant

What about the htmlspecialchars() function?

htmlspecialchars($input, ENT_QUOTES | ENT_XML1, $encoding);

Note: the ENT_XML1 flag is only available in PHP 5.4.0 or higher.

htmlspecialchars() with these parameters replaces the following characters:

  • & (ampersand) becomes &amp;
  • " (double quote) becomes &quot;
  • ' (single quote) becomes &apos;
  • < (less than) becomes &lt;
  • > (greater than) becomes &gt;

You can get the translation table by using the get_html_translation_table() function.
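
As a quick check of the list above, the following sketch escapes a sample string and then fetches the matching translation table. The sample input is invented for the example, and the flags are the same ones used in the htmlspecialchars() call of this answer.

```php
<?php
$input = "Tom & Jerry's <b>\"show\"</b>";

// ENT_QUOTES escapes both quote styles; ENT_XML1 selects &apos; for the single quote.
$escaped = htmlspecialchars($input, ENT_QUOTES | ENT_XML1, 'UTF-8');
echo $escaped, "\n";

// The same replacements, returned as a character => entity lookup table.
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_XML1, 'UTF-8');
print_r($table);
```

Without ENT_XML1 the single quote is written as the numeric reference &#039; instead, which XML parsers also accept.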

Answer by Josh Sunderman

After struggling with the XML entity issue, I solved it this way:

htmlspecialchars($value, ENT_QUOTES, 'UTF-8')

Answer by Capilé

In order to produce valid XML text, you need to escape all XML entities and write the text in the same encoding that the XML declaration states (the "encoding" in the <?xml line). Accented characters don't need to be escaped as long as they are encoded in the document's encoding.

However, in many situations simply escaping the input with htmlspecialchars may lead to double-encoded entities (for example, &eacute; would become &amp;eacute;), so I suggest decoding HTML entities first:

function xml_escape($s)
{
    $s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    $s = htmlspecialchars($s, ENT_QUOTES, 'UTF-8', false);
    return $s;
}
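
The double-encoding problem can be reproduced with a short sketch. The sample input is invented for the example; the two calls in the second half are the same ones used inside xml_escape() above.

```php
<?php
// Input that already contains an HTML entity (as it might when copied from an HTML page).
$input = 'caf&eacute; & <b>';

// Escaping directly also escapes the "&" of the existing entity.
$direct = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Decoding first turns &eacute; into the literal character, which needs no escaping.
$decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');
$safe = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8', false);

echo $direct, "\n"; // caf&amp;eacute; &amp; &lt;b&gt;
echo $safe, "\n";   // café &amp; &lt;b&gt;
```

Note that decoding first assumes the input really is HTML-escaped text; for input that is already plain text, the decode step would change its meaning.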

Now you need to make sure all accented characters are valid in the XML document encoding. I strongly encourage always encoding XML output in UTF-8, since not all XML parsers respect the encoding stated in the XML declaration. If your input might come from a different charset, try using utf8_encode().

There's a special case: your input may come from one of these encodings: ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R. PHP treats them all the same, but there are slight differences between them, some of which even iconv() cannot handle. I could only solve this encoding issue by complementing the behavior of utf8_encode():

function encode_utf8($s)
{
    $cp1252_map = array(
    "\xc2\x80" => "\xe2\x82\xac",
    "\xc2\x82" => "\xe2\x80\x9a",
    "\xc2\x83" => "\xc6\x92",
    "\xc2\x84" => "\xe2\x80\x9e",
    "\xc2\x85" => "\xe2\x80\xa6",
    "\xc2\x86" => "\xe2\x80\xa0",
    "\xc2\x87" => "\xe2\x80\xa1",
    "\xc2\x88" => "\xcb\x86",
    "\xc2\x89" => "\xe2\x80\xb0",
    "\xc2\x8a" => "\xc5\xa0",
    "\xc2\x8b" => "\xe2\x80\xb9",
    "\xc2\x8c" => "\xc5\x92",
    "\xc2\x8e" => "\xc5\xbd",
    "\xc2\x91" => "\xe2\x80\x98",
    "\xc2\x92" => "\xe2\x80\x99",
    "\xc2\x93" => "\xe2\x80\x9c",
    "\xc2\x94" => "\xe2\x80\x9d",
    "\xc2\x95" => "\xe2\x80\xa2",
    "\xc2\x96" => "\xe2\x80\x93",
    "\xc2\x97" => "\xe2\x80\x94",
    "\xc2\x98" => "\xcb\x9c",
    "\xc2\x99" => "\xe2\x84\xa2",
    "\xc2\x9a" => "\xc5\xa1",
    "\xc2\x9b" => "\xe2\x80\xba",
    "\xc2\x9c" => "\xc5\x93",
    "\xc2\x9e" => "\xc5\xbe",
    "\xc2\x9f" => "\xc5\xb8"
    );
    $s=strtr(utf8_encode($s), $cp1252_map);
    return $s;
}
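
As a point of comparison, and not the author's method: if the mbstring extension is available and the input is known to be Windows-1252 (cp1252), mb_convert_encoding() can do the same conversion in one call. This may matter on newer PHP versions, since utf8_encode() is deprecated as of PHP 8.2. The sample bytes below are invented for the example.

```php
<?php
// Bytes 0x80, 0x93 and 0x94 are the euro sign and curly quotes in Windows-1252.
$cp1252 = "\x80 100 \x93quoted\x94";

// Assumption: the mbstring extension is loaded and the input really is Windows-1252.
$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');

// Print the bytes in hex so the result does not depend on the terminal encoding.
echo bin2hex($utf8), "\n";
```

This only helps when the source encoding is known; it does not detect the encoding for you.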

Answer by nubeiro

If you need proper XML output, SimpleXML is the way to go:

http://www.php.net/manual/en/simplexmlelement.asxml.php
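
A minimal sketch of what that could look like, assuming the value is set through property assignment (which escapes the text for you). The element names items and item are made up for the example.

```php
<?php
// Hypothetical root and child element names.
$root = new SimpleXMLElement('<items/>');

// Assigning to a property creates the child element and escapes the value.
$root->item = 'Test & <b> and encode </b> :)';

// asXML() without arguments returns the serialized document as a string.
$xml = $root->asXML();
echo $xml;
```

Property assignment is used here on purpose: it is the form I am sure escapes "&" in the value.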

Answer by Adam Gent

Proper escaping is the way to get correct XML output, but you need to handle escaping differently for attributes and elements. (That is, Tomas' answer is incorrect.)

I wrote/stole some Java code a while back that differentiates between attribute and element escaping. The reason is that the XML parser treats all white space as special, particularly in attributes.

It should be trivial to port that over to PHP (you can use Tomas Jancik's approach with the appropriate escaping described above). You don't have to worry about escaping extended entities if you're using UTF-8.
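
The Java code itself is not reproduced here, so the following is only a sketch of what such a port might look like, not a translation of it. The helper names xml_escape_text and xml_escape_attr are hypothetical, and the white space handling is based on XML attribute-value normalization, which turns literal tabs and line breaks in attribute values into spaces.

```php
<?php
// Hypothetical helpers; the names are not part of PHP or of the answer above.
function xml_escape_text($s)
{
    // Element content: escape the markup characters; ENT_NOQUOTES leaves quotes alone.
    return htmlspecialchars($s, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
}

function xml_escape_attr($s)
{
    // Attribute values: escape quotes too, then protect white space that
    // attribute-value normalization would otherwise turn into plain spaces.
    $s = htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
    return strtr($s, array("\t" => '&#9;', "\n" => '&#10;', "\r" => '&#13;'));
}

$value = "line 1\nline 2 & \"more\"";
$xml = '<note text="' . xml_escape_attr($value) . '">' . xml_escape_text($value) . '</note>';
echo $xml, "\n";
```

With the character references in place, a parser should hand back the attribute value with its line break intact.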

If you don't want to port my Java code, you can look at XMLWriter, which is stream-based and uses libxml, so it should be very efficient.
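
A minimal XMLWriter sketch, assuming the xmlwriter extension is enabled. The element and attribute names are made up for the example.

```php
<?php
$writer = new XMLWriter();
$writer->openMemory();                    // build the document in memory
$writer->startDocument('1.0', 'UTF-8');

$writer->startElement('message');         // hypothetical element name
$writer->writeAttribute('note', 'a & b'); // attribute value is escaped by the writer
$writer->text('Test & <b> and encode </b> :)'); // text content is escaped as well
$writer->endElement();

$writer->endDocument();

$xml = $writer->outputMemory();
echo $xml;
```

Because the writer knows whether it is emitting an attribute or element text, it applies the matching escaping rules itself.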

Answer by Alois Cochard

You can use this method: http://php.net/manual/en/function.htmlentities.php

That way all entities (HTML/XML) are escaped and you can put your string inside XML tags.
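
One caveat is worth checking before relying on this. With its default HTML 4.01 table, htmlentities() also emits named HTML entities such as &eacute;, and XML only predefines five entities. The sketch below compares it with htmlspecialchars() by feeding both results to an XML parser; the sample text is invented for the example.

```php
<?php
$input = "caf\xc3\xa9 & cr\xc3\xa8me"; // "café & crème" in UTF-8

// htmlentities() (HTML 4.01 table) emits named HTML entities such as &eacute;.
$viaEntities = htmlentities($input, ENT_QUOTES, 'UTF-8');
// htmlspecialchars() with ENT_XML1 only touches the five XML special characters.
$viaSpecial = htmlspecialchars($input, ENT_QUOTES | ENT_XML1, 'UTF-8');

libxml_use_internal_errors(true); // collect parser errors instead of emitting warnings
$doc = new DOMDocument();
$entitiesParsed = $doc->loadXML('<p>' . $viaEntities . '</p>');
$specialParsed = $doc->loadXML('<p>' . $viaSpecial . '</p>');

var_dump($viaEntities);     // string containing &eacute; and &egrave;
var_dump($entitiesParsed);  // false: those entities are not defined in plain XML
var_dump($specialParsed);   // true
```

So htmlentities() output is only safe inside XML when the document declares those entities (for example through a DTD) or when the text contains nothing beyond the five special characters.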

Answer by paderEpiktet

Based on sadeghj's solution, the following code worked for me:

/**
 * @param $arr1 the single string that shall be masked
 * @return the resulting string with the masked characters
 */
function replace_char($arr1)
{
    if (strpos ($arr1,'&')!== FALSE) { //test if the character appears 
        $arr1=preg_replace('/&/','&amp;', $arr1); // do this first
    }

    // just encode the
    if (strpos ($arr1,'>')!== FALSE) {
        $arr1=preg_replace('/>/','&gt;', $arr1);
    }
    if (strpos ($arr1,'<')!== FALSE) {
        $arr1=preg_replace('/</','&lt;', $arr1);
    }

    if (strpos ($arr1,'"')!== FALSE) {
        $arr1=preg_replace('/"/','&quot;', $arr1);
    }

    if (strpos ($arr1,'\'')!== FALSE) {
        $arr1=preg_replace('/\'/','&apos;', $arr1);
    }

    return $arr1;
}

Answer by sadeghj

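function replace_char($arr1)
{
    // Replace "&" first so the ampersands added by the later
    // replacements are not escaped a second time.
    $arr1 = preg_replace('/&/', '&amp;', $arr1);
    $arr1 = preg_replace('/>/', '&gt;', $arr1);
    $arr1 = preg_replace('/</', '&lt;', $arr1);
    $arr1 = preg_replace('/"/', '&quot;', $arr1);
    $arr1 = preg_replace('/\'/', '&apos;', $arr1);

    // Return the single escaped string (each step builds on the previous one).
    return $arr1;
}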