PHP: simplexml_load_string() fails with a parser error
Source: http://stackoverflow.com/questions/2899274/
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the source address, and attribute the content to the original authors on Stack Overflow.
Asked by John Himmelman
I'm trying to load and parse a Google Weather API response (Chinese response).
Here is the API call.
// This code fails with the following error
$xml = simplexml_load_file('http://www.google.com/ig/api?weather=11791&hl=zh-CN');
( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xB6 0xE0 0xD4 0xC6 in C:\htdocs\weather.php on line 11
Why does loading this response fail?
How do I encode/decode the response so that SimpleXML loads it properly?
Edit: Here is the code and output.
<?php
$googleData = file_get_contents('http://www.google.com/ig/api?weather=11102&hl=zh-CN');
$xml = simplexml_load_string($googleData);
( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xB6 0xE0 0xD4 0xC6 in C:\htdocs\test4.php on line 3 Call Stack Time Memory Function Location 1 0.0020 314264 {main}( ) ..\test4.php:0 2 0.1535 317520 simplexml_load_string ( string(1364) ) ..\test4.php:3
( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: t_system data="SI"/>
( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: ^ in C:\htdocs\test4.php on line 3 Call Stack Time Memory Function Location 1 0.0020 314264 {main}( ) ..\test4.php:0 2 0.1535 317520 simplexml_load_string ( string(1364) ) ..\test4.php:3
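The warnings above can also be collected programmatically instead of being printed. The following is a minimal sketch, assuming the libxml and SimpleXML extensions are enabled. The sample document is made up for illustration and only reuses the four bytes quoted in the warning; it is not the real API response.

```php
<?php
// Hypothetical sample: the four bytes from the warning, placed in an
// attribute of a document that declares no encoding.
$raw = "<?xml version=\"1.0\"?><condition data=\"\xB6\xE0\xD4\xC6\"/>";

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$result = simplexml_load_string($raw);
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the previous setting

if ($result === false) {
    foreach ($errors as $error) {
        // Each entry is a LibXMLError object with line, column and message.
        echo "line {$error->line}: " . trim($error->message) . "\n";
    }
}
```

This does not fix the encoding problem; it only makes the failure easier to detect and log.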
Answer by Josh Davis
The problem here is that SimpleXML doesn't look at the HTTP header to determine the character encoding used in the document and simply assumes it's UTF-8, even though Google's server does advertise it as:
Content-Type: text/xml; charset=GB2312
You can write a function that will take a look at that header using the super-secret magic variable $http_response_header and transform the response accordingly. Something like this:
function sxe($url)
{
    $xml = file_get_contents($url);
    foreach ($http_response_header as $header)
    {
        if (preg_match('#^Content-Type: text/xml; charset=(.*)#i', $header, $m))
        {
            switch (strtolower($m[1]))
            {
                case 'utf-8':
                    // do nothing
                    break;
                case 'iso-8859-1':
                    $xml = utf8_encode($xml);
                    break;
                default:
                    $xml = iconv($m[1], 'utf-8', $xml);
            }
            break;
        }
    }
    return simplexml_load_string($xml);
}
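The pattern in the function above only matches a header of exactly the form `Content-Type: text/xml; charset=...`. As a hedged variation, the charset lookup could be pulled into its own helper that accepts any media type and an optionally quoted value. The name `charsetFromHeaders` is hypothetical, and the header list is a hand-written stand-in for `$http_response_header`:

```php
<?php
/**
 * Return the charset named in the last Content-Type header, or null if none
 * is found. $headers has the same shape as $http_response_header: a list of
 * raw header lines.
 */
function charsetFromHeaders(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $header) {
        // Accept any media type and an optionally quoted charset value.
        if (preg_match('#^Content-Type:.*?;\s*charset="?([^";\s]+)#i', $header, $m)) {
            // Keep the last match in case several responses are listed.
            $charset = $m[1];
        }
    }
    return $charset;
}

// Hand-written sample headers (an assumption, not captured from the server).
$headers = ['HTTP/1.0 200 OK', 'Content-Type: text/xml; charset=GB2312'];
$charset = charsetFromHeaders($headers);
echo $charset, "\n";
```

The returned name can then be passed to iconv() in the same way as `$m[1]` in the function above.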
Answer by Pekka
Update: I can reproduce the problem. Also, Firefox auto-sniffs the character set as "Chinese Simplified" when I output the raw XML feed. Either the Google feed is serving incorrect data (Chinese Simplified characters instead of UTF-8 ones), or it is serving different data when not fetched in a browser: the Content-Type header in Firefox clearly says utf-8.
Converting the incoming feed from Chinese Simplified (GB18030, which is what Firefox gave me) into UTF-8 works:
$incoming = file_get_contents('http://www.google.com/ig/api?weather=11791&hl=zh-CN');
$xml = iconv("GB18030", "utf-8", $incoming);
$xml = simplexml_load_string($xml);
It doesn't explain or fix the underlying problem yet, though. I don't have time to take a deep look into this right now; maybe somebody else does. To me, it looks like Google is in fact serving incorrect data (which would surprise me. I didn't know they made mistakes like us mortals. :P)
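To see why the conversion helps, the four bytes from the warning can be checked on their own. This is a minimal sketch, assuming the iconv extension is available; the one-element XML document is made up for illustration. It relies on preg_match() with the /u modifier returning false when the subject is not valid UTF-8.

```php
<?php
// The four bytes reported in the warning.
$bytes = "\xB6\xE0\xD4\xC6";

// preg_match() with /u fails (returns false) on input that is not valid UTF-8.
$isUtf8Before = preg_match('//u', $bytes) === 1;

// Reinterpret the bytes as GB18030 and convert them to UTF-8.
$converted = iconv('GB18030', 'UTF-8', $bytes);
$isUtf8After = preg_match('//u', $converted) === 1;

// Once converted, the value can be embedded in XML and parsed.
$xml = simplexml_load_string('<condition data="' . $converted . '"/>');

var_dump($isUtf8Before, $isUtf8After);
echo $xml['data'], "\n"; // prints the two characters 多云 ("cloudy")
```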
Answer by AR.
Just came across this. This seems to work (I found the function itself on the web and just updated it a bit):
header('Content-Type: text/html; charset=utf-8');
function getWeather() {
    $requestAddress = "http://www.google.com/ig/api?weather=11791&hl=zh-CN";
    // Downloads weather data based on location.
    $xml_str = file_get_contents($requestAddress);
    // Strips namespace-prefixed tags, then converts the feed to UTF-8.
    $xml_str = preg_replace("/(<\/?)(\w+):([^>]*>)/", "", $xml_str);
    $xml_str = iconv("GB18030", "utf-8", $xml_str);
    // Parses XML
    $xml = new SimpleXMLElement($xml_str);
    // Loops XML
    echo '<div id="weather">';
    foreach ($xml->weather as $item) {
        foreach ($item->forecast_conditions as $new) {
            echo "<div class=\"weatherIcon\">\n";
            echo "<img src='http://www.google.com/" . $new->icon['data'] . "' alt='" . $new->condition['data'] . "'/><br>\n";
            echo "<b>" . $new->day_of_week['data'] . "</b><br>";
            echo "Low: " . $new->low['data'] . " High: " . $new->high['data'] . "<br>";
            echo "\n</div>\n";
        }
    }
    echo '</div>';
}
getWeather();
Answer by cmluscco
This is the script I wrote in PHP to parse the Google Weather API.
<?php
function sxe($url)
{
    $xml = file_get_contents($url);
    foreach ($http_response_header as $header)
    {
        if (preg_match('#^Content-Type: text/xml; charset=(.*)#i', $header, $m))
        {
            switch (strtolower($m[1]))
            {
                case 'utf-8':
                    // do nothing
                    break;
                case 'iso-8859-1':
                    $xml = utf8_encode($xml);
                    break;
                default:
                    $xml = iconv($m[1], 'utf-8', $xml);
            }
            break;
        }
    }
    return simplexml_load_string($xml);
}
// Load the feed through sxe() so the declared charset is honored (hl selects the language).
$xml = sxe('http://www.google.com/ig/api?weather=46360&hl=en-us');
$information = $xml->xpath("/xml_api_reply/weather/forecast_information");
$current = $xml->xpath("/xml_api_reply/weather/current_conditions");
$forecast = $xml->xpath("/xml_api_reply/weather/forecast_conditions");
print "<br><br><center><div style=\"border: 1px solid; background-color: #dddddd; background-image: url('http://mc-pdfd-live.dyndns.org/images/clouds.bmp'); width: 450\">";
print "<br><h3>";
print $information[0]->city['data'] . " " . $information[0]->unit_system['data'] . " " . $information[0]->postal_code['data'];
print "</h3>";
print "<div style=\"border: 1px solid; width: 320px\">";
print "<table cellpadding=\"5px\"><tr><td><h4>";
print "Now";
print "<br><br>";
print "<img src=http://www.google.com" . $current[0]->icon['data'] . "> ";
print "</h4></td><td><h4>";
print "<br><br>";
print " " . $current[0]->condition['data'] . " ";
print " " . $current[0]->temp_f['data'] . " °F";
print "<br>";
print " " . $current[0]->wind_condition['data'];
print "<br>";
print " " . $current[0]->humidity['data'];
print "</h4></td></tr></table></div>";
print "<table cellpadding=\"5px\"><tr><td>";
print "<table cellpadding=\"5px\"><tr><td><h4>";
print "Today";
print "<br><br>";
print "<img src=http://www.google.com" . $forecast[0]->icon['data'] . "> ";
print "</h4></td><td><h4>";
print "<br><br>";
print $forecast[0]->condition['data'];
print "<br>";
print "High " . $forecast[0]->high['data'] . " °F";
print "<br>";
print "Low " . $forecast[0]->low['data'] . " °F";
print "</h4></td></tr></table>";
print "<table cellpadding=\"5px\"><tr><td><h4>";
print $forecast[2]->day_of_week['data'];
print "<br><br>";
print "<img src=http://www.google.com" . $forecast[2]->icon['data'] . "> ";
print "</h4></td><td><h4>";
print "<br><br>";
print " " . $forecast[2]->condition['data'];
print "<br>";
print " High " . $forecast[2]->high['data'] . " °F";
print "<br>";
print " Low " . $forecast[2]->low['data'] . " °F";
print "</h4></td></tr></table>";
print "</td><td>";
print "<table cellpadding=\"5px\"><tr><td><h4>";
print $forecast[1]->day_of_week['data'];
print "<br><br>";
print "<img src=http://www.google.com" . $forecast[1]->icon['data'] . "> ";
print "</h4></td><td><h4>";
print "<br><br>";
print " " . $forecast[1]->condition['data'];
print "<br>";
print " High " . $forecast[1]->high['data'] . " °F";
print "<br>";
print " Low " . $forecast[1]->low['data'] . " °F";
print "</h4></td></tr></table>";
print "<table cellpadding=\"5px\"><tr><td><h4>";
print $forecast[3]->day_of_week['data'];
print "<br><br>";
print "<img src=http://www.google.com" . $forecast[3]->icon['data'] . "> ";
print "</h4></td><td><h4>";
print "<br><br>";
print " " . $forecast[3]->condition['data'];
print "<br>";
print " High " . $forecast[3]->high['data'] . " °F";
print "<br>";
print " Low " . $forecast[3]->low['data'] . " °F";
print "</h4></td></tr></table>";
print "</td></tr></table>";
print "</div></center>";
?>
Answer by Igor Vakulenko
Try adding the query parameter oe=utf-8 to the URL. The response will then be encoded in UTF-8 only. It helped me.
http://www.google.com/ig/api?weather=?????&degree=??????&oe=utf-8&hl=es
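A request URL with that parameter could be assembled as follows. This is only a sketch: the helper name `weatherUrl` is hypothetical, and the location and language values are the ones used in the question, not required values.

```php
<?php
// Hypothetical helper that builds the request URL with oe=utf-8 appended.
function weatherUrl(string $location, string $language): string
{
    // http_build_query() URL-encodes each value; '&' is passed explicitly
    // so the result does not depend on the arg_separator.output setting.
    $query = http_build_query([
        'weather' => $location,
        'hl'      => $language,
        'oe'      => 'utf-8', // ask for a UTF-8 encoded response
    ], '', '&');

    return 'http://www.google.com/ig/api?' . $query;
}

$url = weatherUrl('11791', 'zh-CN');
echo $url, "\n"; // http://www.google.com/ig/api?weather=11791&hl=zh-CN&oe=utf-8
```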

