PHP question - XML declaration allowed only at the start of the document

Question

Asked by Aamir

xml:19558: parser error : XML declaration allowed only at the start of the document

Any solutions? I am using PHP's XMLReader to parse a large XML file, but I get this error. I know the file is not well-formed, but I don't think it is feasible to go through the file and remove the extra declarations by hand. Any ideas would be appreciated.

Answer 1

Answered by Ben

Make sure there isn't any white space before the first tag. Try this:

<?php
//Declarations
$file = "data.txt"; //The file to read from.

#Read the file
$fp = fopen($file, "r") or die("Couldn't open $file"); //Open the file, stop if that fails
$data = ""; //Initialize variable to contain the file's content
while(!feof($fp)) //Loop through the file, read it till the end.
{
    $data .= fgets($fp, 1024); //Append the next line (at most 1023 bytes) to $data
} 
fclose($fp); //Close file
#End read file
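// Split after every closing </xml> tag so each document ends up in its own string.
// This assumes the root element of every concatenated document is <xml>.
$split = preg_split('/(?<=<\/xml>)(?!$)/', $data);

foreach ($split as $sxml) //Loop through each xml string
{
    $sxml = trim($sxml); //Whitespace before the declaration causes the same parser error
    if ($sxml === '') continue; //Skip empty chunks
    $reader = new XMLReader(); //Initialize the reader
    $reader->xml($sxml) or die("Could not load XML"); //Load the current xml string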
    while($reader->read()) //Read it
    {
        switch($reader->nodeType)
        {
            case XMLReader::ELEMENT: //Read element
                if ($reader->name == 'record')
                {
                    $dataa = $reader->readInnerXml(); //get contents for <record> tag.
                    echo $dataa; //Print it to screen.
                }
            break;
        }
    }
    $reader->close(); //close reader
}
?>

Set the $file variable to the file you want to read. Note that this script loads the whole file into memory, so I don't know how well it will work for a 4 GB file. Tell me if it doesn't.

EDIT: Here is another solution. It should work better with the larger file because it parses while it is reading the file.

<?php
set_time_limit(0);
//Declarations
$file = "data.txt"; //The file to read from.

#Read the file
$fp = fopen($file, "r") or die("Couldn't Open"); //Open the file

$FoundXmlTagStep = 0;
$FoundEndXMLTagStep = 0;
$curXML = "";
$firstXMLTagRead = false;
while(!feof($fp)) //Loop through the file, read it till the end.
{
    $data = fgetc($fp); //Read one character at a time
    if ($FoundXmlTagStep==0 && $data == "<")
        $FoundXmlTagStep=1;
    else if ($FoundXmlTagStep==1 && $data == "x")
        $FoundXmlTagStep=2;
    else if ($FoundXmlTagStep==2 && $data == "m")
        $FoundXmlTagStep=3;
    else if ($FoundXmlTagStep==3 && $data == "l")
    {
        $FoundXmlTagStep=4;
        $firstXMLTagRead = true;
    }
    else if ($FoundXmlTagStep!=4) //Mismatch: restart, keeping a "<" as a possible new start
        $FoundXmlTagStep = ($data == "<") ? 1 : 0;

    if ($FoundXmlTagStep==4)
    {
        if ($firstXMLTagRead)
        {
            $firstXMLTagRead = false;
            $curXML = "<xm";
        }
        $curXML .= $data;

        //Start trying to match end of xml
        if ($FoundEndXMLTagStep==0 && $data == "<")
            $FoundEndXMLTagStep=1;
        elseif ($FoundEndXMLTagStep==1 && $data == "/")
            $FoundEndXMLTagStep=2;
        elseif ($FoundEndXMLTagStep==2 && $data == "x")
            $FoundEndXMLTagStep=3;
        elseif ($FoundEndXMLTagStep==3 && $data == "m")
            $FoundEndXMLTagStep=4;
        elseif ($FoundEndXMLTagStep==4 && $data == "l")
            $FoundEndXMLTagStep=5;
        elseif ($FoundEndXMLTagStep==5 && $data == ">")
        {
            $FoundEndXMLTagStep=0;
            $FoundXmlTagStep=0;
            #finished Reading XML
            ParseXML ($curXML);
        }
        else //Mismatch: restart, keeping a "<" as a possible new start
            $FoundEndXMLTagStep = ($data == "<") ? 1 : 0;
    }
} 
fclose($fp); //Close file
function ParseXML ($xml)
{
    //echo $sxml;
    $reader = new XMLReader(); //Initialize the reader
    $reader->xml($xml) or die("Could not load XML"); //Load the current xml string
    while($reader->read()) //Read it
    {
        switch($reader->nodeType)
        {
            case XMLReader::ELEMENT: //Read element
                if ($reader->name == 'record')
                {
                    $dataa = $reader->readInnerXml(); //get contents for <record> tag.
                    echo $dataa; //Print it to screen.
                }
            break;
        }
    }
    $reader->close(); //close reader
}
?>
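
Both scripts above assume that every concatenated document has <xml> as its root element. If that does not hold for your file, a variation is to split on the XML declaration itself. The following is only a sketch: `splitXmlDocuments` is a hypothetical helper name, and it assumes every document begins with a declaration at the start of a line.

```php
<?php
// Hypothetical generator: yields each concatenated XML document as its own string.
// Assumption: every document starts with an XML declaration at the start of a line.
function splitXmlDocuments($fp): Generator
{
    $current = '';
    while (($line = fgets($fp)) !== false) {
        // A new declaration marks the beginning of the next document.
        if (strncmp(ltrim($line), '<?xml ', 6) === 0 && trim($current) !== '') {
            yield trim($current);
            $current = '';
        }
        $current .= $line;
    }
    if (trim($current) !== '') {
        yield trim($current); // Last document in the file.
    }
}
```

Each yielded string can be passed to the ParseXML function shown above. Only one document is held in memory at a time, so this helps only if the individual documents are reasonably small.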

Answer 2

Answered by Ned Batchelder

If you have multiple XML declarations, you likely have a concatenation of many XML files, and also more than one root element. It's not clear how you would meaningfully parse them.

Try really hard to get the source of the XML to give you real XML first. If that doesn't work, see if you can do some preprocessing to fix the XML before you parse it.
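
One way such preprocessing might look is sketched below: copy the file once, drop every embedded declaration, and wrap the result in a single root element. The helper name `mergeConcatenatedXml` and the root name `merged` are made up for illustration. The sketch assumes each declaration fits on one line, that all parts use the same encoding (UTF-8 here), and that the file contains no DOCTYPE.

```php
<?php
// Hypothetical helper: copies $in to $out, removing every XML declaration
// and wrapping the remaining content in one synthetic root element.
// Assumptions: declarations do not span lines, all parts are UTF-8, no DOCTYPE.
function mergeConcatenatedXml($in, $out, string $rootName = 'merged'): void
{
    fwrite($out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<$rootName>\n");
    while (($line = fgets($in)) !== false) {
        // Strip declarations such as <?xml version="1.0" ... from this line.
        fwrite($out, preg_replace('/<\?xml\s[^?]*\?>/', '', $line));
    }
    fwrite($out, "\n</$rootName>\n");
}
```

The file is processed line by line, so memory use stays small even for a very large input. The output file can then be read with `XMLReader::open()` as one well-formed document.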

Answer 3

Answered by kaven

Another possible cause of this problem is a Unicode byte order mark (BOM) at the head of the file. If your XML is encoded as UTF-8, the file content may start with the 3 bytes EF BB BF. These bytes can be misinterpreted when a byte array is converted to a string. The solution is to write the byte array to the file directly, without first converting it to a string (for example via getString).

ASCII has no BOM. UTF-16 little-endian (often labeled "Unicode") starts with FF FE, UTF-8 with EF BB BF, and UTF-32 little-endian with FF FE 00 00.

Open the file in a hex editor such as UltraEdit and you can see these bytes.
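
If a UTF-8 BOM does turn out to be the culprit and you already have the XML as a PHP string, it can be removed before parsing. This is a minimal sketch; `stripUtf8Bom` is a hypothetical helper name and it handles only the UTF-8 mark.

```php
<?php
// Hypothetical helper: returns $xml without a leading UTF-8 byte order mark.
function stripUtf8Bom(string $xml): string
{
    $bom = "\xEF\xBB\xBF";
    // Compare only the first three bytes; leave the string untouched otherwise.
    return strncmp($xml, $bom, 3) === 0 ? substr($xml, 3) : $xml;
}
```

For a very large file the same check can be done on the first three bytes returned by fread() instead of loading the whole file.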
