How do I use libxml2 to parse data from XML in C?

Question

Asked by system

I have looked through the libxml2 code samples, and I am confused about how to piece them all together.

What are the steps needed when using libxml2 to just parse or extract data from an XML file?

I would like to get hold of, and possibly store information for, certain attributes. How is this done?

Answer 1

Accepted answer by Sadique

I believe you first need to build a parse tree. This article may help; look through the section titled "How to Parse a Tree with Libxml2".

Answer 2

Answered by Jason Viers

libxml2 provides various examples showing basic usage.

http://xmlsoft.org/examples/index.html

For your stated goals, tree1.c would probably be most relevant.

tree1.c: Navigates a tree to print element names
Parse a file to a tree, use xmlDocGetRootElement() to get the root element, then walk the document and print all the element names in document order.

http://xmlsoft.org/examples/tree1.c

Once you have an xmlNode struct for an element, the "properties" member is a linked list of attributes. Each xmlAttr object has a "name" and "children" object (which are the name/value for that attribute, respectively), and a "next" member which points to the next attribute (or null for the last one).

http://xmlsoft.org/html/libxml-tree.html#xmlNode

http://xmlsoft.org/html/libxml-tree.html#xmlAttr

Answer 3

Answered by Cooper6581

I found these two resources helpful when I was learning to use libxml2 to build an RSS feed parser.

Tutorial with SAX interface

Tutorial using the DOM tree (includes a code example for getting an attribute value)

Answer 4

Answered by Pankaj Vavadiya

Here is the complete process for extracting XML/HTML data from a file on the Windows platform.

First, download the pre-compiled libxml2 .dll from http://xmlsoft.org/sources/win32/
Also download its dependencies, iconv.dll and zlib1.dll, from the same page
Extract all .zip files into the same directory, for example D:\demo\
Copy iconv.dll, zlib1.dll and libxml2.dll into the c:\windows\system32 directory

Create a file named libxml_test.cpp and copy the following code into it.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/HTMLparser.h>

/* Recursively visit a node, its following siblings, and all their descendants. */
void traverse_dom_trees(xmlNode *a_node)
{
    xmlNode *cur_node = NULL;

    if (NULL == a_node)
    {
        /* Nothing to visit: end of a child list */
        return;
    }

    for (cur_node = a_node; cur_node; cur_node = cur_node->next)
    {
        if (cur_node->type == XML_ELEMENT_NODE)
        {
            /* Decide here whether the current element should be excluded */
            printf("Node type: Element, name: %s\n", (const char *)cur_node->name);
        }
        else if (cur_node->type == XML_TEXT_NODE && cur_node->content != NULL)
        {
            /* The text is in cur_node->content; xmlStrlen() returns an int, matching %d */
            printf("Node type: Text, node content: %s, content length %d\n",
                   (const char *)cur_node->content, xmlStrlen(cur_node->content));
        }
        traverse_dom_trees(cur_node->children);
    }
}

int main(int argc, char **argv)
{
    htmlDocPtr doc;
    xmlNode *root_element = NULL;

    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s <file>\n", argv[0]);
        return 1;
    }

    /* Macro that checks the headers against the DLL we are actually using */
    LIBXML_TEST_VERSION

    doc = htmlReadFile(argv[1], NULL, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
    if (doc == NULL)
    {
        fprintf(stderr, "Document not parsed successfully.\n");
        return 1;
    }

    root_element = xmlDocGetRootElement(doc);

    if (root_element == NULL)
    {
        fprintf(stderr, "empty document\n");
        xmlFreeDoc(doc);
        return 1;
    }

    printf("Root Node is %s\n", (const char *)root_element->name);
    traverse_dom_trees(root_element);

    xmlFreeDoc(doc);       /* free the document tree */
    xmlCleanupParser();    /* free the library's global state */
    return 0;
}

Open the Visual Studio Command Prompt
Go to the D:\demo directory
Compile with the command: cl libxml_test.cpp /I".\libxml2-2.7.8.win32\include" /I".\iconv-1.9.2.win32\include" /link libxml2-2.7.8.win32\lib\libxml2.lib
Run the binary with the command: libxml_test.exe test.html (here test.html may be any valid HTML file)
