C: How can libxml2 be used to parse data from XML?

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5465965/

Date: 2020-09-02 08:12:04  Source: igfitidea

How can libxml2 be used to parse data from XML?

Tags: c, xml, parsing, libxml2

Asked by system

I have looked around at the libxml2 code samples and I am confused on how to piece them all together.

What are the steps needed when using libxml2 to just parse or extract data from an XML file?

I would like to get hold of, and possibly store information for, certain attributes. How is this done?

Accepted answer by Sadique

I believe you first need to create a parse tree. This article may help; look through the section titled "How to Parse a Tree with Libxml2".

Answer by Jason Viers

libxml2 provides various examples showing basic usage.

http://xmlsoft.org/examples/index.html

For your stated goals, tree1.c would probably be most relevant.

tree1.c: Navigates a tree to print element names

Parse a file to a tree, use xmlDocGetRootElement() to get the root element, then walk the document and print all the element names in document order.

http://xmlsoft.org/examples/tree1.c

Once you have an xmlNode struct for an element, its "properties" member is a linked list of attributes. Each xmlAttr object has "name" and "children" members (the attribute's name and value, respectively) and a "next" member that points to the next attribute (or NULL for the last one).

http://xmlsoft.org/html/libxml-tree.html#xmlNode

http://xmlsoft.org/html/libxml-tree.html#xmlAttr

Answer by Cooper6581

I found these two resources helpful when I was learning to use libxml2 to build an RSS feed parser.

Tutorial with SAX interface

Tutorial using the DOM tree (includes a code example for getting an attribute value)

Answer by Pankaj Vavadiya

Here is the complete process for extracting XML/HTML data from a file on the Windows platform.

  1. First download the pre-compiled libxml2 .dll from http://xmlsoft.org/sources/win32/
  2. Also download its dependencies, iconv.dll and zlib1.dll, from the same page

  3. Extract all .zip files into the same directory, for example D:\demo\

  4. Copy iconv.dll, zlib1.dll and libxml2.dll into the c:\windows\system32 directory

  5. Create a libxml_test.cpp file and copy the following code into it.

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <libxml/HTMLparser.h>
    
    void traverse_dom_trees(xmlNode * a_node)
    {
        xmlNode *cur_node = NULL;
    
        if(NULL == a_node)
        {
            //printf("Invalid argument a_node %p\n", a_node);
            return;
        }
    
        for (cur_node = a_node; cur_node; cur_node = cur_node->next) 
        {
            if (cur_node->type == XML_ELEMENT_NODE) 
            {
                /* Element node: print its tag name (filter out unwanted elements here if needed) */
                printf("Node type: Element, name: %s\n", cur_node->name);
            }
            else if(cur_node->type == XML_TEXT_NODE)
            {
                /* Text node: print its content; xmlStrlen() returns an int, which matches %d */
                printf("Node type: Text, node content: %s,  content length %d\n", (char *)cur_node->content, xmlStrlen(cur_node->content));
            }
            traverse_dom_trees(cur_node->children);
        }
    }
    
    int main(int argc, char **argv) 
    {
        htmlDocPtr doc;
        xmlNode *root_element = NULL;
    
        if (argc != 2)  
        {
            printf("\nInvalid argument\n");
            return(1);
        }
    
        /* Macro to check API for match with the DLL we are using */
        LIBXML_TEST_VERSION    
    
        doc = htmlReadFile(argv[1], NULL, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
        if (doc == NULL) 
        {
            fprintf(stderr, "Document not parsed successfully.\n");
            return 1;   /* report failure to the caller */
        }
    
        root_element = xmlDocGetRootElement(doc);
    
        if (root_element == NULL) 
        {
            fprintf(stderr, "empty document\n");
            xmlFreeDoc(doc);
            return 1;   /* report failure to the caller */
        }
    
        printf("Root Node is %s\n", root_element->name);
        traverse_dom_trees(root_element);
    
        xmlFreeDoc(doc);       // free document
        xmlCleanupParser();    // Free globals
        return 0;
    }
    
  6. Open the Visual Studio Command Prompt

  7. Go to the D:\demo directory

  8. Execute the command cl libxml_test.cpp /I".\libxml2-2.7.8.win32\include" /I".\iconv-1.9.2.win32\include" /link libxml2-2.7.8.win32\lib\libxml2.lib

  9. Run the binary with the command libxml_test.exe test.html (here test.html may be any valid HTML file)
