Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/17600037/

Time: 2020-09-06 06:04:24  Source: igfitidea

How do I use Nokogiri to parse an XML file?

Tags: ruby, xml, parsing, nokogiri

Asked by camdixon

I'm having some issues with Nokogiri.

I am trying to parse this XML file:

<Collection version="2.0" id="74j5hc4je3b9">
  <Name>A Funfair in Bangkok</Name>
  <PermaLink>Funfair in Bangkok</PermaLink>
  <PermaLinkIsName>True</PermaLinkIsName>
  <Description>A small funfair near On Nut in Bangkok.</Description>
  <Date>2009-08-03T00:00:00</Date>
  <IsHidden>False</IsHidden>
  <Items>
    <Item filename="AGC_1998.jpg">
      <Title>Funfair in Bangkok</Title>
      <Caption>A small funfair near On Nut in Bangkok.</Caption>
      <Authors>Anthony Bouch</Authors>
      <Copyright>Copyright © Anthony Bouch</Copyright>
      <CreatedDate>2009-08-07T19:22:08</CreatedDate>
      <Keywords>
        <Keyword>Funfair</Keyword>
        <Keyword>Bangkok</Keyword>
        <Keyword>Thailand</Keyword>
      </Keywords>
      <ThumbnailSize width="133" height="200" />
      <PreviewSize width="532" height="800" />
      <OriginalSize width="2279" height="3425" />
    </Item>
    <Item filename="AGC_1164.jpg" iscover="True">
      <Title>Bumper Cars at a Funfair in Bangkok</Title>
      <Caption>Bumper cars at a small funfair near On Nut in Bangkok.</Caption>
      <Authors>Anthony Bouch</Authors>
      <Copyright>Copyright © Anthony Bouch</Copyright>
      <CreatedDate>2009-08-03T22:08:24</CreatedDate>
      <Keywords>
        <Keyword>Bumper Cars</Keyword>
        <Keyword>Funfair</Keyword>
        <Keyword>Bangkok</Keyword>
        <Keyword>Thailand</Keyword>
      </Keywords>
      <ThumbnailSize width="200" height="133" />
      <PreviewSize width="800" height="532" />
      <OriginalSize width="3725" height="2479" />
    </Item>
  </Items>
</Collection>

I just want to display all of that information on the screen. Should be simple, right? This is what I am doing:

require 'nokogiri'

doc = Nokogiri::XML(File.open("sample.xml"))
@block = doc.css("items item").map {|node| node.children.text}
puts @block

Each Items is a node, and under it there are Item child nodes?

I create a map of this, which returns a hash, and the code in {} goes through each node and places the children's text into @block. Then I can display all of the child nodes' text on the screen.

I have no idea how close I am. I've read many articles and am still a little confused about the basics, especially since with a new language I usually start with a basic program that reads from a file and prints to the screen.

Answered by Arup Rakshit

Here I will try to address each of the questions and points of confusion you have:

require 'nokogiri'

doc = Nokogiri::XML.parse <<-XML
<Collection version="2.0" id="74j5hc4je3b9">
  <Name>A Funfair in Bangkok</Name>
  <PermaLink>Funfair in Bangkok</PermaLink>
  <PermaLinkIsName>True</PermaLinkIsName>
  <Description>A small funfair near On Nut in Bangkok.</Description>
  <Date>2009-08-03T00:00:00</Date>
  <IsHidden>False</IsHidden>
  <Items>
    <Item filename="AGC_1998.jpg">
      <Title>Funfair in Bangkok</Title>
      <Caption>A small funfair near On Nut in Bangkok.</Caption>
      <Authors>Anthony Bouch</Authors>
      <Copyright>Copyright © Anthony Bouch</Copyright>
      <CreatedDate>2009-08-07T19:22:08</CreatedDate>
      <Keywords>
        <Keyword>Funfair</Keyword>
        <Keyword>Bangkok</Keyword>
        <Keyword>Thailand</Keyword>
      </Keywords>
      <ThumbnailSize width="133" height="200" />
      <PreviewSize width="532" height="800" />
      <OriginalSize width="2279" height="3425" />
    </Item>
    <Item filename="AGC_1164.jpg" iscover="True">
      <Title>Bumper Cars at a Funfair in Bangkok</Title>
      <Caption>Bumper cars at a small funfair near On Nut in Bangkok.</Caption>
      <Authors>Anthony Bouch</Authors>
      <Copyright>Copyright © Anthony Bouch</Copyright>
      <CreatedDate>2009-08-03T22:08:24</CreatedDate>
      <Keywords>
        <Keyword>Bumper Cars</Keyword>
        <Keyword>Funfair</Keyword>
        <Keyword>Bangkok</Keyword>
        <Keyword>Thailand</Keyword>
      </Keywords>
      <ThumbnailSize width="200" height="133" />
      <PreviewSize width="800" height="532" />
      <OriginalSize width="3725" height="2479" />
    </Item>
  </Items>
</Collection>
XML


You asked: "So from my understanding of Nokogiri, each 'Items' is a node, and under that there are children nodes of 'Item'?"

Not quite. The result of doc.xpath("//Items/Item") is a Nokogiri::XML::NodeSet. It holds the 2 Item child nodes of Items, each of which is a Nokogiri::XML::Element object. Because Element is a subclass of Nokogiri::XML::Node, you can also call them nodes.

doc.class # => Nokogiri::XML::Document
@block = doc.xpath("//Items/Item")
@block.class # => Nokogiri::XML::NodeSet
@block.count # => 2
@block.map { |node| node.name }
# => ["Item", "Item"]
@block.map { |node| node.class }
# => [Nokogiri::XML::Element, Nokogiri::XML::Element]
@block.map { |node| node.children.count }
# => [19, 19]
@block.map { |node| node.class.superclass }
# => [Nokogiri::XML::Node, Nokogiri::XML::Node]


You also wrote: "We create a map of this, which returns a hash I believe, and the code in {} goes through each node and places the children text into @block. Then I can display all of this child node's text to the screen."

I don't quite follow this, so below I try to show what a Node is and what a NodeSet is in Nokogiri. Note that map returns an Array, not a Hash, and remember that a NodeSet is a collection of Nodes.

@chld_class = @block.map do |node|
  node.children.class
end
@chld_class
# => [Nokogiri::XML::NodeSet, Nokogiri::XML::NodeSet]
@chld_name = @block.map do |node|
  node.children.map { |n| [n.name,n.class] }
end
@chld_name
# => [[["text", Nokogiri::XML::Text],
#      ["Title", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["Caption", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["Authors", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["Copyright", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["CreatedDate", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["Keywords", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["ThumbnailSize", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["PreviewSize", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["OriginalSize", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text]],
#     [["text", Nokogiri::XML::Text],
#      ["Title", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["Caption", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["Authors", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["Copyright", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["CreatedDate", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["Keywords", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["ThumbnailSize", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["PreviewSize", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text],
#      ["OriginalSize", Nokogiri::XML::Element],
#      ["text", Nokogiri::XML::Text]]]


@chld_name = @block.map do |node|
  node.children.map{|n| [n.name,n.text.strip] if n.elem? }.compact
end.compact
@chld_name
# => [[["Title", "Funfair in Bangkok"],
#      ["Caption", "A small funfair near On Nut in Bangkok."],
#      ["Authors", "Anthony Bouch"],
#      ["Copyright", "Copyright © Anthony Bouch"],
#      ["CreatedDate", "2009-08-07T19:22:08"],
#      ["Keywords", "Funfair\n        Bangkok\n        Thailand"],
#      ["ThumbnailSize", ""],
#      ["PreviewSize", ""],
#      ["OriginalSize", ""]],
#     [["Title", "Bumper Cars at a Funfair in Bangkok"],
#      ["Caption", "Bumper cars at a small funfair near On Nut in Bangkok."],
#      ["Authors", "Anthony Bouch"],
#      ["Copyright", "Copyright © Anthony Bouch"],
#      ["CreatedDate", "2009-08-03T22:08:24"],
#      ["Keywords",
#       "Bumper Cars\n        Funfair\n        Bangkok\n        Thailand"],
#      ["ThumbnailSize", ""],
#      ["PreviewSize", ""],
#      ["OriginalSize", ""]]]

Answered by orde

The element names in the sample XML are capitalized, and XML names are case-sensitive, so the selectors in your code must use the same case. For example:

require 'nokogiri'

doc = Nokogiri::XML(File.open("sample.xml"))
@block = doc.css("Items Item").map { |node| node.children.text }
puts @block