Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/11198239/

Time: 2020-09-06 05:14:10  Source: igfitidea

Parsing XML with Ruby

ruby xml-parsing

Asked by n8gard

I'm way new to working with XML but just had a need dropped in my lap. I have been given an unusual (to me) XML format. There are colons within the tags.

<THING1:things type="Container">
  <PART1:Id type="Property">1234</PART1:Id>
  <PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>

It is a large file and there is much more to it than this but I hope this format will be familiar to someone. Does anyone know a way to approach an XML document of this sort?

I'd rather not just write a brute-force way of parsing the text but I can't seem to make any headway with REXML or Hpricot and I suspect it is due to these unusual tags.

My Ruby code:

    require 'hpricot'
    xml = File.open( "myfile.xml" )

    doc = Hpricot::XML( xml )

    (doc/:things).each do |thg|
      [ 'Id', 'Name' ].each do |el|
        puts "#{el}: #{thg.at(el).innerHTML}"
      end
    end

...which is just lifted from: http://railstips.org/blog/archives/2006/12/09/parsing-xml-with-hpricot/

And I figured I would be able to work some stuff out from here, but this code returns nothing. It doesn't error. It just returns.

Answered by jmdeldin

As @pguardiario mentioned, Nokogiri is the de facto XML and HTML parsing library for Ruby. If you wanted to print out the Id and Name values in your example, here is how you would do it:

require 'nokogiri'

xml_str = <<EOF
<THING1:things type="Container">
  <PART1:Id type="Property">1234</PART1:Id>
  <PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
EOF

doc = Nokogiri::XML(xml_str)

thing = doc.at_xpath('//things')
puts "ID   = " + thing.at_xpath('//Id').content
puts "Name = " + thing.at_xpath('//Name').content

A few notes:

  • at_xpath is for matching one thing. If you know you have multiple items, you want to use xpath instead.
  • Depending on your document, namespaces can be problematic, so calling doc.remove_namespaces! can help (see this answer for a brief discussion).
  • You can use the css methods instead of xpath if you're more comfortable with those.
  • Definitely play around with this in irb or pry to investigate methods.

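To make the namespace note concrete: the colons in tags such as THING1:things are XML namespace prefixes, which a complete document would normally declare with xmlns:THING1="..." attributes, typically on the root element. The following is a minimal sketch under that assumption. The example.com URIs, the root element, the query prefixes t and p, and the helper extract_things are all made up for illustration and do not come from the original file. It uses REXML (bundled with Ruby, and mentioned in the question) rather than Nokogiri so that it runs without installing a gem; the XPath expressions themselves are ordinary namespace-qualified XPath.

```ruby
require 'rexml/document'

# Hypothetical namespace URIs: the real file should declare its own
# xmlns:THING1 and xmlns:PART1 values, usually on the root element.
SAMPLE_XML = <<~XML
  <root xmlns:THING1="http://example.com/thing1"
        xmlns:PART1="http://example.com/part1">
    <THING1:things type="Container">
      <PART1:Id type="Property">1234</PART1:Id>
      <PART1:Name type="Property">The Name</PART1:Name>
    </THING1:things>
  </root>
XML

# Prefixes used in the XPath queries below. They are matched by URI,
# so they do not have to equal the prefixes used in the document.
NAMESPACES = {
  't' => 'http://example.com/thing1',
  'p' => 'http://example.com/part1'
}

# Returns one hash per things element found in the given XML string.
def extract_things(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//t:things', NAMESPACES).map do |thing|
    {
      id:   REXML::XPath.first(thing, 'p:Id', NAMESPACES).text,
      name: REXML::XPath.first(thing, 'p:Name', NAMESPACES).text
    }
  end
end

extract_things(SAMPLE_XML).each do |thing|
  puts "ID   = #{thing[:id]}"
  puts "Name = #{thing[:name]}"
end
```

Matching by URI like this is the alternative to stripping namespaces with remove_namespaces!: it keeps elements from different namespaces distinct, at the cost of more verbose queries.
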
Update

To handle multiple items, you need a root element, and you need to remove the leading // from the at_xpath queries.

require 'nokogiri'

xml_str = <<EOF
<root>
  <THING1:things type="Container">
    <PART1:Id type="Property">1234</PART1:Id>
    <PART1:Name type="Property">The Name1</PART1:Name>
  </THING1:things>
  <THING2:things type="Container">
    <PART2:Id type="Property">2234</PART2:Id>
    <PART2:Name type="Property">The Name2</PART2:Name>
  </THING2:things>
</root>
EOF

doc = Nokogiri::XML(xml_str)
doc.xpath('//things').each do |thing|
  puts "ID   = " + thing.at_xpath('Id').content
  puts "Name = " + thing.at_xpath('Name').content
end

This will give you:

ID   = 1234
Name = The Name1

ID   = 2234
Name = The Name2
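
Why dropping the leading // matters: in XPath, a path starting with // is absolute, so it searches the whole document again even when it is evaluated from inside a things element, while a bare Id only looks at that element's children. The sketch below illustrates this with a simplified, namespace-free document. TWO_THINGS_XML and ids_via are hypothetical names, and REXML is used instead of Nokogiri only so the snippet runs with a stock Ruby install.

```ruby
require 'rexml/document'

# Simplified stand-in for the document above, without namespace prefixes.
TWO_THINGS_XML = <<~XML
  <root>
    <things><Id>1234</Id><Name>The Name1</Name></things>
    <things><Id>2234</Id><Name>The Name2</Name></things>
  </root>
XML

# Looks up an Id for each things element using the given XPath,
# evaluated with that element as the context node.
def ids_via(xml, id_path)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//things').map do |thing|
    REXML::XPath.first(thing, id_path).text
  end
end

absolute = ids_via(TWO_THINGS_XML, '//Id') # restarts at the document root
relative = ids_via(TWO_THINGS_XML, 'Id')   # stays inside the current element

puts "//Id => #{absolute.inspect}"
puts "Id   => #{relative.inspect}"
```

With the absolute path, every iteration should find the first Id in the document (1234), whereas the relative path should yield 1234 and then 2234.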

If you are more familiar with CSS selectors, you can use this nearly identical bit of code:

doc.css('things').each do |thing|
  puts "ID   = " + thing.at_css('Id').content
  puts "Name = " + thing.at_css('Name').content
end

Answered by IliasT

If you are in a Rails environment, the Hash object is extended and you can take advantage of the from_xml method:

xml = File.open("myfile.xml")
data = Hash.from_xml(xml)