使用 Ruby 解析 XML
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/11198239/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Parsing XML with Ruby
提问by n8gard
I'm way new to working with XML but just had a need dropped in my lap. I have been given an usual (to me) XML format. There are colons within the tags.
我对使用 XML 还很陌生,但刚好有一个需要放在我的腿上。我得到了一个通常的(对我来说)XML 格式。标签中有冒号。
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
It is a large file and there is much more to it than this but I hope this format will be familiar to someone. Does anyone know a way to approach an XML document of this sort?
这是一个很大的文件,还有更多的内容,但我希望有人会熟悉这种格式。有人知道处理这种 XML 文档的方法吗?
I'd rather not just write a brute-force way of parsing the text but I can't seem to make any headway with REXML or Hpricot and I suspect it is due to these unusual tags.
我宁愿不只是编写解析文本的蛮力方式,但我似乎无法在 REXML 或 Hpricot 上取得任何进展,我怀疑这是由于这些不寻常的标签。
my ruby code:
我的红宝石代码:
require 'hpricot'
xml = File.open( "myfile.xml" )
doc = Hpricot::XML( xml )
(doc/:things).each do |thg|
[ 'Id', 'Name' ].each do |el|
puts "#{el}: #{thg.at(el).innerHTML}"
end
end
...which is just lifted from: http://railstips.org/blog/archives/2006/12/09/parsing-xml-with-hpricot/
...这是刚刚解除:http: //railstips.org/blog/archives/2006/12/09/parsing-xml-with-hpricot/
And I figured I would be able to figure some stuff out from here but this code returns nothing. It doens't error. It just returns.
我想我可以从这里找出一些东西,但这段代码什么都不返回。它不会出错。它只是返回。
回答by jmdeldin
As @pguardiario mentioned, Nokogiriis the de facto XML and HTML parsing library. If you wanted to print out the Idand Namevalues in your example, here is how you would do it:
正如@pguardiario 所提到的,Nokogiri是事实上的 XML 和 HTML 解析库。如果您想打印示例中的Id和Name值,您可以这样做:
require 'nokogiri'
xml_str = <<EOF
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name</PART1:Name>
</THING1:things>
EOF
doc = Nokogiri::XML(xml_str)
thing = doc.at_xpath('//things')
puts "ID = " + thing.at_xpath('//Id').content
puts "Name = " + thing.at_xpath('//Name').content
A few notes:
一些注意事项:
at_xpathis for matching one thing. If you know you have multiple items, you want to usexpathinstead.- Depending on your document, namespaces can be problematic, so calling
doc.remove_namespaces!can help (see this answerfor a brief discussion). - You can use the
cssmethods instead ofxpathif you're more comfortable with those. - Definitely play around with this in
irborpryto investigate methods.
at_xpath是为了匹配一件事。如果你知道你有多个项目,你想xpath改用。- 根据您的文档,命名空间可能有问题,因此调用
doc.remove_namespaces!可以提供帮助(请参阅此答案以进行简要讨论)。 - 如果您更喜欢这些
css方法,xpath则可以使用这些方法。 - 一定要玩这个
irb或pry调查方法。
Resources
资源
Update
更新
To handle multiple items, you need a root element, and you need to remove the //in the xpathquery.
为了处理多个项目,你需要一个根元素,你需要删除//的xpath查询。
require 'nokogiri'
xml_str = <<EOF
<root>
<THING1:things type="Container">
<PART1:Id type="Property">1234</PART1:Id>
<PART1:Name type="Property">The Name1</PART1:Name>
</THING1:things>
<THING2:things type="Container">
<PART2:Id type="Property">2234</PART2:Id>
<PART2:Name type="Property">The Name2</PART2:Name>
</THING2:things>
</root>
EOF
doc = Nokogiri::XML(xml_str)
doc.xpath('//things').each do |thing|
puts "ID = " + thing.at_xpath('Id').content
puts "Name = " + thing.at_xpath('Name').content
end
This will give you:
这会给你:
Id = 1234
Name = The Name1
ID = 2234
Name = The Name2
If you are more familiar with CSS selectors, you can use this nearly identical bit of code:
如果您更熟悉 CSS 选择器,则可以使用以下几乎相同的代码:
doc.css('things').each do |thing|
puts "ID = " + thing.at_css('Id').content
puts "Name = " + thing.at_css('Name').content
end
回答by IliasT
If in a Rails environment, the Hashobject is extended and one can take advantage of the the method from_xml:
如果在 Rails 环境中,Hash对象被扩展并且可以利用该方法from_xml:
xml = File.open("myfile.xml")
data = Hash.from_xml(xml)

