How to transform XML data into a data.frame?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/2067098/
Asked by larus
I'm trying to learn R's XML package. I'm trying to create a data.frame from the books.xml sample XML data file. Here's what I have so far:
library(XML)
books <- "http://www.w3schools.com/XQuery/books.xml"
doc <- xmlTreeParse(books, useInternalNodes = TRUE)
doc
xpathApply(doc, "//book", function(x) do.call(paste, as.list(xmlValue(x))))
xpathSApply(doc, "//book", function(x) strsplit(xmlValue(x), " "))
xpathSApply(doc, "//book/child::*", xmlValue)
None of these xpathSApply calls gets me even close to what I want. How should one proceed toward a well-formed data.frame?
Answered by Shane
Ordinarily, I would suggest trying the xmlToDataFrame() function, but I believe that this will actually be fairly tricky because the data isn't well structured to begin with.
I would recommend working with this function:
xmlToList(books)
One problem is that there are multiple authors per book, so you will need to decide how to handle that when you're structuring your data frame.
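To see why the authors need a decision, it can help to inspect what xmlToList() returns. The following is a minimal sketch, assuming a trimmed inline copy of books.xml (the `xml_text` string is a stand-in so the example does not need network access, and `author_counts` is a name introduced only for illustration):

```r
library(XML)

# Trimmed inline stand-in for books.xml (an assumption, to avoid network access)
xml_text <- '<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
</bookstore>'

book_list <- xmlToList(xmlParse(xml_text, asText = TRUE))

# Print the nested structure of each book entry
str(book_list)

# Count the author entries per book to see how "jagged" the list is
author_counts <- sapply(book_list, function(b) sum(names(b) == "author"))
```

Books with several authors carry several list elements that are all named "author", which is what makes a direct row-per-book conversion awkward.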
Once you have decided how to handle multiple authors, it's fairly straightforward to turn the book list into a data frame with the ldply() function in plyr (or just use lapply and convert the return value into a data.frame with do.call("rbind", ...)).
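As one possible way to settle the multiple-authors question without plyr, the sketch below collapses all authors of a book into a single semicolon-separated column and then combines the rows with lapply and do.call(rbind, ...). This is not the approach used in the answer's own example; `book_to_row`, the column names, and the inline `xml_text` sample are assumptions made for illustration.

```r
library(XML)

# Trimmed inline stand-in for books.xml (an assumption, to avoid network access)
xml_text <- '<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
</bookstore>'

doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: turn one <book> node into a one-row data.frame,
# pasting every <author> child into a single string
book_to_row <- function(node) {
  data.frame(
    title    = xmlValue(node[["title"]]),
    authors  = paste(xpathSApply(node, "./author", xmlValue), collapse = "; "),
    year     = as.integer(xmlValue(node[["year"]])),
    price    = as.numeric(xmlValue(node[["price"]])),
    category = xmlGetAttr(node, "category"),
    stringsAsFactors = FALSE
  )
}

# Every row has the same columns, so a plain rbind is enough
rows <- lapply(getNodeSet(doc, "//book"), book_to_row)
books_df <- do.call(rbind, rows)
print(books_df)
```

Because each row has identical columns, there is nothing jagged left for rbind to trip over; the trade-off is that individual authors are no longer in separate columns.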
Here's a complete example (excluding author):
library(XML)
books <- "http://www.w3schools.com/XQuery/books.xml"
library(plyr)
ldply(xmlToList(books), function(x) { data.frame(x[!names(x)=="author"]) } )
   .id        title.text title..attrs year price   .attrs
1 book  Everyday Italian           en 2005 30.00  COOKING
2 book      Harry Potter           en 2005 29.99 CHILDREN
3 book XQuery Kick Start           en 2003 49.99      WEB
4 book      Learning XML           en 2003 39.95      WEB
Here's what it looks like with author included. You need to use ldply in this instance since the list is "jagged" and lapply can't handle that properly. [Otherwise you can use lapply with rbind.fill (also courtesy of Hadley), but why bother when plyr automatically does it for you?]:
ldply(xmlToList(books), data.frame)
   .id        title.text title..attrs              author year price   .attrs
1 book  Everyday Italian           en Giada De Laurentiis 2005 30.00  COOKING
2 book      Harry Potter           en        J K. Rowling 2005 29.99 CHILDREN
3 book XQuery Kick Start           en      James McGovern 2003 49.99      WEB
4 book      Learning XML           en         Erik T. Ray 2003 39.95      WEB
     author.1   author.2   author.3               author.4
1        <NA>       <NA>       <NA>                   <NA>
2        <NA>       <NA>       <NA>                   <NA>
3 Per Bothner Kurt Cagle James Linn Vaidyanathan Nagarajan
4        <NA>       <NA>       <NA>                   <NA>
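For comparison, the lapply plus rbind.fill alternative mentioned in the bracketed note could look roughly like the following. It is a sketch under assumptions: the inline `xml_text` sample stands in for books.xml, and `per_book` and `books_df` are names introduced only for this example.

```r
library(XML)
library(plyr)

# Trimmed inline stand-in for books.xml (an assumption, to avoid network access)
xml_text <- '<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
</bookstore>'

book_list <- xmlToList(xmlParse(xml_text, asText = TRUE))

# One data.frame per book; repeated "author" names become author, author.1, ...
per_book <- lapply(book_list, data.frame)

# rbind.fill pads the columns a book lacks with NA instead of failing
books_df <- rbind.fill(per_book)
print(books_df)
```

The only difference from the ldply call is that the row binding is spelled out; the padding with NA for books that have fewer authors comes from rbind.fill in both cases.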

