How to create an R data frame from an XML file

Question

Asked by POTENZA

I have an XML document. Part of the file looks like this:

<attr>
    <attrlabl>COUNTY</attrlabl>
    <attrdef>County abbreviation</attrdef>
    <attrtype>Text</attrtype>
    <attwidth>1</attwidth>
    <atnumdec>0</atnumdec>
    <attrdomv>
        <edom>
            <edomv>C</edomv>
            <edomvd>Clackamas County</edomvd>
            <edomvds/>
        </edom>
        <edom>
            <edomv>M</edomv>
            <edomvd>Multnomah County</edomvd>
            <edomvds/>
        </edom>
        <edom>
            <edomv>W</edomv>
            <edomvd>Washington County</edomvd>
            <edomvds/>
        </edom>
    </attrdomv>
</attr>

From this XML file, I want to create an R data frame with the columns attrlabl, attrdef, attrtype, and attrdomv. Note that the attrdomv column should list all of the levels of the categorical variable. The data frame should look like this:

attrlabl    attrdef                attrtype    attrdomv  
COUNTY      County abbreviation    Text        C Clackamas County; M Multnomah County; W Washington County

My code so far is incomplete:

doc <- xmlParse("taxlots.shp.xml")  
dataDictionary <- xmlToDataFrame(getNodeSet(doc,"//attrlabl"))

Could you please help me complete my R code? I appreciate any help!

Answer 1

Answered by plannapus

Assuming this is the correct taxlots.shp.xml file:

<attr>
    <attrlabl>COUNTY</attrlabl>
    <attrdef>County abbreviation</attrdef>
    <attrtype>Text</attrtype>
    <attwidth>1</attwidth>
    <atnumdec>0</atnumdec>
    <attrdomv>
        <edom>
            <edomv>C</edomv>
            <edomvd>Clackamas County</edomvd>
            <edomvds/>
        </edom>
        <edom>
            <edomv>M</edomv>
            <edomvd>Multnomah County</edomvd>
            <edomvds/>
        </edom>
        <edom>
            <edomv>W</edomv>
            <edomvd>Washington County</edomvd>
            <edomvds/>
        </edom>
    </attrdomv>
</attr>

You were almost there:

library(XML)
doc <- xmlParse("taxlots.shp.xml")
xmlToDataFrame(nodes = getNodeSet(doc, "//attr"))[c("attrlabl", "attrdef", "attrtype", "attrdomv")]
  attrlabl             attrdef attrtype                                             attrdomv
1   COUNTY County abbreviation     Text CClackamas CountyMMultnomah CountyWWashington County

But the last field is not in the format you wanted. Getting there requires a few additional steps:

step1 <- xmlToDataFrame(nodes = getNodeSet(doc, "//attrdomv/edom"))
step1
  edomv            edomvd edomvds
1     C  Clackamas County        
2     M  Multnomah County        
3     W Washington County  

step2 <- paste(paste(step1$edomv, step1$edomvd, sep=" "), collapse="; ")
step2
[1] "C Clackamas County; M Multnomah County; W Washington County"

cbind(xmlToDataFrame(nodes = getNodeSet(doc, "//attr"))[c("attrlabl", "attrdef", "attrtype")],
      attrdomv = step2)
  attrlabl             attrdef attrtype                                                      attrdomv
1   COUNTY County abbreviation     Text C Clackamas County; M Multnomah County; W Washington County
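
One caveat: the "//attrdomv/edom" query pools every edom in the document, so the steps above only work as-is when the file contains a single attr element. If the real file has several (an assumption, since the excerpt shows just one), a per-node version along these lines may be closer to what you need. This is only a sketch: the second attr record (ZONE), the root wrapper element, and the helper names child_text and attr_to_row are made up for illustration, and the XML is parsed from an inline string so the example is self-contained.

```r
library(XML)

# Inline sample; the ZONE record and the <root> wrapper are hypothetical.
xml_text <- '<root>
  <attr>
    <attrlabl>COUNTY</attrlabl>
    <attrdef>County abbreviation</attrdef>
    <attrtype>Text</attrtype>
    <attrdomv>
      <edom><edomv>C</edomv><edomvd>Clackamas County</edomvd><edomvds/></edom>
      <edom><edomv>M</edomv><edomvd>Multnomah County</edomvd><edomvds/></edom>
    </attrdomv>
  </attr>
  <attr>
    <attrlabl>ZONE</attrlabl>
    <attrdef>Hypothetical zoning code</attrdef>
    <attrtype>Text</attrtype>
    <attrdomv>
      <edom><edomv>R</edomv><edomvd>Residential</edomvd><edomvds/></edom>
    </attrdomv>
  </attr>
</root>'
doc <- xmlParse(xml_text, asText = TRUE)

# Text of the first direct child called `name`, or NA if there is none.
child_text <- function(node, name) {
  value <- xpathSApply(node, paste0("./", name), xmlValue)
  if (length(value) == 0) NA_character_ else value[[1]]
}

# Build a one-row data frame for a single <attr> node.
attr_to_row <- function(node) {
  # Only the <edom> entries that belong to this <attr>.
  edoms <- getNodeSet(node, "./attrdomv/edom")
  levels <- vapply(
    edoms,
    function(e) paste(child_text(e, "edomv"), child_text(e, "edomvd")),
    character(1)
  )
  data.frame(
    attrlabl = child_text(node, "attrlabl"),
    attrdef  = child_text(node, "attrdef"),
    attrtype = child_text(node, "attrtype"),
    attrdomv = paste(levels, collapse = "; "),
    stringsAsFactors = FALSE
  )
}

# One row per <attr>, stacked into a single data frame.
dataDictionary <- do.call(rbind, lapply(getNodeSet(doc, "//attr"), attr_to_row))
```

With your real file you would replace the inline string with xmlParse("taxlots.shp.xml"). An attr without any edom entries ends up with an empty string in attrdomv.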
