Original question: http://stackoverflow.com/questions/27846942/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to the translator of this page).
Python: extract data from XML and save it to Excel
Asked by Olga K.
I would like to extract some data from an XML file and save it in a table format, such as XLS or DBF.
Here is the XML file I have:
<?xml version="1.0" encoding="utf-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Header />
  <SOAP-ENV:Body>
    <ADD_LandIndex_001>
      <CNTROLAREA>
        <BSR>
          <VERB>ADD</VERB>
          <NOUN>LandIndex</NOUN>
          <REVISION>001</REVISION>
        </BSR>
      </CNTROLAREA>
      <DATAAREA>
        <LandIndex>
          <reportId>AMI100031</reportId>
          <requestKey>R3278458</requestKey>
          <SubmittedBy>EN4871</SubmittedBy>
          <submittedOn>2015/01/06 4:20:11 PM</submittedOn>
          <LandIndex>
            <agreementdetail>
              <agreementid>001 4860</agreementid>
              <agreementtype>NATURAL GAS</agreementtype>
              <currentstatus>
                <status>ACTIVE</status>
                <statuseffectivedate>1965/02/18</statuseffectivedate>
                <termdate>1965/02/18</termdate>
              </currentstatus>
              <designatedrepresentative>
              </designatedrepresentative>
            </agreementdetail>
          </LandIndex>
        </LandIndex>
      </DATAAREA>
    </ADD_LandIndex_001>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
I am interested in the information inside the agreementdetail tag, which is under DATAAREA/LandIndex/LandIndex/.
UPDATE:
Thanks to MattDMo, this task has moved a bit from its dead point, so I wrote the script below. It iterates over the file, gets all instances of agreementdetail, and outputs agreementid and agreementtype for each.
import xml.etree.ElementTree as ET
import arcpy

xmlfile = 'D:/Working/Test/Test.xml'
element_tree = ET.parse(xmlfile)
root = element_tree.getroot()

# Collect every agreementdetail element at any depth below the root.
agreement = root.findall(".//agreementdetail")
result = []
elements = ('agreementid', 'agreementtype')
for a in agreement:
    # Build one dict per agreementdetail, keyed by child tag name.
    obj = {}
    for e in elements:
        obj[e] = a.find(e).text
    result.append(obj)

arcpy.AddMessage(result)
The output I am receiving consists of a bunch of dictionaries like this: {'agreementid': '001 4860', 'agreementtype': 'NATURAL GAS'}
Now I need to convert this output into a table format (.csv, .dbf, .xls etc.) so that agreementid and agreementtype are columns:
agreementid | agreementtype
001 4860 | NATURAL GAS
I would be very grateful if you could guide me on how to accomplish this, or point me to an example.
P.S. The Python version is 2.7.
Accepted answer by MattDMo
The following should work:
import xml.etree.ElementTree as ET
import arcpy
xmlfile = 'D:/Working/Test/Test.xml'
element_tree = ET.parse(xmlfile)
root = element_tree.getroot()
agreement = root.find(".//agreementid").text
arcpy.AddMessage(agreement)
The root.find() call uses an XPath expression (ElementTree supports a subset of XPath; a quick cheatsheet is in the Python docs) to find the first tag named agreementid at any level under the current level. If there are multiple tags with that name in your file, you can use root.findall() and iterate over the results. If, for example, there are three fields named agreementid, and you know you want the second one, then root.findall(".//agreementid")[1] should work.
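As a rough illustration of the difference between find(), findall(), and indexing into the findall() result, here is a minimal sketch. The SAMPLE_XML string and its second and third agreementid values are made up for the example and are not taken from the asker's file; arcpy is left out so the snippet runs in any Python interpreter.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with three agreementid tags (not the asker's real file).
SAMPLE_XML = """<DATAAREA>
  <agreementdetail><agreementid>001 4860</agreementid></agreementdetail>
  <agreementdetail><agreementid>002 1111</agreementid></agreementdetail>
  <agreementdetail><agreementid>003 2222</agreementid></agreementdetail>
</DATAAREA>"""

root = ET.fromstring(SAMPLE_XML)

# find() returns only the first matching element at any depth below root.
first_id = root.find(".//agreementid").text

# findall() returns every match, in document order, as a list.
all_ids = [elem.text for elem in root.findall(".//agreementid")]

# Indexing the list picks a specific occurrence (index 1 is the second one).
second_id = root.findall(".//agreementid")[1].text

print(first_id)
print(all_ids)
print(second_id)
```

Note that find() returns None when nothing matches, so reading .text on the result of a failed search raises an AttributeError; checking for None first is safer on real data.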
Answer by Mayur Patel
MattDMo has given a sufficient answer to the problem, but I just want to remind you that Python has a csv module, which makes it easier to write comma-separated data that is typically then read into applications such as databases or spreadsheets.
From the docs:
import csv

# 'wb' is the Python 2 mode; on Python 3 use open('eggs.csv', 'w', newline='').
with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
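To connect this back to the question, one possible way to turn the agreementdetail elements into a table with agreementid and agreementtype columns is csv.DictWriter. The sketch below is only an illustration under a few assumptions: it targets Python 3 (on Python 2.7 the file would be opened in 'wb' mode instead, and io.StringIO would not work with the csv module), the SAMPLE_XML data and the second record are invented, and the helper names extract_rows and write_csv as well as the file name agreements.csv are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the second record is invented for illustration.
SAMPLE_XML = """<LandIndex>
  <agreementdetail>
    <agreementid>001 4860</agreementid>
    <agreementtype>NATURAL GAS</agreementtype>
  </agreementdetail>
  <agreementdetail>
    <agreementid>002 1111</agreementid>
    <agreementtype>CRUDE OIL</agreementtype>
  </agreementdetail>
</LandIndex>"""

FIELDS = ("agreementid", "agreementtype")


def extract_rows(root, fields=FIELDS):
    """Return one dict per agreementdetail element, keyed by field name."""
    rows = []
    for detail in root.findall(".//agreementdetail"):
        # findtext() returns the default instead of raising when a tag is missing.
        rows.append({name: detail.findtext(name, default="") for name in fields})
    return rows


def write_csv(rows, stream, fields=FIELDS):
    """Write the rows to an open text stream, preceded by a header line."""
    writer = csv.DictWriter(stream, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


rows = extract_rows(ET.fromstring(SAMPLE_XML))
buffer = io.StringIO()  # stands in for open("agreements.csv", "w", newline="")
write_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The resulting .csv file can be opened directly in Excel. Writing to an in-memory buffer here just keeps the example free of side effects; passing a real file object to write_csv works the same way.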