pandas XML to CSV Python
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/49898661/
XML to CSV Python
Asked by Nipun khanna
The XML data (file.xml) looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<Activity_Logs xsi:schemaLocation="http://www.cisco.com/PowerKEYDVB/Auditing
DailyActivityLog.xsd" To="2018-04-01" From="2018-04-01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
<ActivityRecord>
<time>2015-09-16T04:13:20Z</time>
<oper>Create_Product</oper>
<pkgEid>10</pkgEid>
<pkgName>BBCWRL</pkgName>
</ActivityRecord>
<ActivityRecord>
<time>2015-09-16T04:13:20Z</time>
<oper>Create_Product</oper>
<pkgEid>18</pkgEid>
<pkgName>CNNINT</pkgName>
</ActivityRecord>
The following Python code is meant to parse the XML file above and convert it to CSV.
import csv
import xml.etree.cElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
data_to_csv = open('output.csv', 'w')
list_head = []
Csv_writer = csv.writer(data_to_csv)
count = 0
for elements in root.findall('ActivityRecord'):
    List_node = []
    if count == 0:
        time = elements.find('time').tag
        list_head.append(time)
        oper = elements.find('oper').tag
        list_head.append(oper)
        pkgEid = elements.find('pkgEid').tag
        list_head.append(pkgEid)
        pkgName = elements.find('pkgName').tag
        list_head.append(pkgName)
        Csv_writer.writerow(list_head)
        count = +1
    time = elements.find('time').text
    List_node.append(time)
    oper = elements.find('oper').text
    List_node.append(oper)
    pkgEid = elements.find('pkgEid').text
    List_node.append(pkgEid)
    pkgName = elements.find('pkgName').text
    List_node.append(pkgName)
    Csv_writer.writerow(List_node)
data_to_csv.close()
The code I am using does not write any data to the CSV. Could someone tell me exactly where I am going wrong?
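A likely reason the code above writes nothing: the root element declares a default namespace (xmlns="http://www.cisco.com/PowerKEYDVB/Auditing"), so ElementTree stores each tag as {namespace}ActivityRecord and root.findall('ActivityRecord') matches no elements. Below is a minimal sketch of a namespace-aware version. The trimmed inline sample (with a closing </Activity_Logs> added so that it parses), the "a" prefix, and the in-memory output buffer are assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the sample; the closing root tag is added so it parses.
XML = """<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
  <ActivityRecord>
    <time>2015-09-16T04:13:20Z</time>
    <oper>Create_Product</oper>
    <pkgEid>10</pkgEid>
    <pkgName>BBCWRL</pkgName>
  </ActivityRecord>
  <ActivityRecord>
    <time>2015-09-16T04:13:20Z</time>
    <oper>Create_Product</oper>
    <pkgEid>18</pkgEid>
    <pkgName>CNNINT</pkgName>
  </ActivityRecord>
</Activity_Logs>"""

# "a" is an arbitrary prefix chosen here; only the URI has to match the file.
NS = {"a": "http://www.cisco.com/PowerKEYDVB/Auditing"}
FIELDS = ["time", "oper", "pkgEid", "pkgName"]

root = ET.fromstring(XML)

# An unqualified name finds nothing because every tag carries the namespace.
unqualified_matches = root.findall("ActivityRecord")

out = io.StringIO()  # stand-in for open('output.csv', 'w', newline='')
writer = csv.writer(out)
writer.writerow(FIELDS)
for record in root.findall("a:ActivityRecord", NS):
    writer.writerow([record.find(f"a:{name}", NS).text for name in FIELDS])

csv_text = out.getvalue()
```

With a real file, ET.parse('file.xml').getroot() would replace ET.fromstring(XML), and the buffer would be replaced by a file opened with newline=''.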
Accepted answer by Nipun khanna
Found the most appropriate way of doing this.
import os
import pandas as pd
from bs4 import BeautifulSoup as b

# The original snippet looped over an undefined name; a list of input files is assumed here.
files_xml = ["file.xml"]

list1 = []
for each_file in files_xml:
    with open(each_file, "r") as f:  # opening xml file
        content = f.read()
    # The "lxml" (HTML) parser lower-cases tag names, hence the lower-case lookups below.
    soup = b(content, "lxml")
    for values in soup.findAll("activityrecord"):
        # Use an empty string for any field missing from a record.
        if values.find("time") is None:
            time = ""
        else:
            time = values.find("time").text
        if values.find("oper") is None:
            oper = ""
        else:
            oper = values.find("oper").text
        if values.find("pkgeid") is None:
            pkgeid = ""
        else:
            pkgeid = values.find("pkgeid").text
        if values.find("pkgname") is None:
            pkgname = ""
        else:
            pkgname = values.find("pkgname").text
        if values.find("dhct") is None:
            dhct = ""
        else:
            dhct = values.find("dhct").text
        if values.find("sourceid") is None:
            sourceid = ""
        else:
            sourceid = values.find("sourceid").text
        list1.append(time + ',' + oper + ',' + pkgeid + ',' + pkgname + ',' + dhct + ',' + sourceid)

# One comma-joined string per record, split back into six columns.
df = pd.DataFrame(list1)
df = df[0].str.split(',', expand=True)
df.columns = ['Time', 'Oper', 'PkgEid', 'PkgName', 'dhct', 'sourceid']
df.to_csv("new.csv", index=False)
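The repeated if/else blocks can be condensed, and joining fields with ',' and then splitting again would misplace columns if any value itself contained a comma. As a hedged alternative sketch (not the answer's method), the standard library's Element.findtext with a default value plus csv.DictWriter address both points. The inline sample, the comma-containing package name, the helper name record_to_row, and the "a" prefix are all made up for illustration; the real tag case of dhct and sourceid is not shown in the question and is assumed here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the first pkgName contains a comma, the second record lacks fields.
XML = """<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
  <ActivityRecord>
    <time>2015-09-16T04:13:20Z</time>
    <oper>Create_Product</oper>
    <pkgEid>10</pkgEid>
    <pkgName>BBC, WRL</pkgName>
  </ActivityRecord>
  <ActivityRecord>
    <time>2018-04-01T03:30:28Z</time>
    <oper>Deactivate_Dhct</oper>
  </ActivityRecord>
</Activity_Logs>"""

NS = {"a": "http://www.cisco.com/PowerKEYDVB/Auditing"}
# Tag case for "dhct" and "sourceid" is an assumption.
FIELDS = ["time", "oper", "pkgEid", "pkgName", "dhct", "sourceid"]


def record_to_row(record):
    """Map one ActivityRecord element to a dict, using '' for absent fields."""
    return {
        name: record.findtext(f"a:{name}", default="", namespaces=NS)
        for name in FIELDS
    }


root = ET.fromstring(XML)
rows = [record_to_row(r) for r in root.findall("a:ActivityRecord", NS)]

out = io.StringIO()  # stand-in for open("new.csv", "w", newline="")
writer = csv.DictWriter(out, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)  # values containing commas are quoted automatically
csv_text = out.getvalue()
```

The same list of dicts could also be handed to pd.DataFrame(rows) if a DataFrame is needed before writing.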
Answer by Willian Vieira
Using pandas, parsing all XML fields:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse("file.xml")
root = tree.getroot()
get_range = lambda col: range(len(col))
l = [{r[i].tag:r[i].text for i in get_range(r)} for r in root]
df = pd.DataFrame.from_dict(l)
df.to_csv('file.csv')
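One caveat with this approach: because the sample root declares a default namespace, ElementTree reports each tag in the form {http://www.cisco.com/PowerKEYDVB/Auditing}time, and those strings would become the column names. A small sketch of stripping that prefix before building the rows follows; the helper name local_name and the one-record inline sample are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# One record from the sample, with the root element closed so it parses.
XML = """<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
  <ActivityRecord>
    <time>2015-09-16T04:13:20Z</time>
    <oper>Create_Product</oper>
    <pkgEid>10</pkgEid>
    <pkgName>BBCWRL</pkgName>
  </ActivityRecord>
</Activity_Logs>"""


def local_name(tag):
    """Drop a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(XML)

# Tags as ElementTree stores them, namespace included.
raw_tags = [child.tag for child in root[0]]

# One dict per record, keyed by the bare tag name.
rows = [{local_name(child.tag): child.text for child in record} for record in root]
```

The resulting rows list can be passed to pd.DataFrame in place of l in the snippet above.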
Answer by Rachit kapadia
Using pandas and BeautifulSoup, you can achieve your expected output easily:
# Code:
import pandas as pd
import itertools
from bs4 import BeautifulSoup as b
with open("file.xml", "r") as f:  # opening xml file
    content = f.read()
soup = b(content, "lxml")
# Collect the text of every occurrence of each tag (the lxml HTML parser lower-cases names)
pkgeid = [values.text for values in soup.findAll("pkgeid")]
pkgname = [values.text for values in soup.findAll("pkgname")]
time = [values.text for values in soup.findAll("time")]
oper = [values.text for values in soup.findAll("oper")]
# For Python 3.x use the `zip_longest` function
# For Python 2.x use the `izip_longest` function
data = [item for item in itertools.zip_longest(time, oper, pkgeid, pkgname)]
df = pd.DataFrame(data=data)
df.to_csv("sample.csv", index=False, header=None)
#output in `sample.csv` file will be as follows:
2015-09-16T04:13:20Z,Create_Product,10,BBCWRL
2015-09-16T04:13:20Z,Create_Product,18,CNNINT
2018-04-01T03:30:28Z,Deactivate_Dhct,,
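Note that itertools.zip_longest pads the shorter lists at the end with None; it has no knowledge of which record a value came from. If a record in the middle of the file lacks a field, later values shift up a row. A small sketch with made-up lists shows the effect:

```python
from itertools import zip_longest

# Hypothetical per-tag lists: the second of three records has no pkgEid,
# so the two pkgeid values belong to records 1 and 3.
time = ["t1", "t2", "t3"]
pkgeid = ["10", "18"]

rows = list(zip_longest(time, pkgeid))
# rows == [("t1", "10"), ("t2", "18"), ("t3", None)]
# "18" is now paired with t2, although it came from the third record.
```

Extracting the fields record by record, as the accepted answer does, avoids this misalignment.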
Answer by Vishnu Kiran
Use pyxmlparser if it is a one-time operation.
Disclaimer: I am the author of the library, and it is fairly new. Any feedback is appreciated. It is a command-line utility.