pandas XML to CSV Python

Note: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/49898661/

Time: 2020-09-14 05:29:09  Source: igfitidea

XML to CSV Python

python-3.x pandas csv beautifulsoup xml.etree

Asked by Nipun khanna

The XML data (file.xml) for the state looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<Activity_Logs xsi:schemaLocation="http://www.cisco.com/PowerKEYDVB/Auditing 
DailyActivityLog.xsd" To="2018-04-01" From="2018-04-01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
    <ActivityRecord>
       <time>2015-09-16T04:13:20Z</time>
       <oper>Create_Product</oper>
       <pkgEid>10</pkgEid>
       <pkgName>BBCWRL</pkgName>
    </ActivityRecord>
    <ActivityRecord>
       <time>2015-09-16T04:13:20Z</time>
       <oper>Create_Product</oper>
       <pkgEid>18</pkgEid>
       <pkgName>CNNINT</pkgName>
    </ActivityRecord>

The following Python code is meant to parse the XML file above and convert it to CSV.

import csv
import xml.etree.cElementTree as ET


tree =  ET.parse('file.xml')
root = tree.getroot()


data_to_csv= open('output.csv','w')

list_head=[]

Csv_writer=csv.writer(data_to_csv)

count=0
for elements in root.findall('ActivityRecord'):
    List_node = []
    if count == 0 :

        time = elements.find('time').tag
        list_head.append(time)

        oper = elements.find('oper').tag
        list_head.append(oper)

        pkgEid = elements.find('pkgEid').tag
        list_head.append(pkgEid)


        pkgName = elements.find('pkgName').tag
        list_head.append(pkgName)

        Csv_writer.writerow(list_head)
        count = +1

    time = elements.find('time').text
    List_node.append(time)

    oper = elements.find('oper').text
    List_node.append(oper)

    pkgEid = elements.find('pkgEid').text
    List_node.append(pkgEid)

    pkgName = elements.find('pkgName').text
    List_node.append(pkgName)    

    Csv_writer.writerow(List_node)

data_to_csv.close()

The code I am using does not write any data to the CSV. Could someone tell me where exactly I am going wrong?
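
A likely cause, which is not stated in the thread itself, is the default namespace declared on the root element (`xmlns="http://www.cisco.com/PowerKEYDVB/Auditing"`). ElementTree stores every tag together with its namespace URI, so an unqualified `findall('ActivityRecord')` matches nothing and only the empty file is produced. The sketch below uses a trimmed-down copy of the sample document; the prefix `a` and the variable names are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample document, keeping its default namespace.
XML = """<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
    <ActivityRecord>
        <time>2015-09-16T04:13:20Z</time>
        <oper>Create_Product</oper>
        <pkgEid>10</pkgEid>
        <pkgName>BBCWRL</pkgName>
    </ActivityRecord>
    <ActivityRecord>
        <time>2015-09-16T04:13:20Z</time>
        <oper>Create_Product</oper>
        <pkgEid>18</pkgEid>
        <pkgName>CNNINT</pkgName>
    </ActivityRecord>
</Activity_Logs>"""

root = ET.fromstring(XML)

# The prefix "a" is an arbitrary label chosen for this sketch.
NS = {"a": "http://www.cisco.com/PowerKEYDVB/Auditing"}

unqualified = root.findall("ActivityRecord")      # no namespace: matches nothing
qualified = root.findall("a:ActivityRecord", NS)  # namespace-aware: matches both

# Build one row per record; a missing child yields "" instead of raising.
rows = [
    [record.findtext(f"a:{name}", default="", namespaces=NS)
     for name in ("time", "oper", "pkgEid", "pkgName")]
    for record in qualified
]

print(len(unqualified), len(qualified))
print(rows)
```

The resulting `rows` list can be passed to `csv.writer(...).writerows(rows)` in place of the per-field `find(...)` calls used in the question.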

Accepted answer by Nipun khanna

Found the most appropriate way of doing this.

import pandas as pd
from bs4 import BeautifulSoup as b

with open("file.xml", "r") as f:  # open the XML file
    content = f.read()

# The "lxml" HTML parser lowercases tag names, hence the lowercase lookups below.
soup = b(content, "lxml")

def text_or_empty(record, tag):
    # Return the text of the first matching child tag, or "" if it is absent.
    node = record.find(tag)
    return "" if node is None else node.text

fields = ["time", "oper", "pkgeid", "pkgname", "dhct", "sourceid"]

# One row per <ActivityRecord>; missing fields become empty strings.
rows = []
for values in soup.find_all("activityrecord"):
    rows.append([text_or_empty(values, tag) for tag in fields])

# Build the frame once, after the loop, instead of joining and re-splitting on commas.
df = pd.DataFrame(rows, columns=['Time', 'Oper', 'PkgEid', 'PkgName', 'dhct', 'sourceid'])
df.to_csv("new.csv", index=False)

Answer by Willian Vieira

Using pandas, parsing all XML fields:

import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse("file.xml")
root = tree.getroot()

# One dict per record: child tag -> child text
records = [{child.tag: child.text for child in record} for record in root]

df = pd.DataFrame(records)
df.to_csv('file.csv')
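
One caveat with reading tags directly from ElementTree: because the sample XML declares a default namespace, each tag comes back as `{http://www.cisco.com/PowerKEYDVB/Auditing}time` rather than `time`, and those long names end up as column headers. A minimal standard-library sketch of stripping the namespace before writing the CSV follows. The helper `local_name`, the in-memory `buffer`, and the second sample record (modelled on the `Deactivate_Dhct` row that appears in another answer's output) are assumptions made for this illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Small sample with the same default namespace; the second record lacks two fields.
XML = """<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
    <ActivityRecord>
        <time>2015-09-16T04:13:20Z</time>
        <oper>Create_Product</oper>
        <pkgEid>10</pkgEid>
        <pkgName>BBCWRL</pkgName>
    </ActivityRecord>
    <ActivityRecord>
        <time>2018-04-01T03:30:28Z</time>
        <oper>Deactivate_Dhct</oper>
    </ActivityRecord>
</Activity_Logs>"""


def local_name(tag):
    """Drop a leading '{namespace-uri}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(XML)
records = [{local_name(child.tag): child.text for child in record} for record in root]

# Collect column names in first-seen order so every record fits the header.
fieldnames = []
for rec in records:
    for key in rec:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()  # stand-in for open("file.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(records)

print(buffer.getvalue())
```

The same `records` list could also be handed to `pd.DataFrame(records)`; the `csv` module is used here only to keep the sketch free of third-party dependencies.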

Answer by Rachit kapadia

Using pandas and BeautifulSoup you can achieve your expected output easily:

#Code:

import pandas as pd
import itertools
from bs4 import BeautifulSoup as b
with open("file.xml", "r") as f: # opening xml file
    content = f.read()

soup = b(content, "lxml")
pkgeid = [values.text for values in soup.find_all("pkgeid")]
pkgname = [values.text for values in soup.find_all("pkgname")]
time = [values.text for values in soup.find_all("time")]
oper = [values.text for values in soup.find_all("oper")]
# On Python 3.x use `itertools.zip_longest`;
# on Python 2.x use `itertools.izip_longest`.
data = list(itertools.zip_longest(time, oper, pkgeid, pkgname))
df  = pd.DataFrame(data=data)
df.to_csv("sample.csv",index=False, header=None)


#output in `sample.csv` file will be as follows:
2015-09-16T04:13:20Z,Create_Product,10,BBCWRL
2015-09-16T04:13:20Z,Create_Product,18,CNNINT
2018-04-01T03:30:28Z,Deactivate_Dhct,,
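
A point worth keeping in mind with this column-wise approach: `zip_longest` pads the shorter lists at the end, so the rows only line up when the records with missing fields come last, as in the output above. If a record in the middle lacks a field, later values shift up into the wrong row. The sketch below illustrates this with plain Python data; the shortened timestamps `T1` to `T3` and the record order are hypothetical.

```python
from itertools import zip_longest

# Hypothetical per-record values: the second record has no pkgEid/pkgName.
records = [
    {"time": "T1", "oper": "Create_Product", "pkgEid": "10", "pkgName": "BBCWRL"},
    {"time": "T2", "oper": "Deactivate_Dhct"},
    {"time": "T3", "oper": "Create_Product", "pkgEid": "18", "pkgName": "CNNINT"},
]
columns = ["time", "oper", "pkgEid", "pkgName"]

# Column-wise collection, like one find_all() per tag: absent values are skipped.
per_column = [[r[c] for r in records if c in r] for c in columns]
column_wise = list(zip_longest(*per_column, fillvalue=""))

# Record-wise collection keeps each value on the row it belongs to.
record_wise = [tuple(r.get(c, "") for c in columns) for r in records]

for row in column_wise:
    print("column-wise:", row)
for row in record_wise:
    print("record-wise:", row)
```

Iterating over each record and looking up its fields individually avoids the shift, at the cost of a slightly longer loop.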

Answer by Vishnu Kiran

Use pyxmlparser if it is a one-time operation.


Disclaimer: I am the author of the library, and it is fairly new. Any feedback is appreciated. It is a command-line utility.

https://pypi.org/project/pyxmlparser/
