pandas XML 到 CSV Python

Question

提问by Nipun khanna

The XML data(file.xml) for the state will look like below

状态的 XML 数据（file.xml）如下所示

<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<Activity_Logs xsi:schemaLocation="http://www.cisco.com/PowerKEYDVB/Auditing 
DailyActivityLog.xsd" To="2018-04-01" From="2018-04-01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
    <ActivityRecord>
       <time>2015-09-16T04:13:20Z</time>
       <oper>Create_Product</oper>
       <pkgEid>10</pkgEid>
       <pkgName>BBCWRL</pkgName>
       </ActivityRecord>
    <ActivityRecord>
       <time>2015-09-16T04:13:20Z</time>
       <oper>Create_Product</oper>
       <pkgEid>18</pkgEid>
       <pkgName>CNNINT</pkgName>
    </ActivityRecord>

Parsing and conversion to CSV of above mentioned XML file will be done by the following python code.

上述 XML 文件的解析和转换为 CSV 将由以下 python 代码完成。

import csv
import xml.etree.cElementTree as ET


tree =  ET.parse('file.xml')
root = tree.getroot()


data_to_csv= open('output.csv','w')

list_head=[]

Csv_writer=csv.writer(data_to_csv)

count=0
for elements in root.findall('ActivityRecord'):
    List_node = []
    if count == 0 :

        time = elements.find('time').tag
        list_head.append(time)

        oper = elements.find('oper').tag
        list_head.append(oper)

        pkgEid = elements.find('pkgEid').tag
        list_head.append(pkgEid)


        pkgName = elements.find('pkgName').tag
        list_head.append(pkgName)

        Csv_writer.writerow(list_head)
        count = +1

    time = elements.find('time').text
    List_node.append(time)

    oper = elements.find('oper').text
    List_node.append(oper)

    pkgEid = elements.find('pkgEid').text
    List_node.append(pkgEid)

    pkgName = elements.find('pkgName').text
    List_node.append(pkgName)    

    Csv_writer.writerow(List_node)

data_to_csv.close()

The code I am using is not giving me any data in CSV. Could some one tell me where excatly am I going wrong?

我使用的代码没有给我任何 CSV 数据。有人能告诉我我到底哪里出错了吗？

Answer 1

采纳答案by Nipun khanna

Found the most appropriate way of doing this.
import os import pandas as pd

找到了最合适的方法。
导入 os 导入Pandas作为 pd

from bs4 import BeautifulSoup as b
with open("file.xml", "r") as f: # opening xml file
 content = f.read()

soup = b(content, "lxml")
df1=pd.DataFrame()
for each_file in files_xlm: 
with open( each_file, "r") as f: # opening xml file
    content = f.read()
soup = b(content, "lxml")    

list1 = []
for values in soup.findAll("activityrecord"):  
    if values.find("time") is None:
        time = ""
    else:
        time = values.find("time").text        
    if values.find("oper") is None:
        oper = ""    
    else:
        oper = values.find("oper").text      
    if values.find("pkgeid") is None:
        pkgeid = ""    
    else:
        pkgeid = values.find("pkgeid").text     
    if values.find("pkgname") is None:
        pkgname = ""    
    else:
        pkgname = values.find("pkgname").text 
    if values.find("dhct") is None:
        dhct = ""    
    else:
        dhct = values.find("dhct").text   
    if values.find("sourceid") is None:
        sourceid = ""    
    else:
        sourceid = values.find("sourceid").text      

    list1.append(time+','+ oper+','+pkgeid+','+ pkgname+','+dhct+','+sourceid)
    df  = pd.DataFrame(list1)



df=df[0].str.split(',', expand=True)
df.columns = ['Time','Oper','PkgEid','PkgName','dhct','sourceid']
df.to_csv("new.csv",index=False)

Answer 2

回答by Willian Vieira

Using Pandas, parsing all xml fields.

使用 Pandas，解析所有 xml 字段。

import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse("file.xml")
root = tree.getroot()

get_range = lambda col: range(len(col))
l = [{r[i].tag:r[i].text for i in get_range(r)} for r in root]

df = pd.DataFrame.from_dict(l)
df.to_csv('file.csv')

Answer 3

回答by Rachit kapadia

Using pandasand BeautifulSoupyou can achieve your expected output easily:

使用pandas并且BeautifulSoup您可以轻松实现预期的输出：

#Code:

import pandas as pd
import itertools
from bs4 import BeautifulSoup as b
with open("file.xml", "r") as f: # opening xml file
    content = f.read()

soup = b(content, "lxml")
pkgeid =  [ values.text for values in soup.findAll("pkgeid")]
pkgname = [ values.text for values in soup.findAll("pkgname")]
time =  [ values.text for values in soup.findAll("time")]
oper =  [ values.text for values in soup.findAll("oper")]
# For python-3.x use `zip_longest` method
# For python-2.x use 'izip_longest method
data = [item for item in itertools.zip_longest(time, oper, pkgeid, pkgname)] 
df  = pd.DataFrame(data=data)
df.to_csv("sample.csv",index=False, header=None)

#output in `sample.csv` file will be as follows:
2015-09-16T04:13:20Z,Create_Product,10,BBCWRL
2015-09-16T04:13:20Z,Create_Product,18,CNNINT
2018-04-01T03:30:28Z,Deactivate_Dhct,,

Answer 4

回答by Vishnu Kiran

Use pyxmlparser if it is a one-time operation.

如果是一次性操作，请使用 pyxmlparser。

Disclaimer I am the author of the library and it is fairly new. Any feedback is appreciated. It is a command line utility.

免责声明我是图书馆的作者，它是相当新的。任何反馈表示赞赏。它是一个命令行实用程序。

https://pypi.org/project/pyxmlparser/

pandas XML 到 CSV Python

提问by Nipun khanna

采纳答案by Nipun khanna

回答by Willian Vieira

回答by Rachit kapadia

回答by Vishnu Kiran

相关推荐

最近更新

标签

pandas XML 到 CSV Python

提问by Nipun khanna

采纳答案by Nipun khanna

回答by Willian Vieira

回答by Rachit kapadia

回答by Vishnu Kiran

相关推荐

Pip install pandas：安装依赖错误

pandas 熊猫箱线图作为具有单独 y 轴的子图

pandas 熊猫“时间戳”对象不可下标

pandas 熊猫 dropna() 功能不起作用

相关推荐

最近更新

标签