将 XML 文件读取到 Pandas DataFrame
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/52968877/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Read XML file to Pandas DataFrame
提问by Sarthak Girdhar
Can someone please help convert the following XML file to Pandas dataframe:
有人可以帮助将以下 XML 文件转换为 Pandas 数据框:
<?xml version="1.0" encoding="UTF-8" ?>
<root>
<bathrooms type="dict">
<n35237 type="number">1.0</n35237>
<n32238 type="number">3.0</n32238>
<n44699 type="number">nan</n44699>
</bathrooms>
<price type="dict">
<n35237 type="number">7020000.0</n35237>
<n32238 type="number">10000000.0</n32238>
<n44699 type="number">4128000.0</n44699>
</price>
<property_id type="dict">
<n35237 type="number">35237.0</n35237>
<n32238 type="number">32238.0</n32238>
<n44699 type="number">44699.0</n44699>
</property_id>
</root>
It should look like this --
它应该是这样的——
This is the code I have written:-
这是我写的代码:-
import pandas as pd
import xml.etree.ElementTree as ET
tree = ET.parse('real_state.xml')
root = tree.getroot()
dfcols = ['property_id', 'price', 'bathrooms']
df_xml = pd.DataFrame(columns=dfcols)
for node in root:
property_id = node.attrib.get('property_id')
price = node.attrib.get('price')
bathrooms = node.attrib.get('bathrooms')
df_xml = df_xml.append(
pd.Series([property_id, price, bathrooms], index=dfcols),
ignore_index=True)
print(df_xml)
I am getting Noneeverywhere, instead of the actual values. Can someone please tell how it can be fixed. Thanks!
我到处都没有,而不是实际值。有人可以告诉它如何修复。谢谢!
采纳答案by Yo_Chris
if the data is simple, like this, then you can do something like:
如果数据很简单,就像这样,那么您可以执行以下操作:
from lxml import objectify
xml = objectify.parse('Document1.xml')
root = xml.getroot()
bathrooms = [child.text for child in root['bathrooms'].getchildren()]
price = [child.text for child in root['price'].getchildren()]
property_id = [child.text for child in root['property_id'].getchildren()]
data = [bathrooms, price, property_id]
df = pd.DataFrame(data).T
df.columns = ['bathrooms', 'price', 'property_id']
bathrooms price property_id
0 1.0 7020000.0 35237.0
1 3.0 10000000.0 32238.0
2 nan 4128000.0 44699.0
if it is more complex then a loop is better. You can do something like
如果它更复杂,那么循环更好。你可以做类似的事情
from lxml import objectify
xml = objectify.parse('Document1.xml')
root = xml.getroot()
data=[]
for i in range(len(root.getchildren())):
data.append([child.text for child in root.getchildren()[i].getchildren()])
df = pd.DataFrame(data).T
df.columns = ['bathrooms', 'price', 'property_id']
回答by Correy Koshnick
I have had success using this function from the xmltodictpackage:
我已经成功地使用了xmltodict包中的这个函数:
import xmltodict
xmlDict = xmltodict.parse(xmlData)
df = pd.DataFrame.from_dict(xmlDict)
What I like about this, is I can easily do some dictionary manipulation in between parsing the xml and making my df. Also, it helps to explore the data as a dict if the structure is wily.
我喜欢这个,是我可以在解析 xml 和制作我的 df 之间轻松地进行一些字典操作。此外,如果结构巧妙,它有助于将数据作为字典来探索。