Reading an XML file into a Pandas DataFrame

Question

Asked by Sarthak Girdhar

Can someone please help me convert the following XML file to a Pandas DataFrame:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
 <bathrooms type="dict">
  <n35237 type="number">1.0</n35237>
  <n32238 type="number">3.0</n32238>
  <n44699 type="number">nan</n44699>
 </bathrooms>
 <price type="dict">
  <n35237 type="number">7020000.0</n35237>
  <n32238 type="number">10000000.0</n32238>
  <n44699 type="number">4128000.0</n44699>
 </price>
 <property_id type="dict">
  <n35237 type="number">35237.0</n35237>
  <n32238 type="number">32238.0</n32238>
  <n44699 type="number">44699.0</n44699>
 </property_id>
</root>

It should look like this:

[Image: OUTPUT]

This is the code I have written:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('real_state.xml')
root = tree.getroot()

dfcols = ['property_id', 'price', 'bathrooms']
df_xml = pd.DataFrame(columns=dfcols)

for node in root:
    property_id = node.attrib.get('property_id')
    price = node.attrib.get('price')
    bathrooms = node.attrib.get('bathrooms')

    df_xml = df_xml.append(
            pd.Series([property_id, price, bathrooms], index=dfcols),
            ignore_index=True)


print(df_xml)

I am getting None everywhere instead of the actual values. Can someone please tell me how this can be fixed? Thanks!

Answer 1

Accepted answer by Yo_Chris

If the data is simple, like this, then you can do something like:

import pandas as pd
from lxml import objectify

# Parse the file and get the <root> element
xml = objectify.parse('Document1.xml')
root = xml.getroot()

# Collect the text of every child under each column element
bathrooms = [child.text for child in root['bathrooms'].getchildren()]
price = [child.text for child in root['price'].getchildren()]
property_id = [child.text for child in root['property_id'].getchildren()]

# Each list is one row here, so transpose to turn the lists into columns
data = [bathrooms, price, property_id]
df = pd.DataFrame(data).T
df.columns = ['bathrooms', 'price', 'property_id']

    bathrooms   price      property_id
0   1.0        7020000.0    35237.0
1   3.0        10000000.0   32238.0
2   nan        4128000.0    44699.0

If it is more complex, then a loop is better. You can do something like:

import pandas as pd
from lxml import objectify

xml = objectify.parse('Document1.xml')
root = xml.getroot()

# Build one list of values per column element (<bathrooms>, <price>, ...)
data = []
for column in root.getchildren():
    data.append([child.text for child in column.getchildren()])

# Transpose so that each column element becomes a DataFrame column
df = pd.DataFrame(data).T
df.columns = ['bathrooms', 'price', 'property_id']
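
As a side note on why the question's code printed None: node.attrib holds only the XML attributes of each element (here just type="dict"), while the values live in the text of the child elements. The following is a minimal sketch using only the standard-library ElementTree module from the question, not the lxml approach above. The inline XML string is a copy of the question's data so the example runs on its own, and converting every value with float() is an assumption based on the type="number" attributes.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Copy of the question's XML, inlined so the sketch is self-contained.
XML = """
<root>
 <bathrooms type="dict">
  <n35237 type="number">1.0</n35237>
  <n32238 type="number">3.0</n32238>
  <n44699 type="number">nan</n44699>
 </bathrooms>
 <price type="dict">
  <n35237 type="number">7020000.0</n35237>
  <n32238 type="number">10000000.0</n32238>
  <n44699 type="number">4128000.0</n44699>
 </price>
 <property_id type="dict">
  <n35237 type="number">35237.0</n35237>
  <n32238 type="number">32238.0</n32238>
  <n44699 type="number">44699.0</n44699>
 </property_id>
</root>
"""

root = ET.fromstring(XML)

# Each child of <root> is a column; each grandchild is one cell whose tag
# (n35237, ...) identifies the row and whose text holds the value.
# Assumption: every cell is numeric, so float() is applied ("nan" -> NaN).
columns = {
    column.tag: {cell.tag: float(cell.text) for cell in column}
    for column in root
}

# A dict of dicts aligns cells by their row tag; reorder the columns to
# match the order requested in the question.
df = pd.DataFrame(columns)[['property_id', 'price', 'bathrooms']]
print(df)
```

Keying the cells by tag means rows are matched by their identifier rather than by position, which stays correct even if a column omits an entry or lists them in a different order.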

Answer 2

Answer by Correy Koshnick

I have had success using this function from the xmltodict package:

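import pandas as pd
import xmltodict

# xmlData must hold the XML document as a string; here it is read from
# the file named in the question
with open('real_state.xml') as f:
    xmlData = f.read()

xmlDict = xmltodict.parse(xmlData)
df = pd.DataFrame.from_dict(xmlDict)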

What I like about this is that I can easily do some dictionary manipulation between parsing the XML and making my df. Also, it helps to explore the data as a dict if the structure is wily.
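
To illustrate the kind of dictionary manipulation meant here, the sketch below reshapes a parsed dict before building the DataFrame. The helper name frame_from_parsed and the sample dict are hypothetical. The sample is a hand-written, trimmed stand-in for what xmltodict.parse is expected to return for the question's XML, assuming xmltodict's default conventions of prefixing attributes with "@" and storing element text under "#text". It is written out literally so the example does not depend on the file.

```python
import pandas as pd

# Hypothetical sample: a trimmed version of the nested dict that
# xmltodict.parse is expected to produce for the question's XML
# (attributes under "@..." keys, element text under "#text").
parsed = {
    "root": {
        "bathrooms": {
            "@type": "dict",
            "n35237": {"@type": "number", "#text": "1.0"},
            "n44699": {"@type": "number", "#text": "nan"},
        },
        "price": {
            "@type": "dict",
            "n35237": {"@type": "number", "#text": "7020000.0"},
            "n44699": {"@type": "number", "#text": "4128000.0"},
        },
    }
}


def frame_from_parsed(parsed):
    """Turn the nested dict into a DataFrame with one column per child of root."""
    columns = {}
    for name, cells in parsed["root"].items():
        columns[name] = {
            key: float(cell["#text"])      # assumption: all values are numeric
            for key, cell in cells.items()
            if not key.startswith("@")     # skip attributes such as "@type"
        }
    return pd.DataFrame(columns)


df = frame_from_parsed(parsed)
print(df)
```

With the real file, the same helper would be called as frame_from_parsed(xmltodict.parse(xmlData)).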
