Python pandas.DataFrame.from_dict not preserving order using OrderedDict

Question

Asked by dkapitan

I want to import OData XML data feeds from the Dutch Bureau of Statistics (CBS) into our database. Using lxml and pandas, I thought this would be straightforward. I am using OrderedDict to preserve the order of the columns for readability, but somehow I can't get it right.

from collections import OrderedDict
from lxml import etree
import requests
import pandas as pd


# CBS URLs
base_url = 'http://opendata.cbs.nl/ODataFeed/odata'
datasets = ['/37296ned', '/82245NED']

feed = requests.get(base_url + datasets[1] + '/TypedDataSet')
root = etree.fromstring(feed.content)

# all record entries start at tag m:properties, parse into data dict
data = []
for record in root.iter('{{{}}}properties'.format(root.nsmap['m'])):
    row = OrderedDict()
    for element in record:
        row[element.tag.split('}')[1]] = element.text
    data.append(row)

df = pd.DataFrame.from_dict(data)
df.columns

Inspecting data, each OrderedDict is in the right order. But looking at df.head(), the columns have been sorted alphabetically, with capitalized names first?

Help, anyone?

Answer 1

Accepted answer by chris-sc

Something in your example seems to be inconsistent, as data is a list and not a dict, but assuming you really have an OrderedDict:

Try to explicitly specify your column order when you create your DataFrame:

# ... all your data collection
# assumes data is a single OrderedDict (column name -> values), not a list of rows
df = pd.DataFrame(data, columns=data.keys())

This should give you a DataFrame with the columns ordered exactly as they are in the OrderedDict (via the list generated by data.keys()).
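In the question, data is a list of OrderedDict rows rather than a single OrderedDict, so data.keys() would not exist there. A minimal sketch of the same idea for that case, taking the column order from the first row, might look like the following. The row contents are made up for illustration and are not real CBS data, and the sketch assumes every row has the same keys as the first one.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical rows standing in for the parsed records; the keys are
# deliberately not in alphabetical order.
data = [
    OrderedDict([("Perioden", "1995JJ00"), ("aantal", "12"), ("ID", "0")]),
    OrderedDict([("Perioden", "1996JJ00"), ("aantal", "15"), ("ID", "1")]),
]

# Take the column order from the first row and pass it explicitly.
column_order = list(data[0].keys())
df = pd.DataFrame(data, columns=column_order)

print(list(df.columns))
```

As far as I know, recent pandas releases on Python 3.6+ already keep the insertion order for a list of dicts, so the explicit columns argument mainly matters on older versions or when you want to be certain.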

Answer 2

Answer by Daniel Wu

The above answer doesn't work for me and keeps giving me "ValueError: cannot use columns parameter with orient='columns'".
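That message is what pandas raises when columns= is passed to pd.DataFrame.from_dict with the default orient, so presumably that was the call being made (an assumption; the exact call is not shown). A small sketch that reproduces the error and shows that the plain pd.DataFrame constructor accepts columns= for the same input, using made-up rows:

```python
import pandas as pd

# Hypothetical rows; keys are intentionally not in alphabetical order.
rows = [{"b": 1, "a": 2}, {"b": 3, "a": 4}]

message = None
try:
    # from_dict rejects `columns` unless orient="index" is used.
    pd.DataFrame.from_dict(rows, columns=["b", "a"])
except ValueError as exc:
    message = str(exc)
    print(message)

# The regular constructor takes `columns` directly.
df = pd.DataFrame(rows, columns=["b", "a"])
print(list(df.columns))
```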

Later I found a solution that worked:

df = pd.DataFrame.from_dict(dict_data)[list(dict_data[0].keys())]
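Note that dict_data[0].keys() only covers the first row. If later rows can carry extra fields, one possible variation is to collect the keys of all rows in first-seen order and select with that list. This is only a sketch with made-up rows, and it relies on dict preserving insertion order (Python 3.7+):

```python
import pandas as pd

# Hypothetical rows; the second row has a key the first one lacks.
rows = [
    {"Perioden": "1995JJ00", "ID": "0"},
    {"Perioden": "1996JJ00", "aantal": "15", "ID": "1"},
]

# dict.fromkeys de-duplicates while keeping first-seen order.
ordered_columns = list(dict.fromkeys(key for row in rows for key in row))

# Build the frame, then select the columns in the collected order.
df = pd.DataFrame(rows)[ordered_columns]

print(list(df.columns))
```

Fields missing from a row end up as NaN in that row.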
