Convert pandas DataFrame to a nested dict
Original question: http://stackoverflow.com/questions/19798112/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by haki
I'm looking for a generic way of turning a DataFrame into a nested dictionary.
This is a sample data frame:
  name  v1   v2  v3
0    A  A1  A11   1
1    A  A2  A12   2
2    B  B1  B12   3
3    C  C1  C11   4
4    B  B2  B21   5
5    A  A2  A21   6
The number of columns may differ, and so may the column names.
The desired output looks like this:
{
    'A' : {
        'A1' : { 'A11' : 1 },
        'A2' : { 'A12' : 2 , 'A21' : 6 }} ,
    'B' : {
        'B1' : { 'B12' : 3 } } ,
    'C' : {
        'C1' : { 'C11' : 4}}
}
What is the best way to achieve this?
The closest I got was with the zip function, but I haven't managed to make it work for more than one level (two columns).
Answer by DSM
I don't understand why there isn't a B2 in your dict. I'm also not sure what you want to happen in the case of repeated column values (every one except the last, I mean). Assuming the first is an oversight, we could use recursion:
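def recur_dictify(frame):
    # Base case: one column left, so return its value(s) as the leaf.
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    # Group on the first column and recurse on the remaining columns.
    grouped = frame.groupby(frame.columns[0])
    # .iloc replaces the original .ix, which was removed in pandas 1.0.
    d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
    return d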
which produces:
>>> df
  name  v1   v2  v3
0    A  A1  A11   1
1    A  A2  A12   2
2    B  B1  B12   3
3    C  C1  C11   4
4    B  B2  B21   5
5    A  A2  A21   6
>>> pprint.pprint(recur_dictify(df))
{'A': {'A1': {'A11': 1}, 'A2': {'A12': 2, 'A21': 6}},
 'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}},
 'C': {'C1': {'C11': 4}}}
It might be simpler to use a non-pandas approach, though:
def retro_dictify(frame):
    d = {}
    for row in frame.values:
        here = d
        for elem in row[:-2]:
            if elem not in here:
                here[elem] = {}
            here = here[elem]
        here[row[-2]] = row[-1]
    return d
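The answer above leaves open what should happen when two rows share the same path of column values; with retro_dictify the later row silently overwrites the earlier one. As a minimal sketch of one possible policy (not part of the original answer), the hypothetical helper `nest_rows_strict` below raises an error instead. It takes plain row tuples, of the kind `df.values` would yield, so it runs without pandas, and it assumes every row has the same length.

```python
def nest_rows_strict(rows):
    """Nest rows like retro_dictify, but refuse to overwrite an existing leaf.

    Hypothetical helper for illustration; assumes all rows have equal length
    and at least two elements (a key and a value).
    """
    d = {}
    for row in rows:
        *path, key, value = row
        here = d
        for elem in path:
            # Descend one level, creating the inner dict on first sight.
            here = here.setdefault(elem, {})
        if key in here:
            raise ValueError(f"duplicate path: {tuple(row[:-1])}")
        here[key] = value
    return d


# Rows matching the sample data frame from the question.
rows = [
    ("A", "A1", "A11", 1),
    ("A", "A2", "A12", 2),
    ("B", "B1", "B12", 3),
    ("C", "C1", "C11", 4),
    ("B", "B2", "B21", 5),
    ("A", "A2", "A21", 6),
]
nested = nest_rows_strict(rows)
```

Whether raising, overwriting, or collecting values is the right choice depends on the data; this sketch only makes the collision visible.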
Answer by alko
You can reconstruct your dictionary as easily as follows:
>>> result = {}
>>> for lst in df.values:
...     leaf = result
...     for path in lst[:-2]:
...         leaf = leaf.setdefault(path, {})
...     leaf.setdefault(lst[-2], list()).append(lst[-1])
...
>>> result
{'A': {'A1': {'A11': [1]}, 'A2': {'A21': [6], 'A12': [2]}}, 'C': {'C1': {'C11': [4]}}, 'B': {'B1': {'B12': [3]}, 'B2': {'B21': [5]}}}
If you're sure your leaves won't overlap, replace the last line
...     leaf.setdefault(lst[-2], list()).append(lst[-1])
with
...     leaf[lst[-2]] = lst[-1]
to get the output you desired:
>>> result
{'A': {'A1': {'A11': 1}, 'A2': {'A21': 6, 'A12': 2}}, 'C': {'C1': {'C11': 4}}, 'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}}}
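To illustrate why the list-based version matters, here is a small sketch of the same setdefault loop applied to rows where two leaves do overlap. The function name `nest_collect` and the duplicated row are assumptions made for this example; the rows are plain tuples standing in for `df.values`.

```python
def nest_collect(rows):
    """Nest rows and collect every value that lands on the same leaf in a list."""
    result = {}
    for *path, key, value in rows:
        leaf = result
        for part in path:
            leaf = leaf.setdefault(part, {})
        # Append rather than assign, so repeated paths keep all their values.
        leaf.setdefault(key, []).append(value)
    return result


# Two rows share the path A -> A1 -> A11 (made-up data for illustration).
rows = [
    ("A", "A1", "A11", 1),
    ("A", "A1", "A11", 7),
    ("B", "B1", "B12", 3),
]
collected = nest_collect(rows)
# collected == {'A': {'A1': {'A11': [1, 7]}}, 'B': {'B1': {'B12': [3]}}}
```

With plain assignment instead of append, the value 1 would be lost and only 7 would remain.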
Sample data used for tests:
import pandas as pd
data = {'name': ['A','A','B','C','B','A'],
        'v1': ['A1','A2','B1','C1','B2','A2'],
        'v2': ['A11','A12','B12','C11','B21','A21'],
        'v3': [1,2,3,4,5,6]}
df = pd.DataFrame.from_dict(data)
Answer by Jeff
See here, as there are some options that you can pass to get the output in several different forms.
In [5]: df
Out[5]:
  name  v1   v2  v3
0    A  A1  A11   1
1    A  A2  A12   2
2    B  B1  B12   3
3    C  C1  C11   4
4    B  B2  B21   5
5    A  A2  A21   6
In [6]: df.to_dict()
Out[6]:
{'name': {0: 'A', 1: 'A', 2: 'B', 3: 'C', 4: 'B', 5: 'A'},
 'v1': {0: 'A1', 1: 'A2', 2: 'B1', 3: 'C1', 4: 'B2', 5: 'A2'},
 'v2': {0: 'A11', 1: 'A12', 2: 'B12', 3: 'C11', 4: 'B21', 5: 'A21'},
 'v3': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6}}
Here is a way to create JSON output, then literally eval it to create an actual Python object (here, a list of rows):
In [11]: import ast
In [15]: ast.literal_eval(df.to_json(orient='values'))
Out[15]:
[['A', 'A1', 'A11', 1],
 ['A', 'A2', 'A12', 2],
 ['B', 'B1', 'B12', 3],
 ['C', 'C1', 'C11', 4],
 ['B', 'B2', 'B21', 5],
 ['A', 'A2', 'A21', 6]]
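One caveat with evaluating JSON as a Python literal: JSON spells missing values as null (which is how pandas serializes NaN) and booleans as true/false, and none of these are Python literals. The sketch below uses a hand-written string as a stand-in for what `df.to_json(orient='values')` might return on a frame with a missing value; the payload is an assumption for illustration, not output from the answer. Parsing with the standard `json` module handles it, while `ast.literal_eval` rejects it.

```python
import ast
import json

# Hand-written stand-in for df.to_json(orient='values') with one missing value.
payload = '[["A","A1","A11",1],["A","A2",null,2]]'

# json.loads understands null and maps it to None.
rows = json.loads(payload)

# ast.literal_eval sees `null` as an unknown name and raises ValueError.
try:
    ast.literal_eval(payload)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False
```

For the sample frame in this question, which has no missing values, both approaches give the same list of rows.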
Answer by Anton vBR
Here is another solution using defaultdict:
from collections import defaultdict
import pandas as pd
df = pd.DataFrame({'name': {0: 'A', 1: 'A', 2: 'B', 3: 'C', 4: 'B', 5: 'A'},
                   'v1': {0: 'A1', 1: 'A2', 2: 'B1', 3: 'C1', 4: 'B2', 5: 'A2'},
                   'v2': {0: 'A11', 1: 'A12', 2: 'B12', 3: 'C11', 4: 'B21', 5: 'A21'},
                   'v3': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6}})
output = defaultdict(dict)
for lst in df.values:
    try:
        output[lst[0]][lst[1]].update({lst[2]: lst[3]})
    except KeyError:
        # First time this (name, v1) pair is seen: create the inner dict.
        output[lst[0]][lst[1]] = {lst[2]: lst[3]}
output
or:
output = defaultdict(dict)
for row in df.values:
    item1, item2 = row[0:2]
    # Create the inner dict the first time this (name, v1) pair is seen.
    if output.get(item1, {}).get(item2) is None:
        output[item1][item2] = {}
    output[item1][item2].update({row[2]: row[3]})
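Both loops above are written for exactly four columns, while the question asks for an arbitrary number. One way to generalize the defaultdict idea, sketched here under the assumption that every row has the same length, is a recursive defaultdict that creates missing levels on access. The names `tree` and `to_plain_dict` are hypothetical helpers, and the rows are plain tuples standing in for `df.values`.

```python
from collections import defaultdict


def tree():
    """A defaultdict whose missing keys are themselves trees."""
    return defaultdict(tree)


def to_plain_dict(node):
    """Recursively convert nested defaultdicts into ordinary dicts."""
    if isinstance(node, defaultdict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node


# Rows matching the sample data frame from the question.
rows = [
    ("A", "A1", "A11", 1),
    ("A", "A2", "A12", 2),
    ("B", "B1", "B12", 3),
    ("C", "C1", "C11", 4),
    ("B", "B2", "B21", 5),
    ("A", "A2", "A21", 6),
]

output = tree()
for *path, key, value in rows:
    node = output
    for part in path:
        # Indexing creates the next level automatically if it is missing.
        node = node[part]
    node[key] = value

nested = to_plain_dict(output)
```

Converting back to plain dicts at the end avoids surprising callers, since merely reading a missing key from a defaultdict inserts it.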

