Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/18161926/

Date: 2020-08-19 10:01:00  Source: igfitidea

Pandas data frame from dictionary

python pandas

Asked by Godel

I have a Python dictionary of user-item ratings that looks something like this:

sample={'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0}, 
'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0}, 
'user3': {'item2':4.5,'item5':1.0,'item6':4.0}}

I want to convert it into a pandas DataFrame structured like this:

     col1   col2  col3
0   user1  item1   2.5
1   user1  item2   3.5
2   user1  item3   3.0
3   user1  item4   3.5
4   user1  item5   2.5
5   user1  item6   3.0
6   user2  item1   2.5
7   user2  item2   3.0
8   user2  item3   3.5
9   user2  item4   4.0
10  user3  item2   4.5
11  user3  item5   1.0
12  user3  item6   4.0

Any ideas would be much appreciated :)

Accepted answer by falsetru

Try the following code:

import pandas

sample={'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
        'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
        'user3': {'item2':4.5,'item5':1.0,'item6':4.0}}

df = pandas.DataFrame([
    [col1, col2, col3] for col1, d in sample.items() for col2, col3 in d.items()
], columns=['col1', 'col2', 'col3'])  # name the columns to match the desired output

Answer by geekchic

You could perhaps try it like this:

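import pandas

temp = []
for user, ratings in sample.items():
    # Build one small frame per user; the scalar user name is broadcast to every row
    temp.append(pandas.DataFrame({'col1': user, 'col2': list(ratings.keys()), 'col3': list(ratings.values())}))
results = pandas.concat(temp, ignore_index=True)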

Answer by DSM

I think the operation you're after -- to unpivot a table -- is called "melting". In this case, the hard part can be done by pd.melt, and everything else is basically renaming and reordering:

import pandas as pd

df = pd.DataFrame(sample).reset_index().rename(columns={"index": "item"})
df = pd.melt(df, "item", var_name="user").dropna()
df = df[["user", "item", "value"]].reset_index(drop=True)


Simply calling DataFrame produces something that has the information we want but the wrong shape:

>>> df = pd.DataFrame(sample)
>>> df
       user1  user2  user3
item1    2.5    2.5    NaN
item2    3.5    3.0    4.5
item3    3.0    3.5    NaN
item4    3.5    4.0    NaN
item5    2.5    NaN    1.0
item6    3.0    NaN    4.0

So let's promote the index to a real column and improve the name:

>>> df = pd.DataFrame(sample).reset_index().rename(columns={"index": "item"})
>>> df
    item  user1  user2  user3
0  item1    2.5    2.5    NaN
1  item2    3.5    3.0    4.5
2  item3    3.0    3.5    NaN
3  item4    3.5    4.0    NaN
4  item5    2.5    NaN    1.0
5  item6    3.0    NaN    4.0

Then we can call pd.melt to unpivot the columns. If we don't specify the variable name we want, "user", it'll give it the boring name "variable" (just like it gives the data itself the boring name "value").

>>> df = pd.melt(df, "item", var_name="user").dropna()
>>> df
     item   user  value
0   item1  user1    2.5
1   item2  user1    3.5
2   item3  user1    3.0
3   item4  user1    3.5
4   item5  user1    2.5
5   item6  user1    3.0
6   item1  user2    2.5
7   item2  user2    3.0
8   item3  user2    3.5
9   item4  user2    4.0
13  item2  user3    4.5
16  item5  user3    1.0
17  item6  user3    4.0

Finally, we can reorder the columns and renumber the index:

>>> df = df[["user", "item", "value"]].reset_index(drop=True)
>>> df
     user   item  value
0   user1  item1    2.5
1   user1  item2    3.5
2   user1  item3    3.0
3   user1  item4    3.5
4   user1  item5    2.5
5   user1  item6    3.0
6   user2  item1    2.5
7   user2  item2    3.0
8   user2  item3    3.5
9   user2  item4    4.0
10  user3  item2    4.5
11  user3  item5    1.0
12  user3  item6    4.0

melt is pretty useful once you get used to it. Usually, as here, you do some renaming/reordering before and after.
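
The steps above can also be written as one method chain. This is only a sketch, not part of the original answer: the column name "rating" (passed through value_name) and the variable name long_df are illustrative choices, and rename_axis is used here in place of the rename call shown earlier.

```python
import pandas as pd

sample = {
    "user1": {"item1": 2.5, "item2": 3.5, "item3": 3.0, "item4": 3.5, "item5": 2.5, "item6": 3.0},
    "user2": {"item1": 2.5, "item2": 3.0, "item3": 3.5, "item4": 4.0},
    "user3": {"item2": 4.5, "item5": 1.0, "item6": 4.0},
}

long_df = (
    pd.DataFrame(sample)
    .rename_axis("item")         # name the index so reset_index() yields an "item" column
    .reset_index()
    .melt(id_vars="item", var_name="user", value_name="rating")  # "rating" is an assumed name
    .dropna(subset=["rating"])   # drop user/item pairs that have no rating
    .loc[:, ["user", "item", "rating"]]  # reorder the columns
    .reset_index(drop=True)      # renumber the rows from 0
)
print(long_df)
```

Passing value_name avoids the default "value" column name, so no separate renaming step is needed afterwards.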

Answer by Felix Zumstein

This one is very similar to the melt solution provided by DSM:

import pandas as pd

df = pd.DataFrame(sample)
df = df.unstack().dropna().reset_index()
df = df.rename(columns={'level_0':'col1', 'level_1':'col2', 0:'col3'})
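
As a possible variation on the unstack approach (a sketch, not the answer's own code), the index levels can be named before resetting the index, which avoids renaming level_0, level_1 and 0 afterwards. The variable name long_df is illustrative.

```python
import pandas as pd

sample = {
    "user1": {"item1": 2.5, "item2": 3.5, "item3": 3.0, "item4": 3.5, "item5": 2.5, "item6": 3.0},
    "user2": {"item1": 2.5, "item2": 3.0, "item3": 3.5, "item4": 4.0},
    "user3": {"item2": 4.5, "item5": 1.0, "item6": 4.0},
}

long_df = (
    pd.DataFrame(sample)
    .unstack()                      # Series indexed by (column label, row label) = (user, item)
    .dropna()                       # drop user/item pairs that have no rating
    .rename_axis(["col1", "col2"])  # name the two index levels
    .reset_index(name="col3")       # Series.reset_index can name the value column directly
)
print(long_df)
```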

Answer by Boud

Here is another possibility, using DataFrame.stack:

df = pd.DataFrame(sample)
df = df.T.stack().reset_index()


Detailed explanations

In [24]: df = pd.DataFrame(sample)

In [25]: df
Out[25]: 
       user1  user2  user3
item1    2.5    2.5    NaN
item2    3.5    3.0    4.5
item3    3.0    3.5    NaN
item4    3.5    4.0    NaN
item5    2.5    NaN    1.0
item6    3.0    NaN    4.0

Applying stack pivots the column axis into a sublevel of the row axis, which is already indexed by item. Since you want user first, let's do the operation on the transposed DataFrame by using .T:

In [34]: df = df.T.stack()

In [35]: df
Out[35]: 
user1  item1    2.5
       item2    3.5
       item3    3.0
       item4    3.5
       item5    2.5
       item6    3.0
user2  item1    2.5
       item2    3.0
       item3    3.5
       item4    4.0
user3  item2    4.5
       item5    1.0
       item6    4.0
dtype: float64

You want plain columns rather than an index, so just reset the index:

In [36]: df = df.reset_index()

In [37]: df
Out[37]: 
   level_0 level_1    0
0    user1   item1  2.5
1    user1   item2  3.5
2    user1   item3  3.0
3    user1   item4  3.5
4    user1   item5  2.5
5    user1   item6  3.0
6    user2   item1  2.5
7    user2   item2  3.0
8    user2   item3  3.5
9    user2   item4  4.0
10   user3   item2  4.5
11   user3   item5  1.0
12   user3   item6  4.0
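
To finish with the column names from the question, the result can be relabelled. The sketch below also adds an explicit dropna() as a safeguard: the output above relies on stack() discarding missing values, and newer pandas releases may keep NaN entries that exist in the input. The name long_df is illustrative.

```python
import pandas as pd

sample = {
    "user1": {"item1": 2.5, "item2": 3.5, "item3": 3.0, "item4": 3.5, "item5": 2.5, "item6": 3.0},
    "user2": {"item1": 2.5, "item2": 3.0, "item3": 3.5, "item4": 4.0},
    "user3": {"item2": 4.5, "item5": 1.0, "item6": 4.0},
}

stacked = pd.DataFrame(sample).T.stack()  # Series indexed by (user, item)

# dropna() changes nothing if stack() already removed the missing ratings;
# it only matters on pandas versions whose stack() keeps NaN values.
long_df = stacked.dropna().reset_index()
long_df.columns = ["col1", "col2", "col3"]  # replace level_0, level_1 and 0
print(long_df)
```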