Pandas data frame from dictionary
Source: http://stackoverflow.com/questions/18161926/ (Stack Overflow, licensed under CC BY-SA 4.0; if you use or share this, you must attribute it to the original authors)
Asked by Godel
I have a python dictionary of user-item ratings that looks something like this:
sample={'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
'user3': {'item2':4.5,'item5':1.0,'item6':4.0}}
I would like to convert it into a pandas data frame structured like this:
     col1   col2  col3
0   user1  item1   2.5
1   user1  item2   3.5
2   user1  item3   3.0
3   user1  item4   3.5
4   user1  item5   2.5
5   user1  item6   3.0
6   user2  item1   2.5
7   user2  item2   3.0
8   user2  item3   3.5
9   user2  item4   4.0
10  user3  item2   4.5
11  user3  item5   1.0
12  user3  item6   4.0
Any ideas would be much appreciated :)
Accepted answer by falsetru
Try the following code:
import pandas
sample={'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
'user3': {'item2':4.5,'item5':1.0,'item6':4.0}}
df = pandas.DataFrame([
[col1,col2,col3] for col1, d in sample.items() for col2, col3 in d.items()
])
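The frame built above gets the default integer column labels (0, 1, 2). A minimal sketch, assuming the col1/col2/col3 names from the question are what is wanted, passes them through the columns argument; the name rows is only illustrative:

```python
import pandas

sample = {'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
          'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
          'user3': {'item2': 4.5, 'item5': 1.0, 'item6': 4.0}}

# One (user, item, rating) tuple per rating; dicts keep insertion order on Python 3.7+.
rows = [(user, item, rating)
        for user, ratings in sample.items()
        for item, rating in ratings.items()]

# Name the columns explicitly instead of accepting 0, 1, 2.
df = pandas.DataFrame(rows, columns=['col1', 'col2', 'col3'])
print(df)
```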
Answer by geekchic
You could perhaps try doing it like this:
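import pandas
temp = []
for user, ratings in sample.items():
    # one small frame per user, with a row for each (item, rating) pair
    frame = pandas.DataFrame(list(ratings.items()), columns=['col2', 'col3'])
    frame.insert(0, 'col1', user)
    temp.append(frame)
results = pandas.concat(temp, ignore_index=True)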
Answer by DSM
I think the operation you're after, unpivoting a table, is called "melting". In this case, the hard part can be done by pd.melt, and everything else is basically renaming and reordering:
import pandas as pd
df = pd.DataFrame(sample).reset_index().rename(columns={"index": "item"})
df = pd.melt(df, "item", var_name="user").dropna()
df = df[["user", "item", "value"]].reset_index(drop=True)
Simply calling DataFrame produces something that has the information we want but in the wrong shape:
>>> df = pd.DataFrame(sample)
>>> df
       user1  user2  user3
item1    2.5    2.5    NaN
item2    3.5    3.0    4.5
item3    3.0    3.5    NaN
item4    3.5    4.0    NaN
item5    2.5    NaN    1.0
item6    3.0    NaN    4.0
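In this layout the outer keys of the nested dict become columns and the inner keys become the index, with NaN wherever a user has not rated an item. As a hedged aside, DataFrame.from_dict with orient='index' gives the transposed layout (one row per user); the variable names wide and by_user are only illustrative:

```python
import pandas as pd

sample = {'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
          'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
          'user3': {'item2': 4.5, 'item5': 1.0, 'item6': 4.0}}

wide = pd.DataFrame(sample)                               # items as rows, users as columns
by_user = pd.DataFrame.from_dict(sample, orient='index')  # users as rows, items as columns

print(wide.shape)     # (6, 3)
print(by_user.shape)  # (3, 6)
# 18 cells but only 13 ratings, so 5 cells are NaN in either layout.
print(int(wide.isna().sum().sum()))  # 5
```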
So let's promote the index to a real column and give it a better name:
>>> df = pd.DataFrame(sample).reset_index().rename(columns={"index": "item"})
>>> df
    item  user1  user2  user3
0  item1    2.5    2.5    NaN
1  item2    3.5    3.0    4.5
2  item3    3.0    3.5    NaN
3  item4    3.5    4.0    NaN
4  item5    2.5    NaN    1.0
5  item6    3.0    NaN    4.0
Then we can call pd.melt to unpivot the user columns into rows. If we don't specify the variable name we want, "user", it'll give it the boring name of "variable" (just like it gives the data itself the boring name "value").
>>> df = pd.melt(df, "item", var_name="user").dropna()
>>> df
     item   user  value
0   item1  user1    2.5
1   item2  user1    3.5
2   item3  user1    3.0
3   item4  user1    3.5
4   item5  user1    2.5
5   item6  user1    3.0
6   item1  user2    2.5
7   item2  user2    3.0
8   item3  user2    3.5
9   item4  user2    4.0
13  item2  user3    4.5
16  item5  user3    1.0
17  item6  user3    4.0
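To see the default names in isolation, here is a small sketch. The frame wide and its contents are made-up illustration data, not the question's sample, and "rating" is just an example value_name:

```python
import pandas as pd

# Tiny made-up frame: one identifier column plus one column per user.
wide = pd.DataFrame({'item': ['item1', 'item2'],
                     'user1': [2.5, 3.5],
                     'user2': [2.5, None]})

default_names = pd.melt(wide, id_vars='item')
custom_names = pd.melt(wide, id_vars='item', var_name='user', value_name='rating')

print(list(default_names.columns))  # ['item', 'variable', 'value']
print(list(custom_names.columns))   # ['item', 'user', 'rating']
# dropna() removes the rows for ratings that do not exist.
print(len(custom_names), len(custom_names.dropna()))  # 4 3
```

Because dropna() removes rows without renumbering them, the index of the melted frame has gaps, which is why the index is reset afterwards.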
Finally, we can reorder the columns and renumber the index:
>>> df = df[["user", "item", "value"]].reset_index(drop=True)
>>> df
     user   item  value
0   user1  item1    2.5
1   user1  item2    3.5
2   user1  item3    3.0
3   user1  item4    3.5
4   user1  item5    2.5
5   user1  item6    3.0
6   user2  item1    2.5
7   user2  item2    3.0
8   user2  item3    3.5
9   user2  item4    4.0
10  user3  item2    4.5
11  user3  item5    1.0
12  user3  item6    4.0
melt is pretty useful once you get used to it. Usually, as here, you do some renaming/reordering before and after.
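The steps of this answer can be collected into one function. This is only a sketch: the helper name ratings_to_long and the "rating" column name are hypothetical choices, and rename_axis is used here in place of the rename call shown in the answer:

```python
import pandas as pd

def ratings_to_long(ratings):
    """Hypothetical helper: nested {user: {item: rating}} dict -> long DataFrame."""
    # Name the index "item" and promote it to a real column.
    wide = pd.DataFrame(ratings).rename_axis('item').reset_index()
    # Unpivot the user columns; value_name replaces the default "value".
    melted = pd.melt(wide, id_vars='item', var_name='user', value_name='rating')
    # Drop missing ratings, reorder the columns, renumber the rows.
    return melted.dropna()[['user', 'item', 'rating']].reset_index(drop=True)

sample = {'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
          'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
          'user3': {'item2': 4.5, 'item5': 1.0, 'item6': 4.0}}

df = ratings_to_long(sample)
print(df)
```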
Answer by Felix Zumstein
This one is very similar to the melt solution provided by DSM:
from pandas import DataFrame
df = DataFrame(sample)
df = df.unstack().dropna().reset_index()
df = df.rename(columns={'level_0':'col1', 'level_1':'col2', 0:'col3'})
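A possible variation, sketched under the assumption that the col1/col2/col3 names are wanted: naming the index levels before resetting avoids the intermediate level_0/level_1/0 labels. The variable name stacked is only illustrative:

```python
import pandas as pd

sample = {'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
          'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
          'user3': {'item2': 4.5, 'item5': 1.0, 'item6': 4.0}}

# unstack() on a frame with a plain index returns a Series indexed by (column, row),
# here (user, item); dropna() removes the ratings that do not exist.
stacked = pd.DataFrame(sample).unstack().dropna()

# Name the two index levels, then turn them into columns and name the values.
df = stacked.rename_axis(['col1', 'col2']).reset_index(name='col3')
print(df)
```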
Answer by Boud
Here is another possibility, using the DataFrame stack method:
df = pd.DataFrame(sample)
df = df.T.stack().reset_index()
Detailed explanations
In [24]: df = pd.DataFrame(sample)
In [25]: df
Out[25]:
       user1  user2  user3
item1    2.5    2.5    NaN
item2    3.5    3.0    4.5
item3    3.0    3.5    NaN
item4    3.5    4.0    NaN
item5    2.5    NaN    1.0
item6    3.0    NaN    4.0
Applying stack pivots the column axis into an inner level of the row index, which is already indexed by item. Since you want user first, apply the operation to the transposed DataFrame by using .T:
In [34]: df = df.T.stack()
In [35]: df
Out[35]:
user1  item1    2.5
       item2    3.5
       item3    3.0
       item4    3.5
       item5    2.5
       item6    3.0
user2  item1    2.5
       item2    3.0
       item3    3.5
       item4    4.0
user3  item2    4.5
       item5    1.0
       item6    4.0
dtype: float64
You want ordinary columns rather than an index, so just reset the index:
In [36]: df = df.reset_index()
In [37]: df
Out[37]:
   level_0 level_1    0
0    user1   item1  2.5
1    user1   item2  3.5
2    user1   item3  3.0
3    user1   item4  3.5
4    user1   item5  2.5
5    user1   item6  3.0
6    user2   item1  2.5
7    user2   item2  3.0
8    user2   item3  3.5
9    user2   item4  4.0
10   user3   item2  4.5
11   user3   item5  1.0
12   user3   item6  4.0
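The result still carries the default labels level_0, level_1 and 0. A minimal sketch that assigns the column names from the question is shown here. The explicit dropna() is a precaution, assuming that some pandas versions keep NaN rows when stacking; it changes nothing where those rows are already dropped:

```python
import pandas as pd

sample = {'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
          'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
          'user3': {'item2': 4.5, 'item5': 1.0, 'item6': 4.0}}

# Transpose so users form the outer index level, then stack the item columns.
stacked = pd.DataFrame(sample).T.stack().dropna()

# Turn both index levels into columns and label all three columns.
df = stacked.reset_index()
df.columns = ['col1', 'col2', 'col3']
print(df)
```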