pandas 从字典中创建熊猫数据框
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/33157522/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
create pandas dataframe from dictionary of dictionaries
提问by Feynman27
I have a dictionary of dictionaries of the form:
我有一个以下形式的字典字典:
{'user':{movie:rating} }
For example,
例如,
{Jill': {'Avenger: Age of Ultron': 7.0,
'Django Unchained': 6.5,
'Gone Girl': 9.0,
'Kill the Messenger': 8.0}
'Toby': {'Avenger: Age of Ultron': 8.5,
'Django Unchained': 9.0,
'Zoolander': 2.0}}
I want to convert this dict of dicts into a pandas dataframe with column 1 the user name and the other columns the movie ratings i.e.
我想将这个 dicts dict 转换为一个 Pandas 数据框,第 1 列是用户名,其他列是电影评级,即
user Gone_Girl Horrible_Bosses_2 Django_Unchained Zoolander etc. \
However, some users did not rate the movies and so these movies are not included in the values() for that user key(). It would be nice in these cases to just fill the entry with NaN.
但是,一些用户没有对电影进行评分,因此这些电影不包含在该用户 key() 的 values() 中。在这些情况下,用 NaN 填充条目会很好。
As of now, I iterate over the keys, fill a list, and then use this list to create a dataframe:
到目前为止,我遍历键,填充一个列表,然后使用这个列表来创建一个数据框:
data=[]
for i,key in enumerate(movie_user_preferences.keys() ):
try:
data.append((key
,movie_user_preferences[key]['Gone Girl']
,movie_user_preferences[key]['Horrible Bosses 2']
,movie_user_preferences[key]['Django Unchained']
,movie_user_preferences[key]['Zoolander']
,movie_user_preferences[key]['Avenger: Age of Ultron']
,movie_user_preferences[key]['Kill the Messenger']))
# if no entry, skip
except:
pass
df=pd.DataFrame(data=data,columns=['user','Gone_Girl','Horrible_Bosses_2','Django_Unchained','Zoolander','Avenger_Age_of_Ultron','Kill_the_Messenger'])
But this only gives me a dataframe of users who rated all the movies in the set.
但这只会给我一个用户的数据框,这些用户对集合中的所有电影进行了评分。
My goal is to append to the data list by iterating over the movie labels (rather than the brute force approach shown above) and, secondly, create a dataframe that includes all users and that places null values in the elements that do not have movie ratings.
我的目标是通过迭代电影标签(而不是上面显示的蛮力方法)来附加到数据列表,其次,创建一个包含所有用户的数据框,并将空值放在没有电影评级的元素中.
回答by Andy Hayden
You can pass the dict of dict to the DataFrame constructor:
您可以将 dict 的 dict 传递给 DataFrame 构造函数:
In [11]: d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0, 'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0}, 'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0, 'Avenger: Age of Ultron': 8.5}}
In [12]: pd.DataFrame(d)
Out[12]:
Jill Toby
Avenger: Age of Ultron 7.0 8.5
Django Unchained 6.5 9.0
Gone Girl 9.0 NaN
Kill the Messenger 8.0 NaN
Zoolander NaN 2.0
Or use the from_dict
method:
或者使用以下from_dict
方法:
In [13]: pd.DataFrame.from_dict(d)
Out[13]:
Jill Toby
Avenger: Age of Ultron 7.0 8.5
Django Unchained 6.5 9.0
Gone Girl 9.0 NaN
Kill the Messenger 8.0 NaN
Zoolander NaN 2.0
In [14]: pd.DataFrame.from_dict(d, orient='index')
Out[14]:
Django Unchained Gone Girl Kill the Messenger Avenger: Age of Ultron Zoolander
Jill 6.5 9 8 7.0 NaN
Toby 9.0 NaN NaN 8.5 2
回答by Feynman27
This brute-force approach also appears to work, but iterating over the movie labels would still be more robust in my opinion.
这种蛮力方法似乎也有效,但在我看来,迭代电影标签仍然会更健壮。
data=[]
for i,key in enumerate(movie_user_preferences.keys() ):
try:
data.append((key
,movie_user_preferences[key]['Gone Girl'] if 'Gone Girl' in movie_user_preferences[key] else 'NaN'
,movie_user_preferences[key]['Horrible Bosses 2'] if 'Horrible Bosses 2' in movie_user_preferences[key] else 'NaN'
,movie_user_preferences[key]['Django Unchained'] if 'Django Unchained' in movie_user_preferences[key] else 'NaN'
,movie_user_preferences[key]['Zoolander'] if 'Zoolander' in movie_user_preferences[key] else 'NaN'
,movie_user_preferences[key]['Avenger: Age of Ultron'] if 'Avenger: Age of Ultron' in movie_user_preferences[key] else 'NaN'
,movie_user_preferences[key]['Kill the Messenger'] if 'Kill the Messenger' in movie_user_preferences[key] else 'NaN' ))
# if no entry, skip
except:
pass
user Gone_Girl Horrible_Bosses_2 Django_Unchained Zoolander \
0 Sam 6 3 7.5 7
1 Max 10 6 7.0 10
2 Robert NaN 5 7.0 9
3 Toby NaN NaN 9.0 2
4 Julia 6.5 NaN 6.0 6.5
5 William 7 4 8.0 4
6 Jill 9 NaN 6.5 NaN
Avenger_Age_of_Ultron Kill_the_Messenger
0 10.0 5.5
1 7.0 5
2 8.0 9
3 8.5 NaN
4 10.0 6
5 6.0 6.5
6 7.0 8