pandas: Creating a pandas DataFrame from a dictionary

Question

Asked by Feynman27

I have a dictionary of dictionaries of the form:

{'user': {'movie': rating}}

For example,

{'Jill': {'Avenger: Age of Ultron': 7.0,
          'Django Unchained': 6.5,
          'Gone Girl': 9.0,
          'Kill the Messenger': 8.0},
 'Toby': {'Avenger: Age of Ultron': 8.5,
          'Django Unchained': 9.0,
          'Zoolander': 2.0}}

I want to convert this dict of dicts into a pandas DataFrame whose first column is the user name and whose remaining columns are the movie ratings, i.e.

user  Gone_Girl  Horrible_Bosses_2  Django_Unchained  Zoolander  etc.

However, some users did not rate every movie, so those movies are missing from that user's inner dictionary. In these cases it would be nice to fill the entry with NaN.

Currently, I iterate over the keys, fill a list, and then use this list to create a DataFrame:

import pandas as pd

data = []
for user, prefs in movie_user_preferences.items():
    try:
        data.append((user,
                     prefs['Gone Girl'],
                     prefs['Horrible Bosses 2'],
                     prefs['Django Unchained'],
                     prefs['Zoolander'],
                     prefs['Avenger: Age of Ultron'],
                     prefs['Kill the Messenger']))
    # a user missing any one rating raises KeyError and is skipped entirely
    except KeyError:
        pass
df = pd.DataFrame(data=data,
                  columns=['user', 'Gone_Girl', 'Horrible_Bosses_2', 'Django_Unchained',
                           'Zoolander', 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])

But this only gives me a dataframe of users who rated all the movies in the set.

但这只会给我一个用户的数据框，这些用户对集合中的所有电影进行了评分。

My goal is, first, to build the data list by iterating over the movie labels (rather than the brute-force approach shown above) and, second, to create a DataFrame that includes all users, with null values wherever a user has no rating for a movie.

Answer 1

Answered by Andy Hayden

You can pass the dict of dicts directly to the DataFrame constructor:

In [11]: d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0, 'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0}, 'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0, 'Avenger: Age of Ultron': 8.5}}

In [12]: pd.DataFrame(d)
Out[12]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0
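
The constructor aligns the inner dictionaries on the union of their keys, which is why a movie that a user did not rate shows up as NaN. As a minimal sketch (the name by_user is only illustrative), transposing the result gives one row per user instead of one column per user:

```python
import pandas as pd

d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0,
              'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0},
     'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0,
              'Avenger: Age of Ultron': 8.5}}

df = pd.DataFrame(d)   # outer keys become columns, inner keys become the index
by_user = df.T         # transpose: users as rows, movies as columns

print(by_user)
```

Ratings that are absent from an inner dictionary remain NaN after the transpose.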

Or use the from_dict method:

In [13]: pd.DataFrame.from_dict(d)
Out[13]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0

In [14]: pd.DataFrame.from_dict(d, orient='index')
Out[14]:
      Django Unchained  Gone Girl  Kill the Messenger  Avenger: Age of Ultron  Zoolander
Jill               6.5          9                   8                     7.0        NaN
Toby               9.0        NaN                 NaN                     8.5          2
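
With orient='index' the user names end up in the index rather than in a column. One possible way to get the layout asked for in the question (a user column followed by underscore-style movie columns) is sketched below; the column-renaming rule is an assumption based on the names used in the question, not part of the original answer:

```python
import pandas as pd

d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0,
              'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0},
     'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0,
              'Avenger: Age of Ultron': 8.5}}

df = pd.DataFrame.from_dict(d, orient='index')

# Name the index 'user', then move it into an ordinary column
df = df.rename_axis('user').reset_index()

# Assumed naming rule: drop colons and replace spaces with underscores,
# e.g. 'Avenger: Age of Ultron' -> 'Avenger_Age_of_Ultron'
df.columns = [c.replace(':', '').replace(' ', '_') for c in df.columns]

print(df)
```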

Answer 2

Answered by Feynman27

This brute-force approach also appears to work, but iterating over the movie labels would still be more robust in my opinion.

这种蛮力方法似乎也有效，但在我看来，迭代电影标签仍然会更健壮。

data = []
for user, prefs in movie_user_preferences.items():
    # note: 'NaN' is a string placeholder here, so the affected columns end up
    # with object dtype; use float('nan') for a true missing value
    data.append((user,
                 prefs['Gone Girl'] if 'Gone Girl' in prefs else 'NaN',
                 prefs['Horrible Bosses 2'] if 'Horrible Bosses 2' in prefs else 'NaN',
                 prefs['Django Unchained'] if 'Django Unchained' in prefs else 'NaN',
                 prefs['Zoolander'] if 'Zoolander' in prefs else 'NaN',
                 prefs['Avenger: Age of Ultron'] if 'Avenger: Age of Ultron' in prefs else 'NaN',
                 prefs['Kill the Messenger'] if 'Kill the Messenger' in prefs else 'NaN'))
df = pd.DataFrame(data=data,
                  columns=['user', 'Gone_Girl', 'Horrible_Bosses_2', 'Django_Unchained',
                           'Zoolander', 'Avenger_Age_of_Ultron', 'Kill_the_Messenger'])


      user Gone_Girl Horrible_Bosses_2  Django_Unchained Zoolander  \
0      Sam         6                 3               7.5         7
1      Max        10                 6               7.0        10
2   Robert       NaN                 5               7.0         9
3     Toby       NaN               NaN               9.0         2
4    Julia       6.5               NaN               6.0       6.5
5  William         7                 4               8.0         4
6     Jill         9               NaN               6.5       NaN

   Avenger_Age_of_Ultron Kill_the_Messenger
0                   10.0                5.5
1                    7.0                  5
2                    8.0                  9
3                    8.5                NaN
4                   10.0                  6
5                    6.0                6.5
6                    7.0                  8
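
Iterating over the movie labels, as suggested above, could look roughly like the following sketch. It assumes the list of movie titles is known up front (the name movies is hypothetical) and uses dict.get with float('nan') as the default, so missing ratings are real NaN values rather than the string 'NaN'. The sample data is the two-user example from the question.

```python
import math
import pandas as pd

# Two-user example from the question
movie_user_preferences = {
    'Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5,
             'Gone Girl': 9.0, 'Kill the Messenger': 8.0},
    'Toby': {'Avenger: Age of Ultron': 8.5, 'Django Unchained': 9.0,
             'Zoolander': 2.0},
}

# Assumed: the full list of movie labels is known in advance
movies = ['Gone Girl', 'Horrible Bosses 2', 'Django Unchained',
          'Zoolander', 'Avenger: Age of Ultron', 'Kill the Messenger']

data = []
for user, prefs in movie_user_preferences.items():
    # dict.get returns the default (NaN) when the user has no rating for a movie
    data.append([user] + [prefs.get(movie, float('nan')) for movie in movies])

# Column names follow the underscore style used in the question
columns = ['user'] + [m.replace(':', '').replace(' ', '_') for m in movies]
df = pd.DataFrame(data, columns=columns)

print(df)
```

Because every user gets a row and every missing rating is a float NaN, the rating columns stay numeric and no try/except is needed.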
