Python Pandas Apply Key Error

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/39960728/

Time: 2020-08-19 22:58:52  Source: igfitidea

Pandas Apply Key Error

Tags: python, pandas, group-by, keyerror, kaggle

Asked by user133248

I'm fairly new to Python and data science. I'm working on the kaggle Outbrain competition, and all datasets referenced in my code can be found at https://www.kaggle.com/c/outbrain-click-prediction/data.

On to the problem: I have a dataframe with columns ['document_id', 'category_id', 'confidence_level']. I would like to add a fourth column, 'max_cat', that returns the 'category_id' value that corresponds to the greatest 'confidence_level' value for the row's 'document_id'.

import pandas as pd
import numpy

main_folder = r'...filepath\data_location' + '\\'

# Raw strings keep the backslashes in the Windows paths literal.
docs_meta = pd.read_csv(main_folder + r'documents_meta.csv\documents_meta.csv', nrows=1000)
docs_categories = pd.read_csv(main_folder + r'documents_categories.csv\documents_categories.csv', nrows=1000)
docs_entities = pd.read_csv(main_folder + r'documents_entities.csv\documents_entities.csv', nrows=1000)
docs_topics = pd.read_csv(main_folder + r'documents_topics.csv\documents_topics.csv', nrows=1000)

def find_max(row,the_df,groupby_col,value_col,target_col):
   return the_df[the_df[groupby_col]==row[groupby_col]].loc[the_df[value_col].idxmax()][target_col]

test = docs_categories.copy()
test['max_cat'] = test.apply(lambda x: find_max(x,test,'document_id','confidence_level','category_id'))

This gives me the error: KeyError: ('document_id', 'occurred at index document_id')

Can anyone help explain either why this error occurred, or how to achieve my goal in a more efficient manner?

Thanks!

Answered by OriolAbril

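As EdChum answered in the comments, the issue is that apply works column-wise by default (see the docs). Each call to find_max therefore receives a whole column, whose index holds the row labels rather than the column names, so the lookup row[groupby_col] (here row['document_id']) raises the KeyError.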
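
To see the difference between the two axes, here is a minimal sketch on a made-up three-row frame (the values are illustrative, not taken from the Kaggle data). It only inspects what apply hands to the function in each mode:

```python
import pandas as pd

# Toy frame; the values are invented for illustration.
df = pd.DataFrame({
    "document_id": [1, 1, 2],
    "confidence_level": [0.9, 0.1, 0.7],
})

# Default (axis=0): the function is called once per column,
# so x.name is a column label.
by_column = df.apply(lambda x: x.name)

# axis=1: the function is called once per row,
# so x.name is a row label and x is indexed by column name.
by_row = df.apply(lambda x: x.name, axis=1)

# Looking up a column name inside a column-wise call fails,
# because that Series is indexed by the row labels 0, 1, 2.
try:
    df.apply(lambda x: x["document_id"])
    raised = False
except KeyError:
    raised = True

print(by_column.tolist())
print(by_row.tolist())
print(raised)
```

On this toy frame the first call sees the labels 'document_id' and 'confidence_level', the second sees 0, 1 and 2, and the last call reproduces the KeyError from the question.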

To specify that it should be applied to each row instead, axis=1 must be passed:

test.apply(lambda x: find_max(x,test,'document_id','confidence_level','category_id'), axis=1)
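
On the second part of the question (a more efficient approach), one possibility that is not part of the answer above is to let groupby locate the best row per document and then map the result back, instead of filtering the whole frame once per row. Note also that find_max, as written, takes idxmax over the entire column rather than over the filtered group, so it may still fail for some rows even with axis=1. The sketch below uses made-up data in place of documents_categories.csv and assumes the goal is exactly the one described in the question:

```python
import pandas as pd

# Toy data standing in for documents_categories.csv (values are made up).
docs = pd.DataFrame({
    "document_id": [1, 1, 2, 2, 2],
    "category_id": [10, 11, 20, 21, 22],
    "confidence_level": [0.9, 0.1, 0.2, 0.7, 0.1],
})

# Row label of the highest confidence_level within each document_id.
best_rows = docs.groupby("document_id")["confidence_level"].idxmax()

# Lookup table: document_id -> category_id found at that row.
best_cat = docs.loc[best_rows].set_index("document_id")["category_id"]

# Broadcast the per-document result back onto every row.
docs["max_cat"] = docs["document_id"].map(best_cat)

print(docs)
```

Here document 1 gets max_cat 10 and document 2 gets max_cat 21. If two categories tie for the highest confidence, idxmax keeps the first one it encounters.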