Python pandas merge KeyError
Original question: http://stackoverflow.com/questions/34227038/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Sara
I consistently get a KeyError when I try to merge two data frames. The code:
c = pd.merge(a, b, on='video_id', how='left')
Based on internet research, I double-checked the dtype and coerced the video_id column to int in both frames:
a = pd.read_csv(filename, index_col=False, dtype={'video_id': np.int64}, low_memory=False)
b = pd.read_csv(videoinfo, index_col=False, dtype={'video_id': np.int64})
I renamed the columns (to make sure they match):
a.columns.values[2] = "video_id"
b.columns.values[0] = "video_id"
I coerced both to DataFrames:
c = pd.merge(pd.DataFrame(a), pd.DataFrame(b), on='video_id', how='left')
I'm out of ideas as to why I'm still getting the KeyError. It's always "KeyError: 'video_id'".
Answered by Amy D
Be careful not to use df.columns.values to rename columns. Doing so interferes with the indexing on your column names.
If you know which column names you're replacing, you can try something like this:
a.rename(columns={'old_col_name':'video_id'}, inplace = True)
b.rename(columns={'old_col_name':'video_id'}, inplace = True)
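To make the rename approach concrete, here is a minimal sketch. The two frames and the original column names 'vid' and 'id' are made up for illustration; they stand in for whatever 'old_col_name' is in your data.

```python
import pandas as pd

# Hypothetical data: 'vid' and 'id' are placeholder names, not from the question.
a = pd.DataFrame({"user": [1, 2, 3], "vid": [10, 20, 30]})
b = pd.DataFrame({"id": [10, 20], "title": ["intro", "outro"]})

# rename() returns a new frame with a properly rebuilt column index.
a = a.rename(columns={"vid": "video_id"})
b = b.rename(columns={"id": "video_id"})

# The key now exists under the same name in both frames.
c = pd.merge(a, b, on="video_id", how="left")
print(c)
```

Assigning the result back, rather than using inplace=True, is only a style choice here; both forms rename the column.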
If you don't know the column names ahead of time, you can rename by position instead:
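col_names_a = a.columns.tolist()  # copy to a plain list; a pandas Index cannot be modified in place
col_names_a[index] = 'video_id'   # index is the integer position of the column to rename
a.columns = col_names_a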
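A small self-contained sketch of the positional rename follows. The frame and the position 2 are assumptions chosen for illustration. It copies the names into a plain list first, because a pandas Index is immutable and item assignment on it raises a TypeError.

```python
import pandas as pd

# Hypothetical frame; the column to rename sits at position 2.
a = pd.DataFrame({"user": [1, 2], "ts": [5, 6], "vid": [10, 20]})
index = 2  # assumed position of the key column

# Direct item assignment on the Index is rejected by pandas.
try:
    a.columns[index] = "video_id"
    index_is_mutable = True
except TypeError:
    index_is_mutable = False

# Work on a list copy, then assign the whole list back.
col_names_a = a.columns.tolist()
col_names_a[index] = "video_id"
a.columns = col_names_a
print(a.columns.tolist())
```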
Keep in mind that you don't actually need the same column names in both dataframes. pandas lets you specify the key column of each dataframe separately:
pd.merge(a, b, left_on = 'a_col', right_on = 'b_col', how = 'left')
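As a quick sketch of merging on differently named keys: the frames below are invented for illustration, with 'a_col' and 'b_col' playing the roles used in the line above. Note that both key columns are kept in the result.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
a = pd.DataFrame({"user": [1, 2, 3], "a_col": [10, 20, 30]})
b = pd.DataFrame({"b_col": [10, 20], "title": ["intro", "outro"]})

# left_on names the key in a, right_on names the key in b.
c = pd.merge(a, b, left_on="a_col", right_on="b_col", how="left")
print(c)
```

Rows of a with no match in b get NaN in the columns that come from b, including 'b_col'.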
Answered by Sara
There was a leading space in the column name in one of the dfs: 'video_id ' instead of 'video_id'. I'm not sure why the initial rename didn't fix that, but it's fixed now.
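One way to spot and remove stray whitespace in column names is sketched below. The frames are made up for illustration, and the space is placed after the name to match the quoted 'video_id '; str.strip() removes whitespace on either side, so it covers a leading space as well.

```python
import pandas as pd

# Hypothetical frames; note the stray space in a's key column name.
a = pd.DataFrame({"user": [1, 2], "video_id ": [10, 20]})
b = pd.DataFrame({"video_id": [10, 20], "title": ["intro", "outro"]})

# Printing the names as a list shows the quotes, so extra spaces become visible.
print(a.columns.tolist())

# Merging before cleaning fails because 'video_id' is not an exact match in a.
try:
    pd.merge(a, b, on="video_id", how="left")
    merge_failed = False
except KeyError:
    merge_failed = True

# Strip whitespace from every column name in both frames, then merge.
a.columns = a.columns.str.strip()
b.columns = b.columns.str.strip()
c = pd.merge(a, b, on="video_id", how="left")
print(c)
```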
Answered by Sastivel Loganathan
Passing the left_on and right_on parameters as lists worked for me:
c = pd.merge(pd.DataFrame(a), pd.DataFrame(b), left_on=['video_id'],
right_on= ['video_id'], how='left')