Python: Using Merge on a Column and Index in Pandas

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/31528819/

Date: 2020-08-19 10:10:43  Source: igfitidea

Using Merge on a column and Index in Pandas

Tags: python, python-2.7, pandas, merge

Asked by user2242044

I have two separate dataframes that share a project number. In type_df, the project number is the index. In time_df, the project number is a column. I would like to count the number of rows in type_df that have a Project Type of 2. I am trying to do this with pandas.merge(). It works great when both keys are columns, but not when one of them is an index. I'm not sure how to reference the index, or whether merge is even the right way to do this.

import pandas as pd
type_df = pd.DataFrame(data = [['Type 1'], ['Type 2']], 
                       columns=['Project Type'], 
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data = [['Project1', 13], ['Project1', 12], 
                               ['Project2', 41]], 
                       columns=['Project', 'Time'])
merged = pd.merge(time_df,type_df, on=[index,'Project'])
print merged[merged['Project Type'] == 'Type 2']['Project Type'].count()

Error:

Name 'Index' is not defined.

Desired Output:

2

Accepted answer by maxymoo

If you want to use an index in your merge, you have to specify left_index=True or right_index=True, and then use left_on or right_on for the column on the other side. For you it should look something like this:

merged = pd.merge(type_df, time_df, left_index=True, right_on='Project')
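
A self-contained sketch of this approach, using the frames from the question. It is written for Python 3 (the question itself uses the Python 2 print statement), and the name type2_count is only illustrative, not part of the original answer.

```python
import pandas as pd

# Frames from the question: project numbers are the index of type_df
# and an ordinary column of time_df.
type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# Match the index of the left frame against the 'Project' column of the right.
merged = pd.merge(type_df, time_df, left_index=True, right_on='Project')

# Count the merged rows whose project is of 'Type 2'
# (type2_count is an illustrative name).
type2_count = int((merged['Project Type'] == 'Type 2').sum())
print(type2_count)  # prints 2
```

Swapping the frames works the same way with the mirrored arguments: pd.merge(time_df, type_df, left_on='Project', right_index=True).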

Answer by dermen

A straightforward option is to give both dataframes the same column to merge on.

In this case, just make a 'Project' column for type_df, then merge on that:

# Copy the index of type_df into an ordinary 'Project' column
type_df['Project'] = type_df.index.values
# Inner merge on the column that both frames now share
merged = pd.merge(time_df, type_df, on='Project', how='inner')
merged
#     Project  Time Project Type
# 0  Project1    13       Type 2
# 1  Project1    12       Type 2
# 2  Project2    41       Type 1

print(merged[merged['Project Type'] == 'Type 2']['Project Type'].count())
# 2

Answer by jezrael

Another solution is to use DataFrame.join:

# join matches the caller's 'Project' column against the index of the other frame
df3 = time_df.join(type_df, on='Project')
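
A minimal sketch of the join approach on the question's frames. DataFrame.join looks up the `on` column in the calling frame and matches it against the index of the other frame, so here the frame that holds the 'Project' column (time_df) is the caller. The names joined and type2_rows are illustrative only.

```python
import pandas as pd

# Frames from the question.
type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# Look up each time_df['Project'] value in the index of type_df.
# join defaults to a left join, so every row of time_df is kept.
joined = time_df.join(type_df, on='Project')

# Count the rows whose project is of 'Type 2' (illustrative name).
type2_rows = int((joined['Project Type'] == 'Type 2').sum())
print(type2_rows)  # prints 2
```

Because the default is a left join, a project missing from type_df would appear with NaN in 'Project Type' rather than being dropped; passing how='inner' would drop such rows.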

For pandas 0.23.0+, the on, left_on, and right_on parameters may now refer to either column names or index level names:

left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                     'key2': ['K0', 'K1', 'K0', 'K1']},
                    index=left_index)

right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')

right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                     'D': ['D0', 'D1', 'D2', 'D3'],
                     'key2': ['K0', 'K0', 'K0', 'K1']},
                      index=right_index)

print (left)    
       A   B key2
key1             
K0    A0  B0   K0
K0    A1  B1   K1
K1    A2  B2   K0
K2    A3  B3   K1

print (right)
       C   D key2
key1             
K0    C0  D0   K0
K1    C1  D1   K0
K2    C2  D2   K0
K2    C3  D3   K1


df = left.merge(right, on=['key1', 'key2'])
print (df)
       A   B key2   C   D
key1                     
K0    A0  B0   K0  C0  D0
K1    A2  B2   K0  C1  D1
K2    A3  B3   K1  C3  D3
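
Applied to the frames from the question, the same feature can be sketched as follows. This assumes pandas 0.23.0 or later, and it assumes the index of type_df is first given the name 'Project' (the question leaves it unnamed); named_type_df and type2_count are illustrative names.

```python
import pandas as pd

# Frames from the question.
type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# Assumption: name the index so that `on` can refer to it as an index level.
named_type_df = type_df.rename_axis('Project')

# 'Project' is a column in time_df and an index level in named_type_df.
merged = time_df.merge(named_type_df, on='Project')

# Count the merged rows whose project is of 'Type 2' (illustrative name).
type2_count = int((merged['Project Type'] == 'Type 2').sum())
print(type2_count)  # prints 2
```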