Original question: http://stackoverflow.com/questions/31528819/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Using Merge on a column and Index in Pandas
Asked by user2242044
I have two separate dataframes that share a project number. In type_df, the project number is the index. In time_df, the project number is a column. I would like to count the number of rows in time_df whose project has a Project Type of 'Type 2'. I am trying to do this with pandas.merge(). It works great when the key is a column in both dataframes, but not when it is an index. I'm not sure how to reference the index, or whether merge is even the right way to do this.
import pandas as pd

type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])
# This line fails: `index` is an undefined name, not a reference to the index
merged = pd.merge(time_df, type_df, on=[index, 'Project'])
print(merged[merged['Project Type'] == 'Type 2']['Project Type'].count())
Error:
NameError: name 'index' is not defined
Desired Output:
2
Accepted answer by maxymoo
If you want to use an index in your merge, you have to specify left_index=True or right_index=True, and then use right_on or left_on, respectively, to name the matching column in the other dataframe. For you it should look something like this:
merged = pd.merge(type_df, time_df, left_index=True, right_on='Project')
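Putting this together with the dataframes from the question, a minimal sketch reproduces the desired count of 2. The variable name type2_count is only illustrative and does not come from the original answer:

```python
import pandas as pd

type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# Match the index of type_df against the 'Project' column of time_df
merged = pd.merge(type_df, time_df, left_index=True, right_on='Project')

# Count the merged rows whose project type is 'Type 2'
# (type2_count is a hypothetical name used for illustration)
type2_count = int((merged['Project Type'] == 'Type 2').sum())
print(type2_count)  # 2
```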
Answer by dermen
To merge with on=, the key must be available as a column of the same name in each dataframe.
In this case, just make a 'Project' column for type_df, then merge on that:
type_df['Project'] = type_df.index.values
merged = pd.merge(time_df, type_df, on='Project', how='inner')
merged
#     Project  Time Project Type
# 0  Project1    13       Type 2
# 1  Project1    12       Type 2
# 2  Project2    41       Type 1
print(merged[merged['Project Type'] == 'Type 2']['Project Type'].count())
2
Answer by jezrael
Another solution is to use DataFrame.join:
# on= names a column in the calling dataframe (time_df), matched against the index of type_df
df3 = time_df.join(type_df, on='Project')
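As a hedged sketch of the join approach on the question's data: DataFrame.join matches the column named by on in the calling dataframe against the index of the other dataframe, so here the caller is assumed to be time_df (it holds the 'Project' column) and type_df supplies the index. The name type2_count is illustrative only:

```python
import pandas as pd

type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# Look up each value of time_df['Project'] in the index of type_df
# (join defaults to a left join, so every row of time_df is kept)
df3 = time_df.join(type_df, on='Project')

# Count the rows whose project type is 'Type 2'
type2_count = int((df3['Project Type'] == 'Type 2').sum())
print(type2_count)  # 2
```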
For pandas 0.23.0+, the on, left_on, and right_on parameters may refer to either column names or index level names:
left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'key2': ['K0', 'K1', 'K0', 'K1']},
                    index=left_index)
right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3'],
                      'key2': ['K0', 'K0', 'K0', 'K1']},
                     index=right_index)
print(left)
       A   B key2
key1
K0    A0  B0   K0
K0    A1  B1   K1
K1    A2  B2   K0
K2    A3  B3   K1
print(right)
       C   D key2
key1
K0    C0  D0   K0
K1    C1  D1   K0
K2    C2  D2   K0
K2    C3  D3   K1
df = left.merge(right, on=['key1', 'key2'])
print(df)
       A   B key2   C   D
key1
K0    A0  B0   K0  C0  D0
K1    A2  B2   K0  C1  D1
K2    A3  B3   K1  C3  D3
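Applied to the question's dataframes, the same feature might look like the sketch below. It assumes pandas 0.23.0 or later, and it assumes the index of type_df is given the name 'Project' (the original question leaves the index unnamed) so that on='Project' can resolve to a column in time_df and to an index level in type_df:

```python
import pandas as pd

# Assumption: the index is named 'Project' so merge can refer to it by name
type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=pd.Index(['Project2', 'Project1'], name='Project'))
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# 'Project' is a column in time_df and an index level name in type_df
merged = time_df.merge(type_df, on='Project')

# Count the merged rows whose project type is 'Type 2'
type2_count = int((merged['Project Type'] == 'Type 2').sum())
print(type2_count)
```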

