Using Merge on a Column and Index in Pandas
Original question: http://stackoverflow.com/questions/31528819/
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by user2242044
I have two separate dataframes that share a project number. In type_df, the project number is the index. In time_df, the project number is a column. I would like to count the number of rows in type_df that have a Project Type of 2. I am trying to do this with pandas.merge(). It works great when using both columns, but not indices. I'm not sure how to reference the index and if merge is even the right way to do this.
import pandas as pd
type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])
# The next line fails: index is not a defined name
merged = pd.merge(time_df, type_df, on=[index, 'Project'])
print(merged[merged['Project Type'] == 'Type 2']['Project Type'].count())
Error:
NameError: name 'index' is not defined
Desired Output:
2
Accepted answer by maxymoo
If you want to use an index in your merge, you have to specify left_index=True or right_index=True for the dataframe whose index holds the key, and then use right_on or left_on to name the key column in the other dataframe. For you it should look something like this:
merged = pd.merge(type_df, time_df, left_index=True, right_on='Project')
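To make the line above concrete, here is a minimal sketch that runs it on the question's data and then counts the matching rows. The variable name type2_count is only for illustration, and the boolean-sum count is one of several equivalent ways to get the number.

```python
import pandas as pd

# Data from the question: project number is the index of type_df
# and an ordinary column of time_df.
type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# Match type_df's index against time_df's 'Project' column.
merged = pd.merge(type_df, time_df, left_index=True, right_on='Project')

# Count the merged rows whose project is of 'Type 2'.
type2_count = int((merged['Project Type'] == 'Type 2').sum())
print(type2_count)  # 2
```

Both time entries for Project1 pick up 'Type 2' from type_df, which gives the desired output of 2.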
Answer by dermen
The plain on= form of merge is simplest when each dataframe has the same column to merge on.
In this case, just make a 'Project' column for type_df, then merge on that:
type_df['Project'] = type_df.index.values
merged = pd.merge(time_df, type_df, on='Project', how='inner')
merged
#     Project  Time Project Type
# 0  Project1    13       Type 2
# 1  Project1    12       Type 2
# 2  Project2    41       Type 1
print(merged[merged['Project Type'] == 'Type 2']['Project Type'].count())
# 2
Answer by jezrael
Another solution is to use DataFrame.join:
# on= names a column of the calling dataframe (time_df), matched against type_df's index
df3 = time_df.join(type_df, on='Project')
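A minimal sketch of the join approach on the question's data. DataFrame.join matches the on column of the calling dataframe against the index of the other dataframe, so the call is made on time_df here, since that is the frame holding the 'Project' column. The name type2_count is only for illustration.

```python
import pandas as pd

type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# join is a left join by default: every row of time_df is kept and
# 'Project Type' is looked up in type_df's index via the 'Project' column.
df3 = time_df.join(type_df, on='Project')

type2_count = int((df3['Project Type'] == 'Type 2').sum())
print(type2_count)  # 2
```

Because join defaults to how='left', a project missing from type_df would appear with NaN in 'Project Type' rather than being dropped, which differs from the inner merge used in the other answers.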
For pandas 0.23.0+, the on, left_on, and right_on parameters may now refer to either column names or index level names:
left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'key2': ['K0', 'K1', 'K0', 'K1']},
                    index=left_index)
right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3'],
                      'key2': ['K0', 'K0', 'K0', 'K1']},
                     index=right_index)
print(left)
       A   B key2
key1
K0    A0  B0   K0
K0    A1  B1   K1
K1    A2  B2   K0
K2    A3  B3   K1
print(right)
       C   D key2
key1
K0    C0  D0   K0
K1    C1  D1   K0
K2    C2  D2   K0
K2    C3  D3   K1
df = left.merge(right, on=['key1', 'key2'])
print(df)
       A   B key2   C   D
key1
K0    A0  B0   K0  C0  D0
K1    A2  B2   K0  C1  D1
K2    A3  B3   K1  C3  D3
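Applied to the original question, the same feature means the merge can be written with a plain on= once the index of type_df has a name. The sketch below assumes pandas 0.23.0 or later and assumes it is acceptable to name the index 'Project'. That name is not in the question's code and is chosen here only to match the column in time_df.

```python
import pandas as pd

# Assumption: give type_df's index the name 'Project' so that
# on='Project' can resolve to an index level on this side.
type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=pd.Index(['Project2', 'Project1'], name='Project'))
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# 'Project' is a column in time_df and an index level name in type_df.
merged = time_df.merge(type_df, on='Project')

type2_count = int((merged['Project Type'] == 'Type 2').sum())
print(type2_count)  # 2
```

On older pandas versions this call would not find 'Project' in type_df, so the left_index/right_on form from the accepted answer is the safer choice there.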