Pandas Merge two DataFrames without some columns
Warning: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/45450280/
Asked by snoob dogg
Context
I'm trying to merge two big CSV files together.
Problem
Let's say I've one Pandas DataFrame like the following...
EntityNum foo ...
------------------------
1001.01 100
1002.02 50
1003.03 200
And another one like this...
EntityNum a_col b_col
-----------------------------------
1001.01 alice 7
1002.02 bob 8
1003.03 777 9
I'd like to join them like this:
EntityNum foo a_col
----------------------------
1001.01 100 alice
1002.02 50 bob
1003.03 200 777
Keep in mind that I don't want b_col in the final result. How do I accomplish this with Pandas?
In SQL, I would probably have done something like:
SELECT t1.*, t2.a_col FROM table_1 as t1
LEFT JOIN table_2 as t2
ON t1.EntityNum = t2.EntityNum;
Search
I know it is possible to use merge. This is what I've tried:
import pandas as pd
df_a = pd.read_csv(path_a, sep=',')
df_b = pd.read_csv(path_b, sep=',')
df_c = pd.merge(df_a, df_b, on='EntityNum')
But I'm stuck when it comes to avoiding some of the unwanted columns in the final dataframe.
Answered by Alexander
You can first select the relevant dataframe columns by their labels (e.g. df_a[['EntityNum', 'foo']]) and then join those.
df_a[['EntityNum', 'foo']].merge(df_b[['EntityNum', 'a_col']], on='EntityNum', how='left')
Note that the default behavior for merge is to do an inner join, so how='left' is needed to mirror the SQL LEFT JOIN in the question.
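To make the inner versus left distinction concrete, here is a minimal sketch. The sample frames are modeled on the question, but df_b deliberately lacks the row for 1003.03; that omission is an assumption made only for this illustration.

```python
import pandas as pd

# Sample frames modeled on the question.
# Assumption for illustration: df_b is missing EntityNum 1003.03.
df_a = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "foo": [100, 50, 200]})
df_b = pd.DataFrame({"EntityNum": [1001.01, 1002.02],
                     "a_col": ["alice", "bob"],
                     "b_col": [7, 8]})

# Keep only the join key and the wanted column from the right-hand frame.
right = df_b[["EntityNum", "a_col"]]

# Default how="inner": rows without a match in `right` are dropped.
inner = df_a.merge(right, on="EntityNum")

# how="left": every row of df_a is kept; unmatched rows get NaN in a_col.
left = df_a.merge(right, on="EntityNum", how="left")

print(inner)
print(left)
```

With these frames the inner join returns two rows, while the left join keeps all three rows of df_a and leaves a_col empty for the unmatched one.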
Answered by DYZ
Note how in SQL you first do the join and then select the columns that you want. In the same spirit, you can join the full DataFrames in Pandas and then select the wanted columns.
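A minimal sketch of the join-then-select approach, using sample frames rebuilt from the question's tables (the a_col values are stored as strings here, which is an assumption about the data):

```python
import pandas as pd

# Sample frames rebuilt from the question's tables.
df_a = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "foo": [100, 50, 200]})
df_b = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "a_col": ["alice", "bob", "777"],
                     "b_col": [7, 8, 9]})

# Join the full frames first, as the SQL query does...
merged = pd.merge(df_a, df_b, on="EntityNum", how="left")

# ...then keep every column of df_a plus a_col (like SELECT t1.*, t2.a_col).
df_c = merged[list(df_a.columns) + ["a_col"]]

print(df_c)
```

Building the column list from df_a.columns avoids spelling out every left-hand column when df_a is wide.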
Alternatively, join the full DataFrames and del the columns you do not want.
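The join-then-delete variant might look like the sketch below. It shows both the del statement and DataFrame.drop, which returns a new frame instead of modifying in place; the sample frames are again rebuilt from the question.

```python
import pandas as pd

# Sample frames rebuilt from the question's tables.
df_a = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "foo": [100, 50, 200]})
df_b = pd.DataFrame({"EntityNum": [1001.01, 1002.02, 1003.03],
                     "a_col": ["alice", "bob", "777"],
                     "b_col": [7, 8, 9]})

# Option 1: merge, then delete the unwanted column in place.
df_c = pd.merge(df_a, df_b, on="EntityNum", how="left")
del df_c["b_col"]

# Option 2: merge, then drop the column, getting a new frame back.
df_d = pd.merge(df_a, df_b, on="EntityNum", how="left").drop(columns=["b_col"])

print(df_c)
```

Dropping is convenient when only a few columns are unwanted; selecting is shorter when only a few are wanted.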
Finally, you can first select the columns that you want and then do the join.
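Since the question starts from large CSV files, the selection could also happen at read time, so that unwanted columns are never loaded. This is a sketch that goes slightly beyond the answer: it uses the usecols parameter of pd.read_csv, and in-memory io.StringIO buffers stand in for the real files (path_a and path_b in the question).

```python
import io
import pandas as pd

# In-memory stand-ins for the two CSV files (hypothetical contents
# modeled on the question's tables).
csv_a = io.StringIO("EntityNum,foo\n1001.01,100\n1002.02,50\n1003.03,200\n")
csv_b = io.StringIO(
    "EntityNum,a_col,b_col\n1001.01,alice,7\n1002.02,bob,8\n1003.03,777,9\n"
)

df_a = pd.read_csv(csv_a)

# usecols restricts parsing to the join key and the wanted column,
# so b_col is never loaded into memory.
df_b = pd.read_csv(csv_b, usecols=["EntityNum", "a_col"])

df_c = df_a.merge(df_b, on="EntityNum", how="left")

print(df_c)
```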