Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/43909479/

Time: 2020-09-14 03:35:21  Source: igfitidea

Merge two datasets in Pandas

Tags: python, pandas, merge

Asked by freddy888

I have previously worked with Stata and am now trying to do the same in Python. However, I am having trouble with the merge command, and I must be missing something. The two dataframes I want to merge look like this:

 df1:
 Date id Market_Cap
 2000 1  400
 2000 2  200
 2001 1  410
 2001 2  220

 df2:
 id Ticker
 1   Shell
 2   ExxonMobil

My aim now is to get the following dataset:

Date id Market_Cap  Ticker
2000 1  400        Shell 
2000 2  200        ExxonMobil 
2001 1  410        Shell 
2001 2  220        ExxonMobil

I tried the following command:

merged = pd.merge(df1, df2, how="left", on="id")

This merges the datasets, but gives me only NaN values in the Ticker column. I looked at several sources and maybe I am mistaken, but isn't how="left" the right thing to do for my purpose? I also tried "right" and "outer". They don't give the result I want, and "inner" does not seem to work here at all.

Am I missing something crucial?

Answered by jezrael

The problem is that your id column is object (obviously string) in one df and int in the other, so nothing matches and you get NaN.

If both columns have the same dtype:

print (df1['id'].dtypes)
int64
print (df2['id'].dtypes)
int64

merged = pd.merge(df1, df2, how="left", on="id")
print (merged)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil
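
To check that every row really found a partner, the same merge can be run with pandas' indicator and validate options. This is a minimal sketch, assuming the two frames look like the tables in the question; the sample data is rebuilt by hand here and the variable name unmatched is only illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question (an assumption);
# both id columns hold integers here.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": [1, 2, 1, 2],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# indicator=True adds a _merge column telling where each row's key was found.
# validate="many_to_one" raises an error if id is not unique in df2.
merged = pd.merge(df1, df2, how="left", on="id",
                  indicator=True, validate="many_to_one")
print(merged)

# Rows labelled "left_only" found no partner in df2.
unmatched = merged[merged["_merge"] == "left_only"]
print(len(unmatched))
```

If unmatched is not empty, the keys on the two sides differ in value or in type.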

Another solution, if you need to add only one new column, is map:

df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil

To simulate your problem:

print (df1['id'].dtypes)
object
print (df2['id'].dtypes)
int64

df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date id  Market_Cap Ticker
0  2000  1         400    NaN
1  2000  2         200    NaN
2  2001  1         410    NaN
3  2001  2         220    NaN
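
A mismatch like this is easy to miss because the printed frames look identical. The sketch below shows one way to inspect the key columns, assuming id is stored as text in df1 and as integers in df2; the sample data and the names left_types, right_types and lookup are illustrative only.

```python
import pandas as pd

# Hypothetical reconstruction of the problem: id is text in df1, integer in df2.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Compare the dtypes of the key column on both sides.
print(df1["id"].dtype, df2["id"].dtype)

# Look at the Python type of the stored values, which helps when the
# column dtype is the generic object.
left_types = {type(v).__name__ for v in df1["id"]}
right_types = {type(v).__name__ for v in df2["id"]}
print(left_types, right_types)

# The string "1" never equals the integer 1, so every lookup misses.
lookup = df2.set_index("id")["Ticker"]
print(df1["id"].map(lookup).isna().all())
```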

The solution is to convert id to int with astype (or to convert the id column in df2 to str):

df1['id'] = df1['id'].astype(int)
#alternatively
#df2['id'] = df2['id'].astype(str)
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil
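
The same conversion can be wrapped so that the key is aligned before every merge. This is only a sketch: merge_with_aligned_key is a hypothetical helper, not a pandas function, and it assumes every value in the left key can be cast to the right key's dtype.

```python
import pandas as pd


def merge_with_aligned_key(left, right, key, how="left"):
    """Cast left[key] to the dtype of right[key], then merge (hypothetical helper)."""
    left = left.copy()  # leave the caller's frame untouched
    # astype raises ValueError if a value cannot be converted, e.g. "abc" to int.
    left[key] = left[key].astype(right[key].dtype)
    return left.merge(right, how=how, on=key)


# Sample data assumed from the question: id is text in df1, integer in df2.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

merged = merge_with_aligned_key(df1, df2, "id")
print(merged)
```

Casting toward the integer side keeps the key numeric; casting both sides to str would also make the keys match, but then values such as "01" and "1" would count as different keys.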