Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/28299875/

Time: 2020-09-13 22:54:27  Source: igfitidea

merge multiple dataframes into one using pandas

Tags: python, pandas, merge, dataframe

Question by user6396

I have one data frame df:

   fruit      date    volume
0  apple    20141001    2000
1  apple    20141101    1800
2  apple    20141201    2200
3  orange   20141001    1900
4  orange   20141101    2000
5  orange   20141201    3000
….

And I have the following two data frames:

apple:

   date       price
0  20141001   2
1  20141101   2.5
2  20141201   3

orange:

   date       price
0  20141001   1.5
1  20141101   2
2  20141201   2
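
The sample frames above can be rebuilt in Python to try the answer below. This is a minimal sketch: df1 (apple prices) and df2 (orange prices) follow the names used in the answer, and storing date as an integer is an assumption based on how it is printed.

```python
import pandas as pd

# Main frame: one row per (fruit, date) with the traded volume
df = pd.DataFrame({
    'fruit': ['apple'] * 3 + ['orange'] * 3,
    'date': [20141001, 20141101, 20141201] * 2,
    'volume': [2000, 1800, 2200, 1900, 2000, 3000],
})

# Per-fruit price frames (assumed integer dates, same format as df)
df1 = pd.DataFrame({'date': [20141001, 20141101, 20141201], 'price': [2, 2.5, 3]})   # apple
df2 = pd.DataFrame({'date': [20141001, 20141101, 20141201], 'price': [1.5, 2, 2]})   # orange
```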

How can I merge all of these into the following data frame:

   fruit      date    price    volume
0  apple    20141001   2       2000
1  apple    20141101   2.5     1800
2  apple    20141201   3       2200
3  orange   20141001   1.5     1900
4  orange   20141101   2       2000
5  orange   20141201   2       3000
….

This is just an example; in my real work I have hundreds of fruits whose price data need to be merged into the first data frame.

Should I use merge or join? What is the difference between them? Thank you.

Answer by EdChum

For your sample data you can achieve what you want by performing concat twice; this assumes that the last 2 dfs (df1 for apple and df2 for orange) align row by row with the master df. The inner concat concatenates the 2 supplementary dfs row-wise into a single df, and the outer concat concatenates the result column-wise with the master df:

In [56]:
# this concats the 2 supplementary dfs row-wise into a single df
pd.concat([df1,df2], ignore_index=True)
Out[56]:
       date  price
0  20141001    2.0
1  20141101    2.5
2  20141201    3.0
3  20141001    1.5
4  20141101    2.0
5  20141201    2.0
In [54]:
# now concat column-wise with the main df
pd.concat([df,pd.concat([df1,df2], ignore_index=True)], axis=1)
Out[54]:
    fruit      date  volume      date  price
0   apple  20141001    2000  20141001    2.0
1   apple  20141101    1800  20141101    2.5
2   apple  20141201    2200  20141201    3.0
3  orange  20141001    1900  20141001    1.5
4  orange  20141101    2000  20141101    2.0
5  orange  20141201    3000  20141201    2.0
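
Note that concat with axis=1 lines rows up by index label, not by date, and keeps both date columns. A hedged sketch of a safer variant of the same idea, assuming (as in the sample data) that the stacked price rows are in the same order as df: check that the dates match first, then attach only the price column.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple'] * 3 + ['orange'] * 3,
    'date': [20141001, 20141101, 20141201] * 2,
    'volume': [2000, 1800, 2200, 1900, 2000, 3000],
})
df1 = pd.DataFrame({'date': [20141001, 20141101, 20141201], 'price': [2, 2.5, 3]})
df2 = pd.DataFrame({'date': [20141001, 20141101, 20141201], 'price': [1.5, 2, 2]})

# Stack the price frames row-wise; ignore_index gives labels 0..n-1 like df
prices = pd.concat([df1, df2], ignore_index=True)

# axis=1 aligns on index labels only, so verify the dates really match row by row
if not df['date'].equals(prices['date']):
    raise ValueError('price rows are not in the same order as df')

# Attach just the price column, avoiding a duplicated 'date' column
result = pd.concat([df, prices['price']], axis=1)
print(result)
```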

However, for your real data you will need to add the price column for each fruit separately, for example:

In [55]:

df[df['fruit'] == 'apple'].merge(df1, on='date')
Out[55]:
   fruit      date  volume  price
0  apple  20141001    2000    2.0
1  apple  20141101    1800    2.5
2  apple  20141201    2200    3.0

Then repeat this for each fruit.
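
Repeating that step by hand for hundreds of fruits can be automated. A minimal sketch, assuming the price frames are collected in a dict keyed by fruit name (price_frames is a hypothetical name, not something from the question): merge each fruit's slice of df with its prices, then stack the pieces.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple'] * 3 + ['orange'] * 3,
    'date': [20141001, 20141101, 20141201] * 2,
    'volume': [2000, 1800, 2200, 1900, 2000, 3000],
})

# Hypothetical mapping from fruit name to its (date, price) frame
price_frames = {
    'apple': pd.DataFrame({'date': [20141001, 20141101, 20141201], 'price': [2, 2.5, 3]}),
    'orange': pd.DataFrame({'date': [20141001, 20141101, 20141201], 'price': [1.5, 2, 2]}),
}

# Merge each fruit's slice of df with that fruit's prices on 'date'
parts = [
    df[df['fruit'] == name].merge(prices, on='date')
    for name, prices in price_frames.items()
]

# Stack the per-fruit pieces back into a single frame with a fresh 0..n-1 index
result = pd.concat(parts, ignore_index=True)
print(result)
```

Filtering df once per fruit scans the whole frame each time; a single merge on both 'fruit' and 'date' avoids that.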

An approach to your real data problem would be to add a 'fruit' column to each supplementary df, concatenate all of them, and then merge the result back into the main df using the 'fruit' and 'date' columns as the keys:

In [57]:

df1['fruit'] = 'apple'
df2['fruit'] = 'orange'
fruit_df = pd.concat([df1,df2], ignore_index=True)
fruit_df
Out[57]:
       date  price   fruit
0  20141001    2.0   apple
1  20141101    2.5   apple
2  20141201    3.0   apple
3  20141001    1.5  orange
4  20141101    2.0  orange
5  20141201    2.0  orange
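
With hundreds of fruits, setting the 'fruit' column one frame at a time gets tedious. Assuming again that the price frames sit in a dict keyed by fruit name (the hypothetical price_frames), DataFrame.assign can add the column while concatenating, producing the same fruit_df without modifying the original frames:

```python
import pandas as pd

# Hypothetical mapping from fruit name to its (date, price) frame
price_frames = {
    'apple': pd.DataFrame({'date': [20141001, 20141101, 20141201], 'price': [2, 2.5, 3]}),
    'orange': pd.DataFrame({'date': [20141001, 20141101, 20141201], 'price': [1.5, 2, 2]}),
}

# assign returns a copy with a constant 'fruit' column, so the input frames are not modified
fruit_df = pd.concat(
    [prices.assign(fruit=name) for name, prices in price_frames.items()],
    ignore_index=True,
)
print(fruit_df)
```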
In [58]:

df.merge(fruit_df, on=['fruit', 'date'])
Out[58]:
    fruit      date  volume  price
0   apple  20141001    2000    2.0
1   apple  20141101    1800    2.5
2   apple  20141201    2200    3.0
3  orange  20141001    1900    1.5
4  orange  20141101    2000    2.0
5  orange  20141201    3000    2.0
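
On the merge-or-join part of the question: both can do this. DataFrame.merge matches on columns and defaults to an inner join, so a row of df with no matching price would be dropped; how='left' keeps such rows with a NaN price. DataFrame.join matches the caller's columns (given via on) against the other frame's index and defaults to a left join. A minimal sketch with small frames built inline; the 'banana' row is a made-up example of a fruit with no price data.

```python
import pandas as pd

# Main frame; the 'banana' row is a made-up example of a fruit with no price data
df = pd.DataFrame({
    'fruit': ['apple', 'apple', 'orange', 'banana'],
    'date': [20141001, 20141101, 20141001, 20141001],
    'volume': [2000, 1800, 1900, 500],
})
fruit_df = pd.DataFrame({
    'fruit': ['apple', 'apple', 'orange'],
    'date': [20141001, 20141101, 20141001],
    'price': [2.0, 2.5, 1.5],
})

# merge matches on columns; the default how='inner' drops the unpriced banana row
inner = df.merge(fruit_df, on=['fruit', 'date'])

# how='left' keeps every row of df; banana gets NaN as its price
left = df.merge(fruit_df, on=['fruit', 'date'], how='left')

# join matches df's columns against fruit_df's (fruit, date) index; left join by default
joined = df.join(fruit_df.set_index(['fruit', 'date']), on=['fruit', 'date'])

print(inner, left, joined, sep='\n\n')
```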