pandas: merging multiple data frames into one

Question

Asked by user6396

I have one data frame df:

   fruit      date    volume
0  apple    20141001    2000
1  apple    20141101    1800
2  apple    20141201    2200
3  orange   20141001    1900
4  orange   20141101    2000
5  orange   20141201    3000
….

and I have the following two data frames:

apple:

   date       price
0  20141001   2
1  20141101   2.5
2  20141201   3

orange:

   date       price
0  20141001   1.5
1  20141101   2
2  20141201   2

How can I merge all of these into the following data frame:

   fruit      date    price    volume
0  apple    20141001   2       2000
1  apple    20141101   2.5     1800
2  apple    20141201   3       2200
3  orange   20141001   1.5     1900
4  orange   20141101   2       2000
5  orange   20141201   2       3000
….

This is just an example. In my real work, I have hundreds of 'fruit', each with price data that needs to be merged into the first data frame.

Should I use merge or join? What is the difference between them? Thank you.

Answer 1

Answered by EdChum

For your sample data you can achieve what you want by performing concat twice; this assumes that the last 2 dfs align row-for-row with the master df. The inner concat concatenates the 2 supplementary dfs into a single df row-wise, and the outer concat concatenates column-wise:

In [56]:
# this concats the 2 supplementary dfs row-wise into a single df
pd.concat([df1,df2], ignore_index=True)
Out[56]:
       date  price
0  20141001    2.0
1  20141101    2.5
2  20141201    3.0
3  20141001    1.5
4  20141101    2.0
5  20141201    2.0
In [54]:
# now concat column-wise with the main df
pd.concat([df,pd.concat([df1,df2], ignore_index=True)], axis=1)
Out[54]:
    fruit      date  volume      date  price
0   apple  20141001    2000  20141001    2.0
1   apple  20141101    1800  20141101    2.5
2   apple  20141201    2200  20141201    3.0
3  orange  20141001    1900  20141001    1.5
4  orange  20141101    2000  20141101    2.0
5  orange  20141201    3000  20141201    2.0
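
To try this out, the sample frames can be rebuilt as in the sketch below. The names df, df1 and df2 follow the answer; the construction itself (including keeping the dates as plain integers) is an assumption based on the tables in the question. Because the column-wise concat matches rows purely by position, the output above carries the date column twice; attaching only the price column is one way to avoid that.

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: dates stored as integers)
dates = [20141001, 20141101, 20141201]
df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["orange"] * 3,
    "date": dates * 2,
    "volume": [2000, 1800, 2200, 1900, 2000, 3000],
})
df1 = pd.DataFrame({"date": dates, "price": [2, 2.5, 3]})  # apple prices
df2 = pd.DataFrame({"date": dates, "price": [1.5, 2, 2]})  # orange prices

# Stack the two price frames row-wise, renumbering the index 0..5
prices = pd.concat([df1, df2], ignore_index=True)

# Attach only the price column, so 'date' is not duplicated.
# This relies on the rows of `prices` being in the same order as the rows of `df`.
combined = pd.concat([df, prices["price"]], axis=1)
print(combined)
```

This only works while the row order of the stacked price frames matches the main frame exactly, which is why the key-based merge is the safer choice for real data.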

However, for your real data, you will need to add the price column for each fruit separately:

In [55]:

df[df['fruit'] == 'apple'].merge(df1, on='date')
Out[55]:
   fruit      date  volume  price
0  apple  20141001    2000    2.0
1  apple  20141101    1800    2.5
2  apple  20141201    2200    3.0

and repeat this for each fruit.
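
With hundreds of fruits, the repetition can be written as a loop. The following is a minimal sketch, assuming the price frames are held in a dictionary keyed by fruit name; price_frames is a hypothetical name that does not appear in the original answer, and the sample data is rebuilt from the question.

```python
import pandas as pd

# Sample data reconstructed from the question
dates = [20141001, 20141101, 20141201]
df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["orange"] * 3,
    "date": dates * 2,
    "volume": [2000, 1800, 2200, 1900, 2000, 3000],
})
df1 = pd.DataFrame({"date": dates, "price": [2, 2.5, 3]})
df2 = pd.DataFrame({"date": dates, "price": [1.5, 2, 2]})

# Hypothetical container: one price frame per fruit name
price_frames = {"apple": df1, "orange": df2}

parts = []
for name, prices in price_frames.items():
    subset = df[df["fruit"] == name]               # rows for this fruit only
    parts.append(subset.merge(prices, on="date"))  # add its price column by date

# Glue the per-fruit results back into one frame
result = pd.concat(parts, ignore_index=True)
print(result)
```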

An approach to your real data problem would be to add a 'fruit' column to each supplemental df, concatenate all these and then merge back using 'fruit' and 'date' columns as the keys:

解决实际数据问题的一种方法是向每个补充 df 添加一个“水果”列，连接所有这些，然后使用“水果”和“日期”列作为键合并回来：

In [57]:

df1['fruit'] = 'apple'
df2['fruit'] = 'orange'
fruit_df = pd.concat([df1,df2], ignore_index=True)
fruit_df
Out[57]:
       date  price   fruit
0  20141001    2.0   apple
1  20141101    2.5   apple
2  20141201    3.0   apple
3  20141001    1.5  orange
4  20141101    2.0  orange
5  20141201    2.0  orange
In [58]:

df.merge(fruit_df, on=['fruit', 'date'])
Out[58]:
    fruit      date  volume  price
0   apple  20141001    2000    2.0
1   apple  20141101    1800    2.5
2   apple  20141201    2200    3.0
3  orange  20141001    1900    1.5
4  orange  20141101    2000    2.0
5  orange  20141201    3000    2.0
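
The same idea scales to many fruits without assigning the 'fruit' column by hand. The sketch below assumes a hypothetical price_frames dictionary (fruit name to price frame) and adds a 'pear' row with no price data to show what a left merge does with unmatched rows. It also touches on the merge-versus-join part of the question: merge matches on columns and defaults to an inner join, while join matches against the index of the other frame and defaults to a left join, so the two can be made to produce the same result.

```python
import pandas as pd

dates = [20141001, 20141101, 20141201]
df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["orange"] * 3 + ["pear"],
    "date": dates * 2 + [20141001],
    "volume": [2000, 1800, 2200, 1900, 2000, 3000, 500],
})

# Hypothetical container: one price frame per fruit name (no entry for 'pear')
price_frames = {
    "apple": pd.DataFrame({"date": dates, "price": [2, 2.5, 3]}),
    "orange": pd.DataFrame({"date": dates, "price": [1.5, 2, 2]}),
}

# Tag each price frame with its fruit name and stack them all
fruit_df = pd.concat(
    [prices.assign(fruit=name) for name, prices in price_frames.items()],
    ignore_index=True,
)

# how="left" keeps every row of df; rows without a price get NaN.
# validate raises if a (fruit, date) pair appears more than once in fruit_df.
merged = df.merge(fruit_df, on=["fruit", "date"], how="left", validate="many_to_one")

# Equivalent with join: the other frame must carry the keys in its index
joined = df.join(fruit_df.set_index(["fruit", "date"]), on=["fruit", "date"])

print(merged)
```

Using a left merge rather than the default inner merge is a design choice here: it keeps fruits whose price frame is missing, which makes gaps in the price data visible instead of silently dropping rows.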
