Pandas merge on two columns using date and another column
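Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/48031445/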
Question by Hobes
Hello, I'm trying to merge two data frames and sum the visit counts by date and upc.
- Transaction data (date, upc, sales): 200k rows x 3 columns
- Visit counts (date, upc, visit count): 2 million+ rows x 3 columns
I've tried this
df3 = pandas.merge(df1,df2, on = ['upc','date'], how = 'left')
Result: the merge executes, but it does not sum by date or upc.
I also tried
df3 = pandas.merge(df1, df2, left_on=['date'], right_on=['upc'], how='left')
and that didn't work.
df3 = pandas.merge(df1, df2, left_on=['date', 'upc'], right_on=['date', 'upc'], how='left')
and that didn't work.
I also tried
df3 = pandas.merge(df1,df2, on = ['date'], how = 'left')
and I kept getting an error message. Based on the error message, it looked like I needed to convert one of the date columns so that both data frames used the same pandas dtype.
I made that change and got the same result as my first try: the merge worked, but it did not sum the results. I also tried converting the dates in both data frames with astype(str), and that didn't work either. I found that if both date columns have the same dtype (a date dtype, or strings via astype(str)), I get a memory error and the merge fails.
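As a minimal sketch of the dtype fix described above, assuming the dates are MM/DD/YYYY strings as in the sample data and the key column is named `Date` in both frames (the question's code uses `date`; adjust to match): recent pandas versions raise a ValueError when one merge key column holds strings (object dtype) and the other holds datetime64 values, and parsing the strings with `pd.to_datetime` removes that mismatch. The two-row frames are illustrative only, not the real data.

```python
import pandas as pd

# Illustrative frames: df1 keeps its dates as MM/DD/YYYY strings (object dtype),
# while df2's dates have already been parsed to datetime64
df1 = pd.DataFrame({"Date": ["09/01/2016", "09/02/2016"],
                    "upc": ["A01234", "A01234"],
                    "sales": [1000, 500]})
df2 = pd.DataFrame({"Date": pd.to_datetime(["09/01/2016", "09/02/2016"], format="%m/%d/%Y"),
                    "upc": ["A01234", "A01234"],
                    "visits": [10, 25]})

# Parse df1's strings with the same explicit format so both key columns share a dtype
df1["Date"] = pd.to_datetime(df1["Date"], format="%m/%d/%Y")
print(df1["Date"].dtype, df2["Date"].dtype)

# With matching dtypes, the two-column left merge runs
df3 = pd.merge(df1, df2, on=["Date", "upc"], how="left")
print(df3)
```

Matching dtypes only makes the merge possible; it does not sum anything, which is consistent with the result described above.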
Merging on upc alone succeeded, but it creates a problem in my data: I get duplicate visit numbers, because the same upc appears in several transactions, one per date.
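The duplication described above can be reproduced with a copy of the question's sample data: joining on `upc` alone pairs each transaction row with every visit row for that upc, whatever its date. A minimal sketch:

```python
import pandas as pd

# Copies of the question's sample data (dates kept as strings)
df1 = pd.DataFrame({"Date": ["09/01/2016", "09/02/2016", "09/10/2016"],
                    "upc": ["A01234", "A01234", "A56789"],
                    "sales": [1000, 500, 1200]})
df2 = pd.DataFrame({"Date": ["09/01/2016", "09/02/2016", "09/05/2016", "09/10/2016"],
                    "upc": ["A01234", "A01234", "A56789", "A56789"],
                    "visits": [10, 25, 26, 32]})

# Key on upc only: each df1 row matches every df2 row with the same upc
by_upc = pd.merge(df1, df2, on="upc", how="left")
print(len(df1), len(by_upc))  # 3 rows in, 6 rows out (2x2 for A01234, 1x2 for A56789)

# Key on Date and upc: each df1 row matches at most one df2 row in this data
by_both = pd.merge(df1, df2, on=["Date", "upc"], how="left")
print(len(by_both))  # 3
```

Adding `Date` to the key restores a one-to-one match here, but the visits still need to be summed whenever df2 can hold several rows for the same Date/upc pair.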
At the end of the day, what I need is something similar to Excel's SUMIF function.
I need to combine the two data sets by totaling the visits for each upc on each day, while keeping the transaction data unchanged (a left join, in SQL terms).
Sample data
df1
         Date     upc  sales
0  09/01/2016  A01234   1000
1  09/02/2016  A01234    500
2  09/10/2016  A56789   1200

df2
         Date     upc  visits
0  09/01/2016  A01234      10
1  09/02/2016  A01234      25
2  09/05/2016  A56789      26
3  09/10/2016  A56789      32

df3 (desired result)
         Date     upc  sales  visits
0  09/01/2016  A01234   1000      10
1  09/02/2016  A01234    500      25
2  09/10/2016  A56789   1200      32
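To reproduce the example, the df1 and df2 printouts above can be loaded straight from their text form. This is a sketch that assumes the tables are pasted as whitespace-separated text; `sep=r"\s+"` splits on runs of whitespace, and because each data row has one more field than the header, `read_csv` uses that first field as the row index.

```python
import io
import pandas as pd

# The df1/df2 printouts, pasted as text (the header has no label for the index)
df1_text = """Date upc sales
0 09/01/2016 A01234 1000
1 09/02/2016 A01234 500
2 09/10/2016 A56789 1200
"""
df2_text = """Date upc visits
0 09/01/2016 A01234 10
1 09/02/2016 A01234 25
2 09/05/2016 A56789 26
3 09/10/2016 A56789 32
"""

# One extra leading field per row -> pandas treats it as the index
df1 = pd.read_csv(io.StringIO(df1_text), sep=r"\s+")
df2 = pd.read_csv(io.StringIO(df2_text), sep=r"\s+")
print(df1.dtypes)
```

The dates come back as plain strings; `pd.to_datetime(..., format="%m/%d/%Y")` can be applied afterwards if a date dtype is wanted.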
I'm wondering whether the pandasql package is what I need. Any help is appreciated.
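Since the goal is phrased in SQL terms, here is a sketch showing that pandasql is not strictly required to express it: Python's built-in `sqlite3` module, together with pandas' `DataFrame.to_sql` and `pd.read_sql_query`, can run the "sum per Date/upc, then LEFT JOIN" query directly. The table names `transactions` and `visit_counts` are hypothetical, and an in-memory database is assumed purely for illustration.

```python
import sqlite3
import pandas as pd

df1 = pd.DataFrame({"Date": ["09/01/2016", "09/02/2016", "09/10/2016"],
                    "upc": ["A01234", "A01234", "A56789"],
                    "sales": [1000, 500, 1200]})
df2 = pd.DataFrame({"Date": ["09/01/2016", "09/02/2016", "09/05/2016", "09/10/2016"],
                    "upc": ["A01234", "A01234", "A56789", "A56789"],
                    "visits": [10, 25, 26, 32]})

# Load both frames into an in-memory SQLite database (hypothetical table names)
conn = sqlite3.connect(":memory:")
df1.to_sql("transactions", conn, index=False)
df2.to_sql("visit_counts", conn, index=False)

# Sum visits per (Date, upc) in a subquery, then LEFT JOIN it onto the transactions;
# ORDER BY rowid keeps the original transaction order
query = """
SELECT t.Date, t.upc, t.sales, v.visits
FROM transactions AS t
LEFT JOIN (
    SELECT Date, upc, SUM(visits) AS visits
    FROM visit_counts
    GROUP BY Date, upc
) AS v
  ON t.Date = v.Date AND t.upc = v.upc
ORDER BY t.rowid
"""
df3 = pd.read_sql_query(query, conn)
conn.close()
print(df3)
```

The pure-pandas groupby-then-merge approach avoids copying 2 million+ rows into a database, so the SQL route mainly helps if that framing is easier to reason about.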
Answer by PaSTE
The first merge statement you perform should get you halfway there, but unfortunately, it's the second half of a two-step process. It sounds like you want to merge the visits data onto the sales data after summing the visits by Date/upc. You have to do the sum first (the merge command does not do this by itself). Try:
df2_sum = df2.groupby(["Date", "upc"])["visits"].sum().reset_index()
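To see what the groupby step contributes, here is a small sketch with hypothetical visit data in which the same Date/upc pair appears twice. The question's sample data happens to have one row per pair, so summing leaves it unchanged there.

```python
import pandas as pd

# Hypothetical visit log with two rows for 09/01/2016 / A01234
df2 = pd.DataFrame({"Date": ["09/01/2016", "09/01/2016", "09/02/2016"],
                    "upc": ["A01234", "A01234", "A01234"],
                    "visits": [4, 6, 25]})

# One row per (Date, upc) with the visits added up, similar to Excel's SUMIF
df2_sum = df2.groupby(["Date", "upc"])["visits"].sum().reset_index()
print(df2_sum)  # visits: 10 for 09/01/2016, 25 for 09/02/2016
```

`df2.groupby(["Date", "upc"], as_index=False)["visits"].sum()` produces the same flat frame without the separate `reset_index()` call.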
Then left-merge this onto the sales data:
df3 = pd.merge(df1, df2_sum, on=["Date", "upc"], how="left")
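Putting both steps together on the question's sample data gives the desired df3. This sketch also passes `validate="many_to_one"` to `pd.merge` (an addition, not part of the answer above): it raises a `MergeError` if the right-hand keys are not unique, which is exactly the situation that would multiply the transaction rows.

```python
import pandas as pd

df1 = pd.DataFrame({"Date": ["09/01/2016", "09/02/2016", "09/10/2016"],
                    "upc": ["A01234", "A01234", "A56789"],
                    "sales": [1000, 500, 1200]})
df2 = pd.DataFrame({"Date": ["09/01/2016", "09/02/2016", "09/05/2016", "09/10/2016"],
                    "upc": ["A01234", "A01234", "A56789", "A56789"],
                    "visits": [10, 25, 26, 32]})

# Step 1: one row per (Date, upc) in the visits data
df2_sum = df2.groupby(["Date", "upc"])["visits"].sum().reset_index()

# Step 2: left-merge onto the transactions; validate guards against duplicate keys in df2_sum
df3 = pd.merge(df1, df2_sum, on=["Date", "upc"], how="left", validate="many_to_one")

# A left join with unique right-hand keys keeps exactly one row per transaction
assert len(df3) == len(df1)
print(df3)
```

Transactions with no matching visits would get NaN in `visits`; `df3["visits"].fillna(0)` is one option if zero is the more useful default.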