Original question: http://stackoverflow.com/questions/31785371/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
ValueError: Index contains duplicate entries, cannot reshape
Question by Blue Moon
I'm trying to reshape my pandas DataFrame with the following call:
ar = ar.pivot(index='Received', columns='Merch Ref', values='acceptance_rate')
The dataset looks like:
    Merch Ref             Received  acceptance_rate
0          SF  2014-08-28 15:38:00                0
1          SF  2014-08-28 15:44:00                0
2          SF  2014-08-28 16:04:00                0
3          WF  2014-08-28 16:05:00                0
4          WF  2014-08-28 16:07:00                0
5          SF  2014-08-28 16:34:00                0
6          SF  2014-08-28 16:55:00                0
7          BF  2014-08-28 17:59:00                0
8          BF  2014-08-29 15:05:00                0
9          SF  2014-08-29 21:25:00                0
10         SF  2014-08-30 10:29:00                0
...
What I'd like to obtain is:
                     SF  WF  BF
2014-08-28 15:38:00   0   1   0
2014-08-28 15:44:00   0   1   0
2014-08-28 16:04:00   0   0   1
2014-08-28 16:05:00   1   1   0
2014-08-28 16:07:00   0   0   1
2014-08-28 16:34:00   1   1   0
2014-08-28 16:55:00   1   1   0
2014-08-28 17:59:00   0   1   0
2014-08-29 15:05:00   0   0   1
2014-08-29 21:25:00   0   0   1
2014-08-30 10:29:00   0   1   0
However, I get the error:
ValueError: Index contains duplicate entries, cannot reshape
This is because I have some orders placed at the same time. Is there a way to sum/aggregate these orders?
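The situation described in the question can be reproduced with a minimal sketch. The sample shown above is truncated, so the data below is hypothetical: two SF rows are assumed to share the same Received timestamp, which is exactly the kind of collision that makes pivot() fail.

```python
import pandas as pd

# Hypothetical data: the first two rows share the same (Received, Merch Ref) pair.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF"],
    "Received": pd.to_datetime(["2014-08-28 15:38:00",
                                "2014-08-28 15:38:00",
                                "2014-08-28 16:05:00"]),
    "acceptance_rate": [0, 1, 0],
})

message = None
try:
    # pivot() requires exactly one row per (index, column) cell.
    ar.pivot(index="Received", columns="Merch Ref", values="acceptance_rate")
except ValueError as exc:
    message = str(exc)

print(message)
```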
Answer by timctran
As you identified, the error is caused by duplicate (x, y) pairs, where x is a value in Received and y is a value in Merch Ref.
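To see which pairs actually collide before pivoting, something along these lines should work. It is a sketch on the same kind of hypothetical frame (not the asker's real data) and relies on DataFrame.duplicated with keep=False, which flags every row of a duplicated pair rather than only the repeats.

```python
import pandas as pd

# Hypothetical sample with one duplicated (Received, Merch Ref) pair.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF"],
    "Received": pd.to_datetime(["2014-08-28 15:38:00",
                                "2014-08-28 15:38:00",
                                "2014-08-28 16:05:00"]),
    "acceptance_rate": [0, 1, 0],
})

# keep=False marks all rows that belong to a duplicated pair.
dupes = ar[ar.duplicated(subset=["Received", "Merch Ref"], keep=False)]
print(dupes)

# Count the rows behind each pair; any count above 1 blocks pivot().
counts = ar.groupby(["Received", "Merch Ref"]).size()
print(counts[counts > 1])
```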
If you would like to aggregate by sum, use
ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate', aggfunc='sum')
The default aggregation function is mean. That is,
ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate')
will pivot the table, and all entries with the same (x, y) pair will be aggregated by taking their mean.
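A small sketch comparing the two aggregations on hypothetical data (one duplicated SF cell). It also passes fill_value=0, a real pivot_table parameter, so that cells with no orders show 0 as in the desired output instead of NaN; whether 0 is the right filler for this data is an assumption. The groupby/unstack version is shown only as an equivalent way to get the summed table.

```python
import pandas as pd

# Hypothetical data: two SF orders at 15:38 with rates 0 and 1.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF"],
    "Received": pd.to_datetime(["2014-08-28 15:38:00",
                                "2014-08-28 15:38:00",
                                "2014-08-28 16:05:00"]),
    "acceptance_rate": [0, 1, 0],
})

# Sum duplicated cells; fill_value=0 replaces NaN where no order exists.
summed = ar.pivot_table(index="Received", columns="Merch Ref",
                        values="acceptance_rate", aggfunc="sum", fill_value=0)

# Default aggfunc is the mean, so the duplicated SF cell becomes 0.5
# and empty cells stay NaN.
averaged = ar.pivot_table(index="Received", columns="Merch Ref",
                          values="acceptance_rate")

# Same summed table built with groupby + unstack.
via_groupby = (ar.groupby(["Received", "Merch Ref"])["acceptance_rate"]
                 .sum()
                 .unstack(fill_value=0))

print(summed)
print(averaged)
```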
Remark: I initially received the same error, but after iterating through the (x, y) pairs I didn't find any duplicates. It turns out some of the pairs were of the form (nan, nan) and were omitted from the iteration. So if you are trying to debug what you believe are unique pairs, consider checking for NaNs with pd.isnull or pd.notnull.
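Following the remark, a sketch of that check might look like this. The frame is hypothetical, with two rows missing both keys (NaN in Merch Ref, NaT in Received); pd.isnull flags both kinds of missing value, and the rows with complete keys are kept before pivoting.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the last two rows have missing keys in both columns.
ar = pd.DataFrame({
    "Merch Ref": ["SF", np.nan, np.nan],
    "Received": pd.to_datetime(["2014-08-28 15:38:00", None, None]),
    "acceptance_rate": [0, 1, 0],
})

keys = ["Received", "Merch Ref"]

# Rows where either key is missing (NaN or NaT).
null_keys = pd.isnull(ar[keys]).any(axis=1)
print(ar[null_keys])

# Keep only rows with both keys present, then pivot.
clean = ar[pd.notnull(ar[keys]).all(axis=1)]
wide = clean.pivot(index="Received", columns="Merch Ref", values="acceptance_rate")
print(wide)
```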
Answer by Farid
Try removing the duplicates:
ar = ar.drop_duplicates(['Received','Merch Ref'])
After that, the pivot should work.
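A sketch of this approach on hypothetical data. One thing worth knowing: drop_duplicates keeps the first row of each pair by default (keep="first"), so the values of the other rows are discarded rather than summed or averaged as pivot_table would do. keep="last" keeps the last row instead.

```python
import pandas as pd

# Hypothetical data: two SF orders share the 15:38 timestamp.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF"],
    "Received": pd.to_datetime(["2014-08-28 15:38:00",
                                "2014-08-28 15:38:00",
                                "2014-08-28 16:05:00"]),
    "acceptance_rate": [0, 1, 0],
})

# Keep the first row of each (Received, Merch Ref) pair; later rows are dropped.
deduped = ar.drop_duplicates(["Received", "Merch Ref"])
wide = deduped.pivot(index="Received", columns="Merch Ref", values="acceptance_rate")
print(wide)

# Alternative: keep the last row of each pair instead.
deduped_last = ar.drop_duplicates(["Received", "Merch Ref"], keep="last")
```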

