pandas ValueError: Index contains duplicate entries, cannot reshape

Question

Question by Blue Moon

I'm trying to reshape my pandas DataFrame with the following function:

 ar = ar.pivot(index='Received', columns='Merch Ref', values='acceptance_rate')

The dataset looks like:

     Merch Ref            Received  acceptance_rate
0           SF 2014-08-28 15:38:00                0
1           SF 2014-08-28 15:44:00                0
2           SF 2014-08-28 16:04:00                0
3           WF 2014-08-28 16:05:00                0
4           WF 2014-08-28 16:07:00                0
5           SF 2014-08-28 16:34:00                0
6           SF 2014-08-28 16:55:00                0
7           BF 2014-08-28 17:59:00                0
8           BF 2014-08-29 15:05:00                0
9           SF 2014-08-29 21:25:00                0
10          SF 2014-08-30 10:29:00                0
...

What I'd like to obtain is:

                      SF WF BF 
2014-08-28 15:38:00    0  1  0
2014-08-28 15:44:00    0  1  0
2014-08-28 16:04:00    0  0  1
2014-08-28 16:05:00    1  1  0
2014-08-28 16:07:00    0  0  1
2014-08-28 16:34:00    1  1  0
2014-08-28 16:55:00    1  1  0
2014-08-28 17:59:00    0  1  0
2014-08-29 15:05:00    0  0  1
2014-08-29 21:25:00    0  0  1 
2014-08-30 10:29:00    0  1  0

However, I get the error:

 ValueError: Index contains duplicate entries, cannot reshape

This is because I have some orders placed at the same time. Is there a way to sum/aggregate these orders?
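
To confirm that repeated timestamps are the cause, one option is to list the offending rows before pivoting. This is a minimal sketch on a small hypothetical DataFrame (not the asker's real data); DataFrame.duplicated with keep=False flags every row that belongs to a repeated (Received, Merch Ref) pair:

```python
import pandas as pd

# Hypothetical toy data: two SF orders share the same timestamp.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF"],
    "Received": pd.to_datetime(
        ["2014-08-28 15:38", "2014-08-28 15:38", "2014-08-28 16:05"]
    ),
    "acceptance_rate": [0, 1, 0],
})

# keep=False marks *all* rows of each repeated (Received, Merch Ref) pair.
dupes = ar[ar.duplicated(subset=["Received", "Merch Ref"], keep=False)]
print(dupes)

# pivot() needs each (index, column) pair to be unique, so this raises.
error_message = None
try:
    ar.pivot(index="Received", columns="Merch Ref", values="acceptance_rate")
except ValueError as err:
    error_message = str(err)
print(error_message)
```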

Answer 1

Answer by timctran

As you identified, the error is caused by duplicate (x, y) pairs, where x is a value in Received and y is a value in Merch Ref.

If you would like to aggregate by sum, use

ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate', aggfunc='sum')

The default aggregation function is mean. That is,

ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate')

will pivot the table, and all entries with the same (x, y) pair will be aggregated with the mean function.
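
A minimal sketch comparing the two calls on hypothetical toy data (the values are made up for illustration): the two SF rows at 15:38 collide, so the sum gives 1 and the default mean gives 0.5 for that cell.

```python
import pandas as pd

# Hypothetical toy data: the two SF rows at 15:38 share a (Received, Merch Ref) pair.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF", "BF"],
    "Received": pd.to_datetime([
        "2014-08-28 15:38", "2014-08-28 15:38",
        "2014-08-28 16:05", "2014-08-28 17:59",
    ]),
    "acceptance_rate": [0, 1, 1, 0],
})

# Colliding rows are summed into a single cell.
summed = ar.pivot_table(index="Received", columns="Merch Ref",
                        values="acceptance_rate", aggfunc="sum")

# Without aggfunc, pivot_table falls back to the mean.
averaged = ar.pivot_table(index="Received", columns="Merch Ref",
                          values="acceptance_rate")

print(summed)
print(averaged)
```

Cells with no order for that merchant at that time come out as NaN; passing fill_value=0 to pivot_table replaces them with 0, which is closer to the table shown in the question.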

Remark: I initially received the same error, but after iterating through the (x, y) pairs I didn't find any duplicates. It turned out that some of the pairs were of the form (nan, nan) and had been omitted from the iteration. So for other users trying to debug what they believe are unique pairs, consider checking for NaNs with pd.isnull or pd.notnull.
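
A minimal sketch of that check on hypothetical data with missing keys. One common way such pairs slip out of an iteration is groupby, which drops rows with NaN keys by default (dropna=True); duplicated(), by contrast, treats NaN/NaT values as equal, so the two missing-key rows show up as a duplicate pair:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows have NaN/NaT in both key columns.
ar = pd.DataFrame({
    "Merch Ref": ["SF", np.nan, np.nan],
    "Received": pd.to_datetime(["2014-08-28 15:38", None, None]),
    "acceptance_rate": [0, 1, 0],
})
keys = ["Received", "Merch Ref"]

# groupby drops NaN keys by default, so only one pair is "seen" here.
pairs_seen = ar.groupby(keys).size()
print(pairs_seen)

# Rows where either key column is missing.
missing_keys = ar[ar[keys].isnull().any(axis=1)]
print(missing_keys)

# duplicated() compares NaN values as equal, so both missing-key rows are flagged.
dup_mask = ar.duplicated(subset=keys, keep=False)
print(ar[dup_mask])
```

If those rows carry no useful information, ar.dropna(subset=keys) removes them before pivoting.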

Answer 2

Answer by Farid

Try removing the duplicates:

ar = ar.drop_duplicates(['Received','Merch Ref'])

After that, pivot should no longer raise the error.
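
A sketch of this approach on hypothetical toy data. Note that drop_duplicates keeps only the first row of each duplicated pair by default (keep='first'), so the values in the other rows are discarded rather than aggregated:

```python
import pandas as pd

# Hypothetical toy data with one duplicated (Received, Merch Ref) pair.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF"],
    "Received": pd.to_datetime(
        ["2014-08-28 15:38", "2014-08-28 15:38", "2014-08-28 16:05"]
    ),
    "acceptance_rate": [0, 1, 0],
})

# Only the first row of each (Received, Merch Ref) pair survives.
deduped = ar.drop_duplicates(["Received", "Merch Ref"])

# With unique (index, column) pairs, pivot no longer raises.
wide = deduped.pivot(index="Received", columns="Merch Ref", values="acceptance_rate")
print(wide)
```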
