Python Pandas unstack problem: ValueError: Index contains duplicate entries, cannot reshape

Question

Asked by ARF

I am trying to unstack a multi-index with pandas and I keep getting:

ValueError: Index contains duplicate entries, cannot reshape

Given a dataset with four columns:

id (string)
date (string)
location (string)
value (float)

I first set a three-level multi-index:

In [37]: e.set_index(['id', 'date', 'location'], inplace=True)

In [38]: e
Out[38]: 
                                    value
id           date       location       
id1          2014-12-12 loc1        16.86
             2014-12-11 loc1        17.18
             2014-12-10 loc1        17.03
             2014-12-09 loc1        17.28

Then I try to unstack the location:

In [39]: e.unstack('location')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-bc1e237a0ed7> in <module>()
----> 1 e.unstack('location')
...
C:\Anaconda\envs\sandbox\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
    143 
    144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
    146                              'cannot reshape')
    147 

ValueError: Index contains duplicate entries, cannot reshape

What is going on here?

Answer 1

Accepted answer by Andy Hayden

Here's an example DataFrame that reproduces the error: it has several values that share the same index entry. The question is, do you want to aggregate these or keep them as multiple rows?

In [11]: df
Out[11]:
   0  1  2      3
0  1  2  a  16.86
1  1  2  a  17.18
2  1  4  a  17.03
3  2  5  b  17.28

In [12]: df.pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')  # desired?
Out[12]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

In [13]: df1 = df.set_index([0, 1, 2])

In [14]: df1
Out[14]:
           3
0 1 2
1 2 a  16.86
    a  17.18
  4 a  17.03
2 5 b  17.28

In [15]: df1.unstack(2)
ValueError: Index contains duplicate entries, cannot reshape
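
Before reshaping, it can help to see exactly which index entries are duplicated. The sketch below rebuilds the example frame from this answer and flags the offending rows with `Index.duplicated`; the variable names `dup_mask` and `duplicates` are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from this answer (integer column labels 0..3)
df = pd.DataFrame({
    0: [1, 1, 1, 2],
    1: [2, 2, 4, 5],
    2: ["a", "a", "a", "b"],
    3: [16.86, 17.18, 17.03, 17.28],
})
df1 = df.set_index([0, 1, 2])

# keep=False marks every member of a duplicated group, not just the repeats
dup_mask = df1.index.duplicated(keep=False)
duplicates = df1[dup_mask]

print(df1.index.is_unique)  # False here, which is why unstack refuses to reshape
print(duplicates)
```

If `duplicates` is empty, `unstack` should not raise this particular error.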

One solution is to call reset_index (which gets you back to df) and use pivot_table.

In [16]: df1.reset_index().pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')
Out[16]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

Another option (if you don't want to aggregate) is to append a dummy level, unstack it, then drop the dummy level...
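
One possible reading of the dummy-level idea is sketched below. It assumes the dummy level is a running count within each duplicated index entry (built with `groupby(...).cumcount()`); the answer does not say how the level should be built, so the name `dummy` and this construction are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    0: [1, 1, 1, 2],
    1: [2, 2, 4, 5],
    2: ["a", "a", "a", "b"],
    3: [16.86, 17.18, 17.03, 17.28],
})
df1 = df.set_index([0, 1, 2])

# Assumed dummy level: 0, 1, 2, ... within each repeated (0, 1, 2) entry
dummy = df1.groupby(level=[0, 1, 2]).cumcount().rename("dummy")

# Appending the dummy level makes the index unique, so unstack can reshape
wide = (
    df1.set_index(dummy, append=True)
       .unstack(2)
       .droplevel("dummy")
)
print(wide)
```

Nothing is aggregated here: the two values for index entry (1, 2) stay on separate rows, so the row index of `wide` still contains that entry twice.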

Answer 2

Answer by HVS

There's a far simpler way to tackle this.

The reason you get ValueError: Index contains duplicate entries, cannot reshape is that the combination of "id", "date" and "location" is not unique: once you unstack "location", more than one value would have to land in the same cell for a given "id" and "date".

You can avoid this by retaining the default index (the row number): when setting the index to "id", "date" and "location", add them in "append" mode instead of the default overwrite mode.

So use,

e.set_index(['id', 'date', 'location'], append=True)

Once this is done, your index will still contain the default index alongside the newly set levels, and unstack will work.
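
A small end-to-end sketch of this approach follows. The rows of `e` are made up for illustration (the question only shows part of the data), with one deliberately repeated ("id", "date", "location") combination.

```python
import pandas as pd

# Hypothetical data in the shape described in the question;
# the first two rows share the same id, date and location
e = pd.DataFrame({
    "id": ["id1", "id1", "id1", "id2"],
    "date": ["2014-12-12", "2014-12-12", "2014-12-11", "2014-12-10"],
    "location": ["loc1", "loc1", "loc1", "loc2"],
    "value": [16.86, 17.18, 17.03, 17.28],
})

# append=True keeps the default row-number index as the first level,
# so every index entry stays unique
indexed = e.set_index(["id", "date", "location"], append=True)
wide = indexed.unstack("location")
print(wide)
```

Note the trade-off: because the row number remains in the index, each original row keeps its own line in the result, and duplicated entries are not merged or aggregated.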

Let me know how it works out.

Answer 3

Answer by Grag2015

I had the same problem. In my case the problem was in the data: my column 'information' contained only one unique value, and that caused the error.

UPDATE: for pivot to work correctly, the (id_user, information) pairs must not contain duplicates.

This works:

import pandas as pd

# Every (id_user, information) pair is unique, so pivot succeeds
df2 = pd.DataFrame({'id_user': [1, 2, 3, 4, 4, 5, 5],
                    'information': ['phon', 'phon', 'phone', 'phone1', 'phone', 'phone1', 'phone'],
                    'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')

This doesn't work:

# The pairs (4, 'phone') and (5, 'phone') each occur twice, so pivot raises ValueError
df2 = pd.DataFrame({'id_user': [1, 2, 3, 4, 4, 5, 5],
                    'information': ['phone', 'phone', 'phone', 'phone', 'phone', 'phone', 'phone'],
                    'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')
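
If the failing frame has to be pivoted anyway, one option is to locate the repeated pairs and decide what to do with them first. The sketch below keeps the first value for each (id_user, information) pair; that policy is an assumption made for illustration, not something this answer prescribes.

```python
import pandas as pd

df2 = pd.DataFrame({
    "id_user": [1, 2, 3, 4, 4, 5, 5],
    "information": ["phone"] * 7,
    "value": [1, "01.01.00", "01.02.00", 2, "01.03.00", 3, "01.04.00"],
})

# Rows whose (id_user, information) pair occurs more than once
dupes = df2[df2.duplicated(subset=["id_user", "information"], keep=False)]
print(dupes)

# Assumed policy: keep only the first row of each repeated pair, then pivot
deduped = df2.drop_duplicates(subset=["id_user", "information"], keep="first")
wide = deduped.pivot(index="id_user", columns="information", values="value")
print(wide)
```

Dropping rows loses data, so inspect `dupes` first to check that discarding the later values is acceptable.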

source: https://stackoverflow.com/a/37021196/6088984
