Python Pandas unstack problems: ValueError: Index contains duplicate entries, cannot reshape

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/28651079/

Time: 2020-08-19 03:33:09  Source: igfitidea

python pandas

Asked by ARF

I am trying to unstack a multi-index with pandas and I keep getting:

ValueError: Index contains duplicate entries, cannot reshape

Given a dataset with four columns:

  • id (string)
  • date (string)
  • location (string)
  • value (float)

I first set a three-level multi-index:

In [37]: e.set_index(['id', 'date', 'location'], inplace=True)

In [38]: e
Out[38]: 
                                    value
id           date       location       
id1          2014-12-12 loc1        16.86
             2014-12-11 loc1        17.18
             2014-12-10 loc1        17.03
             2014-12-09 loc1        17.28

Then I try to unstack the location:

In [39]: e.unstack('location')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-bc1e237a0ed7> in <module>()
----> 1 e.unstack('location')
...
C:\Anaconda\envs\sandbox\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
    143 
    144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
    146                              'cannot reshape')
    147 

ValueError: Index contains duplicate entries, cannot reshape

What is going on here?

Accepted answer by Andy Hayden

Here's an example DataFrame that shows the problem: it has several values that share the same index entry. The question is, do you want to aggregate these or keep them as multiple rows?

In [11]: df
Out[11]:
   0  1  2      3
0  1  2  a  16.86
1  1  2  a  17.18
2  1  4  a  17.03
3  2  5  b  17.28

In [12]: df.pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')  # desired?
Out[12]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

In [13]: df1 = df.set_index([0, 1, 2])

In [14]: df1
Out[14]:
           3
0 1 2
1 2 a  16.86
    a  17.18
  4 a  17.03
2 5 b  17.28

In [15]: df1.unstack(2)
ValueError: Index contains duplicate entries, cannot reshape
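
Before choosing between aggregating and keeping the rows, it can help to see which index entries are actually repeated. A minimal sketch, assuming the same toy data as above (the name `dup_mask` is only illustrative):

```python
import pandas as pd

# Same toy data as in the example above; the column labels are 0..3.
df = pd.DataFrame([[1, 2, "a", 16.86],
                   [1, 2, "a", 17.18],
                   [1, 4, "a", 17.03],
                   [2, 5, "b", 17.28]])
df1 = df.set_index([0, 1, 2])

# keep=False marks every member of a duplicated group, not only the later ones.
dup_mask = df1.index.duplicated(keep=False)

# Rows whose full (0, 1, 2) index entry occurs more than once.
print(df1[dup_mask])
```

Here the two rows indexed by (1, 2, 'a') are flagged; these are the ones unstack cannot place in a single cell.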


One solution is to call reset_index (which gets you back to df) and use pivot_table.

In [16]: df1.reset_index().pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')
Out[16]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

Another option (if you don't want to aggregate) is to append a dummy level, unstack it, then drop the dummy level...
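
One possible reading of the dummy-level idea, as a sketch rather than the answerer's exact method: number the repeats within each key with groupby(...).cumcount(), include that counter in the index so every entry becomes unique, unstack, and then drop the counter. The sample rows and the level name `dup` below are made up for illustration; the column names follow the question.

```python
import pandas as pd

# Invented sample data with one repeated (id, date, location) combination.
e = pd.DataFrame({
    "id": ["id1", "id1", "id1", "id2"],
    "date": ["2014-12-12", "2014-12-12", "2014-12-11", "2014-12-12"],
    "location": ["loc1", "loc1", "loc1", "loc2"],
    "value": [16.86, 17.18, 17.03, 17.28],
})

# Hypothetical dummy level: 0 for the first occurrence of a key, 1 for the second, ...
e["dup"] = e.groupby(["id", "date", "location"]).cumcount()

# With the counter in the index every entry is unique, so unstack succeeds.
wide = e.set_index(["id", "date", "dup", "location"])["value"].unstack("location")

# Drop the dummy level; repeated (id, date) rows are kept as separate rows.
wide = wide.reset_index(level="dup", drop=True)
print(wide)
```

The result keeps one row per repeat, so its index is no longer unique after the dummy level is dropped. That is the trade-off for not aggregating.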

Answer by HVS

There's a far simpler way to tackle this.

The reason you get ValueError: Index contains duplicate entries, cannot reshape is that the index is not unique: at least one combination of "id", "date" and "location" appears more than once, so after unstacking "location" two values would have to go into the same ("id", "date") cell.

You can avoid this by retaining the default index (the row number): when setting the index to "id", "date" and "location", pass append=True so the new levels are added to the existing index instead of replacing it, which is the default.

So use:

# set_index returns a new DataFrame unless inplace=True, so keep the result
e = e.set_index(['id', 'date', 'location'], append=True)

Once this is done, your index will still contain the default row number alongside the levels you set, so every entry is unique and unstack will work.
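
A minimal sketch of this approach, assuming invented sample rows with one repeated ("id", "date", "location") combination (the names `indexed` and `wide` are only illustrative):

```python
import pandas as pd

# Invented sample data; the first two rows share the same id, date and location.
e = pd.DataFrame({
    "id": ["id1", "id1", "id1", "id2"],
    "date": ["2014-12-12", "2014-12-12", "2014-12-11", "2014-12-12"],
    "location": ["loc1", "loc1", "loc1", "loc2"],
    "value": [16.86, 17.18, 17.03, 17.28],
})

# append=True keeps the original row number as the first index level,
# so no two index entries can be identical.
indexed = e.set_index(["id", "date", "location"], append=True)

# Unstacking now succeeds; the columns become ("value", "loc1"), ("value", "loc2").
wide = indexed.unstack("location")
print(wide)
```

Note that the result has one row per original row, with NaN in the columns for the other locations. The duplicates are kept apart rather than merged.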

Let me know how it works out.

Answer by Grag2015

I had the same problem. In my case the problem was in the data: my column 'information' contained only one unique value, and that caused the error.

UPDATE: for 'pivot' to work correctly, the pairs (id_user, information) must not contain duplicates.

This works:

import pandas as pd

# Every (id_user, information) pair is unique, so pivot can reshape.
df2 = pd.DataFrame({'id_user': [1, 2, 3, 4, 4, 5, 5],
                    'information': ['phon', 'phon', 'phone', 'phone1', 'phone', 'phone1', 'phone'],
                    'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')

This doesn't work:

# (4, 'phone') and (5, 'phone') each appear twice, so pivot raises ValueError.
df2 = pd.DataFrame({'id_user': [1, 2, 3, 4, 4, 5, 5],
                    'information': ['phone', 'phone', 'phone', 'phone', 'phone', 'phone', 'phone'],
                    'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})
df2.pivot(index='id_user', columns='information', values='value')
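
If the duplicate pairs are expected, one option is to locate them first and then decide how to collapse them. The sketch below is not part of the original answer: it uses DataFrame.duplicated to list the offending rows and pivot_table with aggfunc='first' to keep the first value per pair. Keeping the first value is an arbitrary choice made for illustration.

```python
import pandas as pd

# Same data as the failing example above.
df2 = pd.DataFrame({'id_user': [1, 2, 3, 4, 4, 5, 5],
                    'information': ['phone'] * 7,
                    'value': [1, '01.01.00', '01.02.00', 2, '01.03.00', 3, '01.04.00']})

# Rows whose (id_user, information) pair occurs more than once.
dups = df2[df2.duplicated(subset=['id_user', 'information'], keep=False)]
print(dups)

# Keep the first value of each pair (assumption: later values can be discarded).
wide = df2.pivot_table(index='id_user', columns='information',
                       values='value', aggfunc='first')
print(wide)
```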

source: https://stackoverflow.com/a/37021196/6088984