Python: What does "ValueError: cannot reindex from a duplicate axis" mean?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/27236275/

Time: 2020-08-19 01:32:49  Source: igfitidea

What does `ValueError: cannot reindex from a duplicate axis` mean?

python pandas

Asked by Akavall

I am getting a `ValueError: cannot reindex from a duplicate axis` when I try to set an index to a certain value. I tried to reproduce this with a simple example, but I could not.

Here is my session inside an `ipdb` trace. I have a DataFrame with a string index, integer columns, and float values. However, when I try to create a `sum` index for the sum of all columns, I get the `ValueError: cannot reindex from a duplicate axis` error. I created a small DataFrame with the same characteristics but was not able to reproduce the problem. What could I be missing?

I don't really understand what `ValueError: cannot reindex from a duplicate axis` means. What does this error message mean? Knowing that may help me diagnose the problem, and it is the most answerable part of my question.

ipdb> type(affinity_matrix)
<class 'pandas.core.frame.DataFrame'>
ipdb> affinity_matrix.shape
(333, 10)
ipdb> affinity_matrix.columns
Int64Index([9315684, 9315597, 9316591, 9320520, 9321163, 9320615, 9321187, 9319487, 9319467, 9320484], dtype='int64')
ipdb> affinity_matrix.index
Index([u'001', u'002', u'003', u'004', u'005', u'008', u'009', u'010', u'011', u'014', u'015', u'016', u'018', u'020', u'021', u'022', u'024', u'025', u'026', u'027', u'028', u'029', u'030', u'032', u'033', u'034', u'035', u'036', u'039', u'040', u'041', u'042', u'043', u'044', u'045', u'047', u'047', u'048', u'050', u'053', u'054', u'055', u'056', u'057', u'058', u'059', u'060', u'061', u'062', u'063', u'065', u'067', u'068', u'069', u'070', u'071', u'072', u'073', u'074', u'075', u'076', u'077', u'078', u'080', u'082', u'083', u'084', u'085', u'086', u'089', u'090', u'091', u'092', u'093', u'094', u'095', u'096', u'097', u'098', u'100', u'101', u'103', u'104', u'105', u'106', u'107', u'108', u'109', u'110', u'111', u'112', u'113', u'114', u'115', u'116', u'117', u'118', u'119', u'121', u'122', ...], dtype='object')

ipdb> affinity_matrix.values.dtype
dtype('float64')
ipdb> 'sums' in affinity_matrix.index
False

Here is the error:

ipdb> affinity_matrix.loc['sums'] = affinity_matrix.sum(axis=0)
*** ValueError: cannot reindex from a duplicate axis

I tried to reproduce this with a simple example, but I failed:

In [32]: import pandas as pd

In [33]: import numpy as np

In [34]: a = np.arange(35).reshape(5,7)

In [35]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

In [36]: df.values.dtype
Out[36]: dtype('int64')

In [37]: df.loc['sums'] = df.sum(axis=0)

In [38]: df
Out[38]: 
      10  11  12  13  14  15   16
x      0   1   2   3   4   5    6
y      7   8   9  10  11  12   13
u     14  15  16  17  18  19   20
z     21  22  23  24  25  26   27
w     28  29  30  31  32  33   34
sums  70  75  80  85  90  95  100

Accepted answer by Korem

This error usually arises when you join or assign to a column while the index has duplicate values. Since you are assigning to a row, I suspect that there is a duplicate value in `affinity_matrix.columns`, perhaps not shown in your question.
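To make the column case concrete, here is a minimal sketch that triggers the error on purpose. The frame and Series are made-up data, not the asker's `affinity_matrix`, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])

# The label "x" appears twice, so pandas cannot align this Series
# unambiguously to df's index.
dup = pd.Series([10, 20, 30], index=["x", "x", "y"])

raised = False
try:
    df["b"] = dup  # column assignment aligns on the index (a reindex)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

print(dup.index.is_unique)
```

Assigning a Series forces pandas to reindex it to the target's labels, and that step is what fails when the source labels are not unique.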

Answer by Matthew

As others have said, you've probably got duplicate values in your original index. To find them do this:

df[df.index.duplicated()]
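A slightly fuller sketch of that check, on a made-up frame. Note that `duplicated()` marks only the second and later occurrences by default; `keep=False` marks every row that shares a label. The same check on `df.columns` is worth running too.

```python
import pandas as pd

# Hypothetical frame whose index repeats the label "047".
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=["047", "047", "048", "050"])

print(df.index.is_unique)                     # quick yes/no check

later = df[df.index.duplicated()]             # repeats only (first one is kept)
all_dups = df[df.index.duplicated(keep=False)]  # every row with a shared label

# Duplicate column labels can be found the same way.
dup_cols = df.columns[df.columns.duplicated()]

print(later)
print(all_dups)
print(list(dup_cols))
```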

Answer by Rebeku

Indices with duplicate values often arise if you create a DataFrame by concatenating other DataFrames. If you don't care about preserving the values of your index and you want them to be unique, set `ignore_index=True` when you concatenate the data.

Alternatively, to overwrite your current index with a new one, instead of using `df.reindex()`, set:

df.index = new_index
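Both suggestions can be sketched on two small made-up frames. The names `a`, `b`, `stacked`, `clean` and `relabeled` are placeholders for illustration.

```python
import pandas as pd

# Two hypothetical frames that each carry the default 0..n-1 index.
a = pd.DataFrame({"v": [1, 2]})
b = pd.DataFrame({"v": [3, 4]})

# Plain concatenation keeps the original labels, so 0 and 1 repeat.
stacked = pd.concat([a, b])
print(list(stacked.index))

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index.
clean = pd.concat([a, b], ignore_index=True)
print(list(clean.index))

# Overwriting the index in place of an existing frame also works.
relabeled = stacked.copy()
relabeled.index = range(len(relabeled))
print(relabeled.index.is_unique)
```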

Answer by GoingMyWay

I came across this error today when I wanted to add a new column like this:

df_temp['REMARK_TYPE'] = df.REMARK.apply(lambda v: 1 if str(v)!='nan' else 0)

I wanted to process the `REMARK` column of `df_temp` to return 1 or 0. However, I typed the wrong variable, `df`, and it returned an error like this:

----> 1 df_temp['REMARK_TYPE'] = df.REMARK.apply(lambda v: 1 if str(v)!='nan' else 0)

/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2417         else:
   2418             # set column
-> 2419             self._set_item(key, value)
   2420 
   2421     def _setitem_slice(self, key, value):

/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   2483 
   2484         self._ensure_valid_index(value)
-> 2485         value = self._sanitize_column(key, value)
   2486         NDFrame._set_item(self, key, value)
   2487 

/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in _sanitize_column(self, key, value, broadcast)
   2633 
   2634         if isinstance(value, Series):
-> 2635             value = reindexer(value)
   2636 
   2637         elif isinstance(value, DataFrame):

/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in reindexer(value)
   2625                     # duplicate axis
   2626                     if not value.index.is_unique:
-> 2627                         raise e
   2628 
   2629                     # other

ValueError: cannot reindex from a duplicate axis

As you can see, the correct code should be:

df_temp['REMARK_TYPE'] = df_temp.REMARK.apply(lambda v: 1 if str(v)!='nan' else 0)

Because `df` and `df_temp` have a different number of rows, pandas had to reindex the Series from `df` to match `df_temp`. The index of `df` was not unique (see the `value.index.is_unique` check in the traceback), so it raised `ValueError: cannot reindex from a duplicate axis`.

I hope this answer helps other people debug their code.
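A small sketch of the distinction, using made-up data (the `FLAG` columns and the `short` and `dup` Series are hypothetical): a Series of a different length but with unique labels is aligned quietly and leaves NaN in the gaps, while one with repeated labels raises.

```python
import pandas as pd

# Hypothetical stand-in for df_temp.
df_temp = pd.DataFrame({"REMARK": ["ok", None, "late"]}, index=[0, 1, 2])

# Shorter Series, unique labels: aligned by label, row 2 becomes NaN.
short = pd.Series([1, 0], index=[0, 1])
df_temp["FLAG"] = short
print(df_temp)

# Series from another frame whose index repeats the label 1: alignment fails.
dup = pd.Series([1, 0, 1, 0], index=[0, 1, 1, 2])
raised = False
try:
    df_temp["FLAG2"] = dup
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```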

Answer by tehfink

In my case, this error popped up not because of duplicate values, but because I attempted to join a shorter Series to a DataFrame: both had the same index, but the Series had fewer rows (it was missing the top few). The following worked for my purposes:

df.head()
                          SensA
date                           
2018-04-03 13:54:47.274   -0.45
2018-04-03 13:55:46.484   -0.42
2018-04-03 13:56:56.235   -0.37
2018-04-03 13:57:57.207   -0.34
2018-04-03 13:59:34.636   -0.33

series.head()
date
2018-04-03 14:09:36.577    62.2
2018-04-03 14:10:28.138    63.5
2018-04-03 14:11:27.400    63.1
2018-04-03 14:12:39.623    62.6
2018-04-03 14:13:27.310    62.5
Name: SensA_rrT, dtype: float64

df = series.to_frame().combine_first(df)

df.head(15)
                          SensA  SensA_rrT
date                           
2018-04-03 13:54:47.274   -0.45        NaN
2018-04-03 13:55:46.484   -0.42        NaN
2018-04-03 13:56:56.235   -0.37        NaN
2018-04-03 13:57:57.207   -0.34        NaN
2018-04-03 13:59:34.636   -0.33        NaN
2018-04-03 14:00:34.565   -0.33        NaN
2018-04-03 14:01:19.994   -0.37        NaN
2018-04-03 14:02:29.636   -0.34        NaN
2018-04-03 14:03:31.599   -0.32        NaN
2018-04-03 14:04:30.779   -0.33        NaN
2018-04-03 14:05:31.733   -0.35        NaN
2018-04-03 14:06:33.290   -0.38        NaN
2018-04-03 14:07:37.459   -0.39        NaN
2018-04-03 14:08:36.361   -0.36        NaN
2018-04-03 14:09:36.577   -0.37       62.2
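The same `combine_first` pattern on a tiny made-up frame, as a rough sketch. The timestamps and values below are shortened placeholders, not the sensor data shown above.

```python
import pandas as pd

# Hypothetical three-row frame and a Series that lacks the first timestamp.
idx = pd.to_datetime(
    ["2018-04-03 13:54:47", "2018-04-03 13:55:46", "2018-04-03 13:56:56"]
)
df = pd.DataFrame({"SensA": [-0.45, -0.42, -0.37]}, index=idx)
series = pd.Series([62.2, 63.5], index=idx[1:], name="SensA_rrT")

# combine_first takes the union of both indexes and both sets of columns;
# rows that the Series does not cover get NaN in its column.
merged = series.to_frame().combine_first(df)
print(merged)
```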

Answer by Hadij

Simply sidestep the error by adding `.values` at the end. This bypasses index alignment, so the values are assigned by position:

affinity_matrix.loc['sums'] = affinity_matrix.sum(axis=0).values
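A sketch of what `.values` does, first on the small frame from the question and then on a made-up column case (`frame` and `dup` are hypothetical). Because the labels are dropped, this is only safe when the values are already in the right order.

```python
import numpy as np
import pandas as pd

# The small frame from the question.
df = pd.DataFrame(
    np.arange(35).reshape(5, 7), ["x", "y", "u", "z", "w"], range(10, 17)
)
# .values hands pandas a plain NumPy array, so nothing is aligned by label.
df.loc["sums"] = df.sum(axis=0).values
print(df.loc["sums"].tolist())

# Hypothetical column case: the Series has a duplicate label, but its raw
# values are assigned purely by position.
frame = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
dup = pd.Series([10, 20, 30], index=["x", "x", "y"])
frame["b"] = dup.values
print(frame)
```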

Answer by Parseltongue

For people who are still struggling with this error: it can also happen if you accidentally create a duplicate column with the same name. Remove duplicate columns like so:

df = df.loc[:,~df.columns.duplicated()]
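A short sketch of that filter on made-up data. It keeps the first column for each repeated name and drops the rest, so check that the dropped columns really are redundant. The same mask idea works for repeated row labels.

```python
import pandas as pd

# Hypothetical frame where the column name "a" appears twice.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# duplicated() flags the second "a"; ~ inverts the mask to keep the rest.
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped)

# Row-wise equivalent: keep the first row for each repeated index label.
rows = pd.DataFrame({"v": [1, 2, 3]}, index=["r", "r", "s"])
unique_rows = rows[~rows.index.duplicated(keep="first")]
print(unique_rows)
```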

Answer by rishi jain

I wasted a couple of hours on the same issue. In my case, I had to call `reset_index()` on the DataFrame before using the apply function. Before merging with, or looking up from, another indexed dataset, you need to reset the index, since one dataset can have only one index.

Answer by Connor

Simple Fix that Worked for Me

Run `df.reset_index(inplace=True)` before grouping.

Thanks to this GitHub comment for the solution.
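As a rough sketch of the `reset_index` approach on made-up data: resetting replaces the duplicated labels with a fresh 0..n-1 index and, by default, moves the old labels into a column named `index`. The non-inplace form is used here so the original frame stays available for comparison.

```python
import pandas as pd

# Hypothetical frame whose index repeats the label "r1".
df = pd.DataFrame(
    {"key": ["a", "a", "b"], "v": [1, 2, 3]}, index=["r1", "r1", "r2"]
)
print(df.index.is_unique)

# Fresh RangeIndex; the old labels are kept as a regular column.
fresh = df.reset_index()
print(fresh)

# apply and groupby now run against unique row labels.
fresh["double"] = fresh["v"].apply(lambda x: x * 2)
totals = fresh.groupby("key")["v"].sum()
print(totals)
```

Pass `drop=True` to `reset_index` if the old labels are not worth keeping.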