Python: what does "ValueError: cannot reindex from a duplicate axis" mean?
Notice: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/27236275/
What does `ValueError: cannot reindex from a duplicate axis` mean?
Asked by Akavall
I am getting a `ValueError: cannot reindex from a duplicate axis` when I try to set an index to a certain value. I tried to reproduce this with a simple example, but I could not.
Here is my session inside of an `ipdb` trace. I have a DataFrame with a string index, integer columns, and float values. However, when I try to create a `sum` index for the sum of all columns, I get the `ValueError: cannot reindex from a duplicate axis` error. I created a small DataFrame with the same characteristics but was not able to reproduce the problem. What could I be missing?
I don't really understand what `ValueError: cannot reindex from a duplicate axis` means. What does this error message mean? Knowing that might help me diagnose the problem, and it is the most answerable part of my question.
ipdb> type(affinity_matrix)
<class 'pandas.core.frame.DataFrame'>
ipdb> affinity_matrix.shape
(333, 10)
ipdb> affinity_matrix.columns
Int64Index([9315684, 9315597, 9316591, 9320520, 9321163, 9320615, 9321187, 9319487, 9319467, 9320484], dtype='int64')
ipdb> affinity_matrix.index
Index([u'001', u'002', u'003', u'004', u'005', u'008', u'009', u'010', u'011', u'014', u'015', u'016', u'018', u'020', u'021', u'022', u'024', u'025', u'026', u'027', u'028', u'029', u'030', u'032', u'033', u'034', u'035', u'036', u'039', u'040', u'041', u'042', u'043', u'044', u'045', u'047', u'047', u'048', u'050', u'053', u'054', u'055', u'056', u'057', u'058', u'059', u'060', u'061', u'062', u'063', u'065', u'067', u'068', u'069', u'070', u'071', u'072', u'073', u'074', u'075', u'076', u'077', u'078', u'080', u'082', u'083', u'084', u'085', u'086', u'089', u'090', u'091', u'092', u'093', u'094', u'095', u'096', u'097', u'098', u'100', u'101', u'103', u'104', u'105', u'106', u'107', u'108', u'109', u'110', u'111', u'112', u'113', u'114', u'115', u'116', u'117', u'118', u'119', u'121', u'122', ...], dtype='object')
ipdb> affinity_matrix.values.dtype
dtype('float64')
ipdb> 'sums' in affinity_matrix.index
False
Here is the error:
ipdb> affinity_matrix.loc['sums'] = affinity_matrix.sum(axis=0)
*** ValueError: cannot reindex from a duplicate axis
I tried to reproduce this with a simple example, but I failed:
In [32]: import pandas as pd
In [33]: import numpy as np
In [34]: a = np.arange(35).reshape(5,7)
In [35]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))
In [36]: df.values.dtype
Out[36]: dtype('int64')
In [37]: df.loc['sums'] = df.sum(axis=0)
In [38]: df
Out[38]:
      10  11  12  13  14  15   16
x      0   1   2   3   4   5    6
y      7   8   9  10  11  12   13
u     14  15  16  17  18  19   20
z     21  22  23  24  25  26   27
w     28  29  30  31  32  33   34
sums  70  75  80  85  90  95  100
Accepted answer by Korem
This error usually arises when you join or assign to a column while the index has duplicate values. Since you are assigning to a row, I suspect that there is a duplicate value in `affinity_matrix.columns`, perhaps not shown in your question.
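To make the error concrete, here is a minimal sketch with made-up data (the names `s`, `df`, and `try_assign` are illustrative and not from the question). During assignment pandas has to reindex the Series to the frame's index, and it refuses when the Series index holds duplicate labels and does not already match the frame's index. The exact wording of the message differs between pandas versions, so the sketch only prints whatever message it catches.

```python
import pandas as pd

# Hypothetical data: the Series index contains the label "a" twice.
s = pd.Series([1.0, 2.0, 3.0], index=["a", "a", "b"])
s_unique = pd.Series([1.0, 2.0], index=["a", "b"])
df = pd.DataFrame({"x": [10, 20]}, index=["a", "b"])


def try_assign(frame, series):
    """Return the error message if the column assignment fails, else None."""
    frame = frame.copy()  # leave the caller's frame untouched
    try:
        frame["y"] = series  # pandas must align `series` to frame.index
    except ValueError as exc:
        return str(exc)
    return None


msg = try_assign(df, s)               # duplicate labels: alignment is ambiguous
ok = try_assign(df, s_unique)         # unique labels: assignment succeeds
print(msg)
print(ok)
```

The second call succeeds because each label in the frame maps to exactly one value in the Series.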
Answer by Matthew
As others have said, you've probably got duplicate values in your original index. To find them, do this:
df[df.index.duplicated()]
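As a small sketch of that check on made-up data (the frame below is hypothetical; it repeats the label `'047'`, which also appears twice in the index printed in the question): by default `duplicated()` flags only the second and later occurrences, while `keep=False` flags every row that shares a label.

```python
import pandas as pd

# Hypothetical frame whose index repeats the label "047".
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=["001", "047", "047", "048"])

print(df.index.is_unique)  # False

# Default: only the second and later occurrences are flagged.
later_repeats = df[df.index.duplicated()]

# keep=False: every row whose label occurs more than once is flagged.
all_repeats = df[df.index.duplicated(keep=False)]

# The distinct labels that are repeated.
dup_labels = df.index[df.index.duplicated()].unique().tolist()

print(later_repeats)
print(all_repeats)
print(dup_labels)
```

The same calls work on `df.columns` if the duplicates are suspected there instead.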
Answer by Rebeku
Indices with duplicate values often arise if you create a DataFrame by concatenating other DataFrames. If you don't care about preserving the values of your index and you want them to be unique, set `ignore_index=True` when you concatenate the data.
Alternatively, to overwrite your current index with a new one, instead of using `df.reindex()`, set:
df.index = new_index
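Both suggestions can be sketched as follows, assuming two small hypothetical frames (`part1`, `part2`) that each carry the default 0-based index. Note that a directly assigned index must have the same length as the frame.

```python
import pandas as pd

# Hypothetical pieces, each with the default index 0, 1.
part1 = pd.DataFrame({"v": [1, 2]})
part2 = pd.DataFrame({"v": [3, 4]})

# Plain concatenation keeps the original labels, so 0 and 1 appear twice.
stacked = pd.concat([part1, part2])
unique_before = stacked.index.is_unique

# ignore_index=True discards the old labels and numbers the rows afresh.
renumbered = pd.concat([part1, part2], ignore_index=True)

# Overwriting the index directly; the new index must match the row count.
stacked.index = pd.RangeIndex(len(stacked))

print(unique_before, renumbered.index.tolist(), stacked.index.is_unique)
```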
Answer by GoingMyWay
I came across this error today when I wanted to add a new column like this:
df_temp['REMARK_TYPE'] = df.REMARK.apply(lambda v: 1 if str(v)!='nan' else 0)
I wanted to process the `REMARK` column of `df_temp` to return 1 or 0. However, I typed the wrong variable, `df`, and it returned an error like this:
----> 1 df_temp['REMARK_TYPE'] = df.REMARK.apply(lambda v: 1 if str(v)!='nan' else 0)
/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
2417 else:
2418 # set column
-> 2419 self._set_item(key, value)
2420
2421 def _setitem_slice(self, key, value):
/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
2483
2484 self._ensure_valid_index(value)
-> 2485 value = self._sanitize_column(key, value)
2486 NDFrame._set_item(self, key, value)
2487
/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in _sanitize_column(self, key, value, broadcast)
2633
2634 if isinstance(value, Series):
-> 2635 value = reindexer(value)
2636
2637 elif isinstance(value, DataFrame):
/usr/lib64/python2.7/site-packages/pandas/core/frame.pyc in reindexer(value)
2625 # duplicate axis
2626 if not value.index.is_unique:
-> 2627 raise e
2628
2629 # other
ValueError: cannot reindex from a duplicate axis
As you can see, the correct code should be:
df_temp['REMARK_TYPE'] = df_temp.REMARK.apply(lambda v: 1 if str(v)!='nan' else 0)
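This happened because `df` and `df_temp` have a different number of rows, so pandas had to reindex the Series taken from `df` to fit `df_temp`. As the `value.index.is_unique` check in the traceback shows, that reindexing fails when the Series index contains duplicates, hence the `ValueError: cannot reindex from a duplicate axis`.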
I hope this is clear and that my answer helps other people debug their code.
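As a side note, the 1-or-0 flag can also be built without `apply`. This is only a sketch on hypothetical data, and it assumes missing remarks are stored as real `NaN` values; unlike the `str(v) != 'nan'` test, it would treat the literal string `'nan'` as a present remark.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_temp with one missing remark.
df_temp = pd.DataFrame({"REMARK": ["late delivery", np.nan, "ok"]})

# notna() is True where a remark exists; astype(int) turns that into 1/0.
df_temp["REMARK_TYPE"] = df_temp["REMARK"].notna().astype(int)

print(df_temp)
```

Because the new column is derived from `df_temp` itself, its index already matches and no reindexing is needed.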
Answer by tehfink
In my case, this error popped up not because of duplicate values, but because I attempted to join a shorter Series to a DataFrame: both had the same index, but the Series had fewer rows (missing the top few). The following worked for my purposes:
df.head()
SensA
date
2018-04-03 13:54:47.274 -0.45
2018-04-03 13:55:46.484 -0.42
2018-04-03 13:56:56.235 -0.37
2018-04-03 13:57:57.207 -0.34
2018-04-03 13:59:34.636 -0.33
series.head()
date
2018-04-03 14:09:36.577 62.2
2018-04-03 14:10:28.138 63.5
2018-04-03 14:11:27.400 63.1
2018-04-03 14:12:39.623 62.6
2018-04-03 14:13:27.310 62.5
Name: SensA_rrT, dtype: float64
df = series.to_frame().combine_first(df)
df.head(10)
SensA SensA_rrT
date
2018-04-03 13:54:47.274 -0.45 NaN
2018-04-03 13:55:46.484 -0.42 NaN
2018-04-03 13:56:56.235 -0.37 NaN
2018-04-03 13:57:57.207 -0.34 NaN
2018-04-03 13:59:34.636 -0.33 NaN
2018-04-03 14:00:34.565 -0.33 NaN
2018-04-03 14:01:19.994 -0.37 NaN
2018-04-03 14:02:29.636 -0.34 NaN
2018-04-03 14:03:31.599 -0.32 NaN
2018-04-03 14:04:30.779 -0.33 NaN
2018-04-03 14:05:31.733 -0.35 NaN
2018-04-03 14:06:33.290 -0.38 NaN
2018-04-03 14:07:37.459 -0.39 NaN
2018-04-03 14:08:36.361 -0.36 NaN
2018-04-03 14:09:36.577 -0.37 62.2
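The same pattern can be tried on a small self-contained example. The timestamps and readings below are invented for illustration; only the `to_frame().combine_first()` call comes from the answer. The result covers the union of both indexes and leaves `NaN` where the shorter Series has no value.

```python
import pandas as pd

# Hypothetical sensor data: the Series only covers the last two timestamps.
idx = pd.date_range("2018-04-03 13:54", periods=5, freq="min")
df = pd.DataFrame({"SensA": [-0.45, -0.42, -0.37, -0.34, -0.33]}, index=idx)
series = pd.Series([62.2, 63.5], index=idx[3:], name="SensA_rrT")

# Turn the Series into a one-column frame, then fill in everything from df.
combined = series.to_frame().combine_first(df)

print(combined)
```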
Answer by Hadij
Simply skip the error by using `.values` at the end, which passes a plain array so that pandas assigns by position instead of aligning on labels:
affinity_matrix.loc['sums'] = affinity_matrix.sum(axis=0).values
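A short sketch of what this workaround trades away, using made-up data and a column assignment for simplicity: the label-based assignment fails on the duplicate index, while the array version succeeds but matches values to rows purely by position. `.to_numpy()` is used here as the currently recommended spelling of `.values`.

```python
import pandas as pd

# Hypothetical frame and a Series whose index repeats the label "a".
df = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])
s = pd.Series([1.0, 2.0, 3.0], index=["a", "a", "b"])

try:
    df["y"] = s  # label alignment fails on the duplicate labels
    aligned_ok = True
except ValueError:
    aligned_ok = False

# A plain array carries no labels, so values are assigned by position.
df["y"] = s.to_numpy()

print(aligned_ok)
print(df)
```

This is only safe when the array is known to be in the same order and of the same length as the target; otherwise the duplicates should be dealt with first.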
Answer by Parseltongue
For people who are still struggling with this error, it can also happen if you accidentally create a duplicate column with the same name. Remove duplicate columns like so:
df = df.loc[:,~df.columns.duplicated()]
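Here is that one-liner applied to a small hypothetical frame with the column name `a` used twice. The mask from `columns.duplicated()` is `True` for repeats, so negating it keeps the first column of each name and drops the later ones.

```python
import pandas as pd

# Hypothetical frame where the column name "a" appears twice.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

had_duplicates = not df.columns.is_unique

# Keep only the first occurrence of each column name.
df = df.loc[:, ~df.columns.duplicated()]

print(had_duplicates)
print(df)
```

If the later column is the one worth keeping, `duplicated(keep="last")` flags the earlier occurrences instead.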
Answer by rishi jain
I wasted a couple of hours on the same issue. In my case, I had to call `reset_index()` on a DataFrame before using the `apply` function. Before merging with, or looking up from, another indexed dataset, you need to reset the index, as one dataset can have only one index.
Answer by Connor
Simple Fix that Worked for Me
Run `df.reset_index(inplace=True)` before grouping.
Thank you to this GitHub comment for the solution.
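A minimal sketch of that fix on invented data (the column names `group` and `value` are assumptions for illustration): `reset_index` replaces the duplicated labels with a fresh 0-based index and, for an unnamed index, moves the old labels into a column called `index`.

```python
import pandas as pd

# Hypothetical frame whose index repeats the label "r1".
df = pd.DataFrame(
    {"group": ["a", "b", "a"], "value": [1, 2, 3]},
    index=["r1", "r1", "r2"],
)

# Replace the duplicated labels with a fresh 0..n-1 index.
df.reset_index(inplace=True)

totals = df.groupby("group")["value"].sum()

print(df)
print(totals)
```

Passing `drop=True` discards the old labels instead of keeping them as a column.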