pandas ValueError: cannot reindex from a duplicate axis
Original question: http://stackoverflow.com/questions/35257743/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
ValueError: cannot reindex from a duplicate axis
Asked by ytk
Let's say I have two dataframes:
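import string
import numpy as np
import pandas as pd

# df: 26 rows indexed by the letters a..z; column 'two' is NaN except for rows a..d
d = {'one': pd.Series(range(26), index=list(string.ascii_lowercase)),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# df2: 10 rows indexed by the integers 11..20
d2 = {'one': pd.Series(range(10), index=range(11, 21))}
df2 = pd.DataFrame(d2)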
Now, I have a list of indices:
np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size = 26)
Now I want to join df2 with df based on i.
df['new_col'] = df2['one'][i]
But I get the above-mentioned error. One way to work around this is to add i directly to df as a column, create a column called i in df2 to represent its index, and then do a merge, but that seems very inefficient. Is there a better way to do this?
I know there are a few questions with the same title, but none of them had anything helpful for my case.
Answered by Anton Protopopov
You could use the tolist method to convert the values selected from df2.one to a list and then assign that list to df['new_col']:
df['new_col'] = df2['one'][i].tolist()
EDIT
Or you could use the .values attribute, as @ajcr suggested in the comments, which is slightly faster:
df['new_col'] = df2['one'][i].values
Timing
In [100]: %timeit df2.one[i].tolist()
1000 loops, best of 3: 275 μs per loop
In [101]: %timeit df2.one[i].values
1000 loops, best of 3: 252 μs per loop
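To make it clearer why both variants in this answer avoid the error, here is a minimal sketch. It assumes a reasonably recent pandas where Series.to_numpy() is available (it plays the same role as .values here), and it drops the 'two' column for brevity. The names picked, has_duplicates and error_message are only illustrative. The selection df2['one'][i] keeps the labels from i as its index, and i repeats labels, so assigning that Series directly forces pandas to align a duplicate index against df's index. A plain list or array has no index, so it is assigned by position instead.

```python
import string
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': pd.Series(range(26), index=list(string.ascii_lowercase))})
df2 = pd.DataFrame({'one': pd.Series(range(10), index=range(11, 21))})

np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# Label-based lookup: the result is indexed by the labels in i, repeats included.
picked = df2['one'].loc[i]
has_duplicates = picked.index.has_duplicates

# Assigning the Series itself asks pandas to align it with df.index (a..z),
# which fails because the Series index contains duplicate labels.
try:
    df['new_col'] = picked
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# An array carries no index, so the values are assigned row by row, by position.
df['new_col'] = picked.to_numpy()
```

Note that positional assignment only works because i has exactly as many entries as df has rows.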
Answered by Brian Huey
Set the index to use the values generated in 'i', then join df2 to df based on that index:
df = df.set_index(i)
df['new_col'] = df2['one']
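One thing to be aware of with the approach above is that set_index(i) replaces df's original letter index with the values in i. If the original index needs to survive, a possible variant (not part of this answer, just a sketch) is to keep i in a helper column and join on it. The column name 'key' and the variable names with_key and joined are made up for illustration, and the 'two' column is dropped for brevity.

```python
import string
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': pd.Series(range(26), index=list(string.ascii_lowercase))})
df2 = pd.DataFrame({'one': pd.Series(range(10), index=range(11, 21))})

np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# 'key' is a hypothetical helper column holding the df2 label wanted for each row.
with_key = df.assign(key=i)

# Left join of the 'key' column against df2's index; df's own index (a..z) is kept.
joined = with_key.join(df2['one'].rename('new_col'), on='key')
```

This is close to the merge workaround described in the question, so whether it is preferable to the positional assignment depends on whether the explicit key column is useful afterwards.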