pandas ValueError: cannot reindex from a duplicate axis

Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/35257743/

Date: 2020-09-14 00:39:05  Source: igfitidea

ValueError: cannot reindex from a duplicate axis

Tags: python, pandas

Asked by ytk

Let's say I have two dataframes:

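import string

import numpy as np
import pandas as pd

# df: 26 rows indexed by the letters a-z; 'two' is NaN for every row after 'd'
d = {'one': pd.Series(range(26), index=list(string.ascii_lowercase)),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# df2: 10 rows indexed by the integers 11 through 20
d2 = {'one': pd.Series(range(10), index=range(11, 21))}
df2 = pd.DataFrame(d2)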

Now, I have a list of indices:

np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size = 26)

Now I want to join df2 with df based on i.

df['new_col'] = df2['one'][i]

But I get the error mentioned above. One way to work around this is to add i directly to df, create a column called i in df2 to represent the index, and then do a merge, but that seems very inefficient. Is there a better way to do this?
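
The merge workaround described above might be sketched roughly as follows. This is only an illustration under some assumptions: the helper column name `key` and the variable names `lookup` and `merged` are hypothetical, and the frames are simplified versions of the ones in the question. Instead of adding a column to df2, it merges the helper column against df2's existing index.

```python
import string

import numpy as np
import pandas as pd

# Simplified versions of the frames from the question
df = pd.DataFrame({'one': range(26)}, index=list(string.ascii_lowercase))
df2 = pd.DataFrame({'one': range(10)}, index=range(11, 21))

np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# Rename the lookup column so it does not clash with df's own 'one' column
lookup = df2['one'].rename('new_col')

# 'key' is a hypothetical helper column holding the labels to look up in df2
merged = df.assign(key=i).merge(lookup, left_on='key', right_index=True, how='left')
```

With `how='left'`, the rows of df keep their original order and letter index; the `key` column can be dropped afterwards if it is not needed.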

I know there are a few questions with the same title, but none of them had anything helpful for my case.

Answered by Anton Protopopov

You could use the tolist method to convert the selection from df2.one to a list and then assign it to df['new_col']:

df['new_col'] = df2['one'][i].tolist()
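
As a self-contained sketch of why this works (using simplified frames and an explicit `.loc` lookup, both of which are choices made here for illustration): the selection carries repeated index labels, so pandas cannot align it against df, whereas a plain list has no index and is assigned by position.

```python
import string

import numpy as np
import pandas as pd

# Simplified versions of the frames from the question
df = pd.DataFrame({'one': range(26)}, index=list(string.ascii_lowercase))
df2 = pd.DataFrame({'one': range(10)}, index=range(11, 21))

np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# Label-based lookup: 26 rows drawn from only 10 labels, so labels must repeat
selected = df2['one'].loc[i]
has_unique_labels = selected.index.is_unique  # False

# A list has no index, so the values are assigned row by row
df['new_col'] = selected.tolist()
```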

EDIT

Or you could use the .values attribute, as @ajcr suggested in the comments, which is faster:

df['new_col'] = df2['one'][i].values
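
In more recent pandas versions, `Series.to_numpy()` is the documented alternative to `.values`. A minimal sketch of the same positional assignment, assuming simplified frames and an explicit `.loc` lookup (neither is from the original answer):

```python
import string

import numpy as np
import pandas as pd

# Simplified versions of the frames from the question
df = pd.DataFrame({'one': range(26)}, index=list(string.ascii_lowercase))
df2 = pd.DataFrame({'one': range(10)}, index=range(11, 21))

np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# Drop the duplicated index by converting the selection to a NumPy array
values = df2['one'].loc[i].to_numpy()

# An array, like a list, is assigned by position rather than by label
df['new_col'] = values
```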

Timing

In [100]: %timeit df2.one[i].tolist()
1000 loops, best of 3: 275 μs per loop

In [101]: %timeit df2.one[i].values
1000 loops, best of 3: 252 μs per loop

Answered by Brian Huey

Set the index to use the values generated in 'i', then join df2 to df based on that index:

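# Note: this replaces df's letter index with the labels in i
df = df.set_index(i)
# df2['one'] has a unique index, so pandas can align it on the new labels
df['new_col'] = df2['one']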
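
Because set_index discards the original letter index, a variant that first keeps the letters as an ordinary column might look like the sketch below. The column name `letter` and the variable name `indexed` are hypothetical, and the frames are simplified versions of the ones in the question.

```python
import string

import numpy as np
import pandas as pd

# Simplified versions of the frames from the question
df = pd.DataFrame({'one': range(26)}, index=list(string.ascii_lowercase))
df2 = pd.DataFrame({'one': range(10)}, index=range(11, 21))

np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# Move the letters into a 'letter' column, then index the rows by the labels in i
indexed = df.rename_axis('letter').reset_index().set_index(i)

# The source index is unique, so alignment onto the repeated labels succeeds
indexed['new_col'] = df2['one']
```

The alignment works in this direction because only the target (df) has repeated labels; the error in the question arises when the source being aligned has them.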