pandas ValueError: cannot reindex from a duplicate axis

Question

Asked by ytk

Let's say I have two dataframes:

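import string

import numpy as np  # needed for the np.random calls below
import pandas as pd

# 26 rows labelled 'a'..'z'; column 'two' is NaN from 'e' onward
d = {'one': pd.Series(range(26), index=list(string.ascii_lowercase)),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# 10 rows labelled 11..20
d2 = {'one': pd.Series(range(10), index=range(11, 21))}
df2 = pd.DataFrame(d2)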

Now, I have a list of indices:

np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)
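Because i holds 26 draws from only 10 possible labels (11 to 20), it is guaranteed to contain repeated labels, and that duplication is what later trips up the assignment. A quick check, reusing the seed from the question (the names labels, counts and repeated are illustrative):

```python
import numpy as np

np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# 26 draws from only 10 candidate labels (11..20): by the pigeonhole
# principle at least one label has to appear more than once.
labels, counts = np.unique(i, return_counts=True)
repeated = labels[counts > 1]
print(f"{len(i)} draws, {len(labels)} distinct labels")
print("labels that repeat:", repeated)
```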

Now I want to join df2 with df based on i, i.e. look up df2['one'] at each label in i and store the result as a new column of df:

df['new_col'] = df2['one'][i]

But I get the above-mentioned error. One way to work around this is to add i directly to df as a column, create a column called i in df2 to represent its index, and then do a merge, but that seems very inefficient. Is there a better way to do this?
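For comparison, the merge-based workaround described above might look roughly like this sketch. The intermediate names left and right, and renaming df2's column to new_col before merging, are choices made for this illustration only; how='left' keeps every row of df in its original order.

```python
import string

import numpy as np
import pandas as pd

d = {'one': pd.Series(range(26), index=list(string.ascii_lowercase)),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df2 = pd.DataFrame({'one': pd.Series(range(10), index=range(11, 21))})
np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# Step 1: store the lookup keys as an ordinary column of df.
left = df.assign(i=i)
# Step 2: turn df2's index into a column named 'i' (illustrative naming).
right = df2.rename(columns={'one': 'new_col'}).rename_axis('i').reset_index()
# Step 3: a left merge preserves the order of the left keys.
merged = left.merge(right, on='i', how='left').drop(columns='i')
# merge returns a fresh RangeIndex, so restore df's letter labels.
merged.index = df.index
print(merged.head())
```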

I know there are a few questions with the same title, but none of them had anything helpful for my case.
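The error comes from index alignment: assigning a Series to a DataFrame column makes pandas reindex the Series onto the frame's index first, and that is impossible when the Series' own index contains duplicates, as df2['one'][i] does. (Even without repeats, labels 11 to 20 would match none of df's labels 'a' to 'z', so the column would come out all NaN.) A minimal reproduction, using .loc to make the label lookup explicit (equivalent to df2['one'][i] here because df2 has an integer index); the exact error message differs between pandas versions, and the helper names are illustrative:

```python
import string

import numpy as np
import pandas as pd

df = pd.DataFrame({'one': pd.Series(range(26), index=list(string.ascii_lowercase)),
                   'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})
df2 = pd.DataFrame({'one': pd.Series(range(10), index=range(11, 21))})
np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# Label-based lookup: the result is indexed by the labels in i,
# so it inherits i's repeats and shares no labels with df.
looked_up = df2['one'].loc[i]
print("unique labels?", looked_up.index.is_unique)
print("same labels as df?", looked_up.index.equals(df.index))

# pandas tries to reindex looked_up onto df.index before inserting it,
# which fails because looked_up's index has duplicate labels.
raised = False
try:
    df['new_col'] = looked_up
except ValueError as err:
    raised = True
    print("ValueError:", err)
```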

Answer 1

Answered by Anton Protopopov

You could use the tolist method to convert the selection df2['one'][i] to a list and then assign it to df['new_col']:

df['new_col'] = df2['one'][i].tolist()
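This works because a plain list carries no index, so pandas places the values by position instead of aligning on labels; the list therefore needs exactly len(df) == 26 elements. Since df2['one'] stores label - 11 for each label 11 to 20, every new value should equal the matching entry of i minus 11, which makes the result easy to check. A self-contained sketch using the question's setup:

```python
import string

import numpy as np
import pandas as pd

df = pd.DataFrame({'one': pd.Series(range(26), index=list(string.ascii_lowercase)),
                   'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})
df2 = pd.DataFrame({'one': pd.Series(range(10), index=range(11, 21))})
np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# tolist() drops the (duplicated) index labels, so the 26 values
# are written to df's rows 'a'..'z' purely by position.
df['new_col'] = df2['one'][i].tolist()
print(df.head())
```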

EDIT

Or you could use the .values attribute, as @ajcr suggested in the comments, which is faster:

df['new_col'] = df2['one'][i].values
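Like tolist(), .values hands pandas a bare NumPy array with no index, so the assignment is positional. Newer pandas versions (0.24 and later) also provide .to_numpy(), which the pandas documentation recommends over .values; the sketch below uses it and shows that positional assignment fails loudly when the length does not match. The column name too_short and the other helper names are illustrative.

```python
import string

import numpy as np
import pandas as pd

df = pd.DataFrame({'one': pd.Series(range(26), index=list(string.ascii_lowercase)),
                   'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})
df2 = pd.DataFrame({'one': pd.Series(range(10), index=range(11, 21))})
np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# Same data as .values: a plain array, so no label alignment happens.
vals = df2['one'][i].to_numpy()
df['new_col'] = vals

# Positional assignment only works when the lengths agree.
mismatch_raised = False
try:
    df['too_short'] = vals[:10]
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```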

Timing

In [100]: %timeit df2.one[i].tolist()
1000 loops, best of 3: 275 μs per loop

In [101]: %timeit df2.one[i].values
1000 loops, best of 3: 252 μs per loop
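Outside IPython, a comparable measurement can be made with the standard-library timeit module. This sketch also times the label lookup on its own, on the assumption that it accounts for most of each call (which would explain why the two variants differ only slightly above); absolute numbers will vary with the machine and pandas version, and the timings dictionary is an illustrative name.

```python
import timeit

import numpy as np
import pandas as pd

df2 = pd.DataFrame({'one': pd.Series(range(10), index=range(11, 21))})
np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

n = 1000  # calls per measurement
timings = {
    "lookup only": timeit.timeit(lambda: df2['one'][i], number=n),
    ".tolist()": timeit.timeit(lambda: df2['one'][i].tolist(), number=n),
    ".values": timeit.timeit(lambda: df2['one'][i].values, number=n),
}
for name, total in timings.items():
    # timeit returns total seconds for n calls; convert to µs per call.
    print(f"{name:12s} {total / n * 1e6:8.1f} µs per call")
```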

Answer 2

Answered by Brian Huey

Set df's index to the values generated in i, then join df2 to df based on that index:

df = df.set_index(i)
df['new_col'] = df2['one']
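This direction works because the duplicates are now on the receiving side: pandas reindexes df2['one'], whose labels 11 to 20 are unique, onto df's repeated labels, and that is allowed. The trade-off is that df's original 'a' to 'z' labels are discarded. A self-contained sketch that checks each row picks up df2's value for its key:

```python
import string

import numpy as np
import pandas as pd

df = pd.DataFrame({'one': pd.Series(range(26), index=list(string.ascii_lowercase)),
                   'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})
df2 = pd.DataFrame({'one': pd.Series(range(10), index=range(11, 21))})
np.random.seed(12)
i = np.random.choice(np.arange(11, 21), size=26)

# Replace the 'a'..'z' labels with the (repeating) keys from i.
df = df.set_index(i)
# df2['one'] has unique labels, so reindexing it onto df's duplicated
# index is well defined: every row receives the value for its key.
df['new_col'] = df2['one']
print(df.head())
```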