Python error "Can only compare identically-labeled Series objects" and sort_index

Question

Asked by Lumos

I have two dataframes, df1 and df2, with the same number of rows and columns and the same variables, and I'm trying to compare the boolean variable choice in the two dataframes, then use if/else to manipulate the data. But something goes wrong when I try to compare the boolean variable.

Here are my sample dataframes and code:

#df1
v_100     choice #boolean
7          True
0          True
7          False
2          True

#df2
v_100     choice #boolean
1          False
2          True
74         True
6          True

def lastTwoTrials_outcome():
     df1 = df.iloc[5::6, :] #df1 and df2 are extracted from the same dataframe first
     df2 = df.iloc[4::6, :]

     if df1['choice'] != df2['choice']:  # if "choice" is different in the two dataframes
         df1['v_100'] = (df1['choice'] + df2['choice']) * 0.5

Here's the error:

if df1['choice'] != df2['choice']:
File "path", line 818, in wrapper
raise ValueError(msg)
ValueError: Can only compare identically-labeled Series objects

I found the same error here, and an answer there suggests calling sort_index first, but I don't understand why. Can anyone explain in more detail (and whether that's the correct solution)?

Thanks!

Answer 1

Accepted answer by jezrael

I think you need reset_index so that both dataframes have the same index values, and then compare. To create the new column it is better to use mask or numpy.where.

Also use | instead of + because you are working with booleans.

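import numpy as np

# Give both frames the same 0..n-1 index so the Series align row by row
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)

# Option 1: Series.mask replaces the values where the condition is True
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)

# Option 2: numpy.where picks between two alternatives element by element
df1['v_100'] = np.where(df1['choice'] != df2['choice'],
                        (df1['choice'] | df2['choice']) * 0.5,
                        df1['choice'])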
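To see why reset_index matters here, the following is a minimal sketch that reproduces the error and then removes it. The 12-row frame is a hypothetical stand-in for the asker's df (its values are made up); only the iloc slicing is taken from the question.

```python
import pandas as pd

# Hypothetical 12-row frame standing in for the asker's `df`
df = pd.DataFrame({
    "v_100": range(12),
    "choice": [True, True, False, True, False, True,
               True, False, True, False, True, True],
})

df1 = df.iloc[5::6, :]  # keeps the original labels 5 and 11
df2 = df.iloc[4::6, :]  # keeps the original labels 4 and 10

# The two Series carry different index labels, so pandas refuses to compare them
try:
    df1["choice"] != df2["choice"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# After reset_index(drop=True) both frames are labelled 0, 1 and compare row by row
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
differs = df1["choice"] != df2["choice"]
print(differs.tolist())
```

The slices keep the row labels of the parent frame, which is why two frames of the same shape can still be "differently labeled".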

Sample:

print (df1)
   v_100  choice
5      7    True
6      0    True
7      7   False
8      2    True

print (df2)
   v_100  choice
4      1   False
5      2    True
6     74    True
7      6    True

df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print (df1)
   v_100  choice
0      7    True
1      0    True
2      7   False
3      2    True

print (df2)
   v_100  choice
0      1   False
1      2    True
2     74    True
3      6    True

df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)

print (df1)
   v_100  choice
0    0.5    True
1    1.0    True
2    0.5   False
3    1.0    True
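If the goal is instead to change v_100 only where the two choice columns differ and keep the existing v_100 elsewhere (an assumption about what the if/else in the question was meant to do, not something the answer states), a sketch with numpy.where on the same sample data could look like this:

```python
import numpy as np
import pandas as pd

# Sample data from above, already on a shared 0..3 index
df1 = pd.DataFrame({"v_100": [7, 0, 7, 2],
                    "choice": [True, True, False, True]})
df2 = pd.DataFrame({"v_100": [1, 2, 74, 6],
                    "choice": [False, True, True, True]})

# Element-wise flag: True where the two choices disagree
differs = df1["choice"] != df2["choice"]

# Where they disagree use (choice1 | choice2) * 0.5, otherwise keep the old v_100
df1["v_100"] = np.where(differs,
                        (df1["choice"] | df2["choice"]) * 0.5,
                        df1["v_100"])
print(df1)
```

This replaces the row-by-row if/else with one vectorized expression, which also avoids asking Python for the truth value of a whole Series.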

Answer 2

Answered by Ruslan S.

The error happens because you are comparing two pandas.Series objects with different indices. A simple solution is to compare just the values in the series. Try it:

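# The comparison is element-wise and returns an array, so reduce it with .any() (or .all())
if (df1['choice'].values != df2['choice'].values).any():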
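On the sort_index part of the question, here is a small sketch (with made-up Series, not the asker's data) of when sorting helps and when it does not. It relies on the comparison requiring both indexes to be equal, order included: sort_index only fixes the case where both Series carry the same labels in a different order, whereas slices such as iloc[5::6] and iloc[4::6] carry different labels altogether.

```python
import pandas as pd

# Same labels, different order: the comparison raises until both are sorted
a = pd.Series([True, False, True], index=[2, 0, 1])
b = pd.Series([True, True, False], index=[0, 1, 2])
try:
    a != b
    raised = False
except ValueError:
    raised = True
aligned = a.sort_index() != b.sort_index()  # compared label by label

# Different labels: sorting cannot make the indexes identical
c = pd.Series([True, True, False], index=[4, 10, 16])
d = pd.Series([True, False, True], index=[5, 11, 17])
try:
    c.sort_index() != d.sort_index()
    still_raises = False
except ValueError:
    still_raises = True

# Comparing the underlying arrays ignores labels and goes by position
positional = c.to_numpy() != d.to_numpy()
any_diff = bool(positional.any())  # a single bool that is safe to use in `if`
print(aligned.tolist(), still_raises, positional.tolist(), any_diff)
```

Comparing raw values is only meaningful when the rows of both frames are already in the intended order, since no label alignment takes place.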
