Original source: http://stackoverflow.com/questions/44773017/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Error "Can only compare identically-labeled Series objects" and sort_index
Asked by Lumos
I have two dataframes, df1 and df2, with the same number of rows and columns and the same variables, and I'm trying to compare the boolean variable choice in the two dataframes, then use if/else to manipulate the data. But something goes wrong when I try to compare the boolean variable.
Here are my sample dataframes and code:
#df1
v_100 choice #boolean
7 True
0 True
7 False
2 True
#df2
v_100 choice #boolean
1 False
2 True
74 True
6 True
def lastTwoTrials_outcome():
    df1 = df.iloc[5::6, :]  # df1 and df2 are extracted from the same dataframe first
    df2 = df.iloc[4::6, :]
    if df1['choice'] != df2['choice']:  # if "choice" is different in the two dataframes
        df1['v_100'] = (df1['choice'] + df2['choice']) * 0.5
Here's the error:
if df1['choice'] != df2['choice']:
File "path", line 818, in wrapper
raise ValueError(msg)
ValueError: Can only compare identically-labeled Series objects
I found the same error here, and an answer suggests calling sort_index first, but I don't really understand why. Can anyone explain in more detail, please (if that's the correct solution)?
Thanks!
Accepted answer by jezrael
I think you need reset_index so that both dataframes have the same index values, and then compare. To create the new column, it is better to use mask or numpy.where:
Also, use | instead of +, because you are working with booleans.
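import numpy as np

# Give both frames the same 0..n-1 index so rows are compared by position
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)

# Option 1: Series.mask replaces values where the condition is True
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)
# Option 2: numpy.where picks between two alternatives element-wise
df1['v_100'] = np.where(df1['choice'] != df2['choice'],
                        (df1['choice'] | df2['choice']) * 0.5,
                        df1['choice'])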
Samples:
print (df1)
v_100 choice
5 7 True
6 0 True
7 7 False
8 2 True
print (df2)
v_100 choice
4 1 False
5 2 True
6 74 True
7 6 True
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print (df1)
v_100 choice
0 7 True
1 0 True
2 7 False
3 2 True
print (df2)
v_100 choice
0 1 False
1 2 True
2 74 True
3 6 True
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
(df1['choice'] | df2['choice']) * 0.5)
print (df1)
v_100 choice
0 0.5 True
1 1.0 True
2 0.5 False
3 1.0 True
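As for why sort_index comes up with this error: pandas refuses to compare two Series unless their index labels match exactly, including their order. sort_index only helps when both Series carry the same labels in a different order. Here the labels themselves differ (5, 6, 7, 8 versus 4, 5, 6, 7), so reset_index is what makes the comparison possible. A minimal sketch of that difference, using made-up Series (a, b, c and the helper can_compare are hypothetical names for illustration, not from the question):

```python
import pandas as pd

a = pd.Series([True, True, False, True], index=[5, 6, 7, 8])
b = pd.Series([False, True, True, True], index=[4, 5, 6, 7])  # different labels
c = pd.Series([True, False, True, True], index=[8, 7, 6, 5])  # same labels as a, shuffled


def can_compare(left, right):
    """Return True if `left != right` runs without the labeling error."""
    try:
        left != right
        return True
    except ValueError:
        return False


# Same labels in a different order: sorting both indexes makes them identical.
print(can_compare(a, c))
print(can_compare(a.sort_index(), c.sort_index()))

# Different labels: sorting cannot help, but resetting to 0..n-1 does.
print(can_compare(a.sort_index(), b.sort_index()))
print(can_compare(a.reset_index(drop=True), b.reset_index(drop=True)))
```

Note that reset_index(drop=True) pairs rows purely by position, which is only appropriate when the rows of the two dataframes are meant to correspond one to one, as they appear to in the question.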
Answer by Ruslan S.
The error happens because you compare two pandas.Series objects with different indices. A simple solution could be to compare just the values in the series. Try it:
if (df1['choice'].values != df2['choice'].values).any():  # .any(): at least one position differs
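Comparing .values yields an element-wise NumPy boolean array rather than a single True/False, so it has to be reduced (for example with .any() or .all()) before it can drive an if statement. A minimal sketch with made-up data, where s1 and s2 are hypothetical stand-ins for df1['choice'] and df2['choice']:

```python
import pandas as pd

s1 = pd.Series([True, True, False, True], index=[5, 6, 7, 8])
s2 = pd.Series([False, True, True, True], index=[4, 5, 6, 7])

# .values drops the index, so the comparison is purely positional.
differs = s1.values != s2.values
print(differs)

# Reduce the array to one boolean before using it in an if statement.
if differs.any():
    print("at least one position differs")
```

Whether .any() or .all() is the right reduction depends on what the if branch is supposed to mean. For per-row logic, the mask or numpy.where approach from the accepted answer avoids the if statement altogether.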

