Error "Can only compare identically-labeled Series objects" and sort_index
Original question: http://stackoverflow.com/questions/44773017/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Lumos
I have two dataframes, df1 and df2, with the same number of rows and columns and the same variables. I'm trying to compare the boolean variable choice in the two dataframes and then use if/else to manipulate the data, but something goes wrong when I compare the boolean variable.
Here are my sample dataframes and code:
#df1
v_100 choice #boolean
7 True
0 True
7 False
2 True
#df2
v_100 choice #boolean
1 False
2 True
74 True
6 True
def lastTwoTrials_outcome():
    df1 = df.iloc[5::6, :]  # df1 and df2 are extracted from the same dataframe first
    df2 = df.iloc[4::6, :]
    if df1['choice'] != df2['choice']:  # if "choice" is different in the two dataframes
        df1['v_100'] = (df1['choice'] + df2['choice']) * 0.5
Here's the error:
if df1['choice'] != df2['choice']:
File "path", line 818, in wrapper
raise ValueError(msg)
ValueError: Can only compare identically-labeled Series objects
I found the same error here, and an answer suggests calling sort_index first, but I don't really understand why. Can anyone explain in more detail (if that's the correct solution)?
Thanks!
Accepted answer by jezrael
I think you need reset_index so that both dataframes have the same index values, and then compare. To create the new column, it is better to use mask or numpy.where:
Also, use | instead of + because you are working with booleans.
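import numpy as np

# Give both frames a fresh 0..n-1 index so the Series are identically labeled
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
# Option 1: Series.mask replaces the values where the condition is True
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)
# Option 2: numpy.where chooses between two alternatives element-wise
df1['v_100'] = np.where(df1['choice'] != df2['choice'],
                        (df1['choice'] | df2['choice']) * 0.5,
                        df1['choice'])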
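On the point about | for booleans, here is a small sketch with two made-up Series (left and right are hypothetical names, not from the question). It shows that | is an element-wise logical OR and that multiplying the boolean result by 0.5 treats True as 1 and False as 0:

```python
import pandas as pd

# Hypothetical boolean columns covering all four True/False combinations
left = pd.Series([True, True, False, False])
right = pd.Series([False, True, True, False])

either = left | right   # element-wise logical OR, result stays boolean
scaled = either * 0.5   # True counts as 1 and False as 0, result is float

print(either.tolist())
print(scaled.tolist())
```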
Samples:
print (df1)
v_100 choice
5 7 True
6 0 True
7 7 False
8 2 True
print (df2)
v_100 choice
4 1 False
5 2 True
6 74 True
7 6 True
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print (df1)
v_100 choice
0 7 True
1 0 True
2 7 False
3 2 True
print (df2)
v_100 choice
0 1 False
1 2 True
2 74 True
3 6 True
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
(df1['choice'] | df2['choice']) * 0.5)
print (df1)
v_100 choice
0 0.5 True
1 1.0 True
2 0.5 False
3 1.0 True
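As for why sort_index is sometimes suggested for this error: pandas refuses to compare two Series unless their index labels match exactly, including their order. The following is a minimal sketch with made-up Series (a, b and c are hypothetical names, not taken from the question). It suggests that sort_index is enough only when both Series carry the same labels in a different order, whereas frames sliced with different offsets, as in the question, have different labels and need reset_index(drop=True) or a positional comparison:

```python
import pandas as pd

# Same labels in a different order: sort_index is enough to line them up
a = pd.Series([True, False, True], index=[0, 1, 2])
b = pd.Series([False, True, False], index=[2, 0, 1])
same_labels = a.sort_index() != b.sort_index()

# Different labels: sorting cannot make them match, so the comparison raises
c = pd.Series([False, True, True], index=[10, 11, 12])
try:
    a != c
    raised = False
except ValueError:
    raised = True

# Dropping the labels compares the rows purely by position
by_position = a.reset_index(drop=True) != c.reset_index(drop=True)

print(same_labels.tolist())
print(raised)
print(by_position.tolist())
```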
Answer by Ruslan S.
The error happens because you compare two pandas.Series objects with different indices. A simple solution could be to compare just the values in the series. Note that the comparison is element-wise, so it has to be reduced with .any() (or .all()) before it can be used in an if statement. Try this:
if (df1['choice'].values != df2['choice'].values).any():
    ...  # the choices differ in at least one row
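To illustrate the value-based comparison on the sample data, here is a minimal sketch. The frames are rebuilt by hand with the index labels shown in the accepted answer. The name either_true and the choice to keep the old v_100 where the rows agree are assumptions made for illustration, not part of either answer:

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt by hand, with mismatched index labels
df1 = pd.DataFrame({'v_100': [7, 0, 7, 2],
                    'choice': [True, True, False, True]}, index=[5, 6, 7, 8])
df2 = pd.DataFrame({'v_100': [1, 2, 74, 6],
                    'choice': [False, True, True, True]}, index=[4, 5, 6, 7])

# Comparing the raw arrays ignores the labels and works by position
differs = df1['choice'].values != df2['choice'].values
print(differs)         # element-wise boolean array
print(differs.any())   # single bool, safe to use in an if statement

# Row-wise update without an if statement (assumption: rows that agree keep v_100)
either_true = df1['choice'].values | df2['choice'].values
df1['v_100'] = np.where(differs, either_true * 0.5, df1['v_100'])
print(df1)
```

Because if/else works on a single truth value, a per-row rule like this one is usually easier to express with np.where than with an if statement.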