Original question: http://stackoverflow.com/questions/32978362/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Problems with isin pandas
Asked by qwertylpc
Sorry, I just asked this question: "Pythonic Way to have multiple Or's when conditioning in a dataframe", but I marked it as answered prematurely: the answer passed my overly simplistic test case but doesn't work more generally. (If it is possible to merge and reopen the question, that would be great...)
Here is the full issue:
sum(data['Name'].isin(eligible_players))
> 0
sum(data['Name'] == "Antonio Brown")
> 68
"Antonio Brown" in eligible_players
> True
If I understand correctly, this shows that Antonio Brown is in eligible_players and also appears in the dataframe. However, for some reason .isin() finds no matches.
As I said in my prior question, I am looking for a way to combine many "or" conditions to select the proper rows.
____ EDIT ____
In[14]:
eligible_players
Out[14]:
Name
Antonio Brown 378
Demaryius Thomas 334
Jordy Nelson 319
Dez Bryant 309
Emmanuel Sanders 293
Odell Beckham 289
Julio Jones 288
Randall Cobb 284
Jeremy Maclin 267
T.Y. Hilton 255
Alshon Jeffery 252
Golden Tate 250
Mike Evans 236
DeAndre Hopkins 223
Calvin Johnson 220
Kelvin Benjamin 218
Julian Edelman 213
Anquan Boldin 213
Steve Smith 213
Roddy White 208
Brandon LaFell 205
Mike Wallace 205
A.J. Green 203
DeSean Hymanson 200
Jordan Matthews 194
Eric Decker 194
Sammy Watkins 190
Torrey Smith 186
Andre Johnson 186
Jarvis Landry 178
Eddie Royal 176
Brandon Marshall 175
Vincent Hymanson 175
Rueben Randle 174
Marques Colston 173
Mohamed Sanu 171
Keenan Allen 170
James Jones 168
Malcom Floyd 168
Kenny Stills 167
Greg Jennings 162
Kendall Wright 162
Doug Baldwin 160
Michael Floyd 159
Robert Woods 158
Name: Pts, dtype: int64
and
In [31]:
data.tail(110)
Out[31]:
Name Pts year week pos Team
28029 Dez Bryant 25 2014 17 WR DAL
28030 Antonio Brown 25 2014 17 WR PIT
28031 Jordan Matthews 24 2014 17 WR PHI
28032 Randall Cobb 23 2014 17 WR GB
28033 Rueben Randle 21 2014 17 WR NYG
28034 Demaryius Thomas 19 2014 17 WR DEN
28035 Calvin Johnson 19 2014 17 WR DET
28036 Torrey Smith 18 2014 17 WR BAL
28037 Roddy White 17 2014 17 WR ATL
28038 Steve Smith 17 2014 17 WR BAL
28039 DeSean Hymanson 16 2014 17 WR WAS
28040 Mike Evans 16 2014 17 WR TB
28041 Anquan Boldin 16 2014 17 WR SF
28042 Adam Thielen 15 2014 17 WR MIN
28043 Cecil Shorts 15 2014 17 WR JAC
28044 A.J. Green 15 2014 17 WR CIN
28045 Jordy Nelson 14 2014 17 WR GB
28046 Brian Hartline 14 2014 17 WR MIA
28047 Robert Woods 13 2014 17 WR BUF
28048 Kenny Stills 13 2014 17 WR NO
28049 Emmanuel Sanders 13 2014 17 WR DEN
28050 Eddie Royal 13 2014 17 WR SD
28051 Marques Colston 13 2014 17 WR NO
28052 Chris Owusu 12 2014 17 WR NYJ
28053 Brandon LaFell 12 2014 17 WR NE
28054 Dontrelle Inman 12 2014 17 WR SD
28055 Reggie Wayne 11 2014 17 WR IND
28056 Paul Richardson 11 2014 17 WR SEA
28057 Cole Beasley 11 2014 17 WR DAL
28058 Jarvis Landry 10 2014 17 WR MIA
Answered by DSM
(Aside: once you posted what you were actually using, it only took seconds to see the problem.)
Series.isin(something) iterates over something to determine the set of things you want to test membership in. But your eligible_players isn't a list, it's a Series. And iteration over a Series is iteration over the values, even though membership (in) is with respect to the index:
In [72]: eligible_players = pd.Series([10,20,30], index=["A","B","C"])
In [73]: list(eligible_players)
Out[73]: [10, 20, 30]
In [74]: "A" in eligible_players
Out[74]: True
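The same distinction can be checked in a plain script. This is only a minimal sketch built on the toy Series above; the name `s` and the extra `.values` check are illustrative additions, not part of the original answer.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["A", "B", "C"])

# Iterating a Series yields its values, not its index labels.
values_seen = list(s)

# The `in` operator on a Series tests the index labels.
label_hit = "A" in s   # "A" is an index label
value_hit = 10 in s    # 10 is a value, not a label

# To test membership among the values, go through .values explicitly.
value_hit_explicit = 10 in s.values

print(values_seen, label_hit, value_hit, value_hit_explicit)
```

This mismatch is why `"Antonio Brown" in eligible_players` can be true while `isin(eligible_players)` matches nothing.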
So in your case, you could use eligible_players.index instead to pass the right names:
In [75]: df = pd.DataFrame({"Name": ["A","B","C","D"]})
In [76]: df
Out[76]:
Name
0 A
1 B
2 C
3 D
In [77]: df["Name"].isin(eligible_players) # remember, this will be [10, 20, 30]
Out[77]:
0 False
1 False
2 False
3 False
Name: Name, dtype: bool
In [78]: df["Name"].isin(eligible_players.index)
Out[78]:
0 True
1 True
2 True
3 False
Name: Name, dtype: bool
In [79]: df["Name"].isin(eligible_players.index).sum()
Out[79]: 3
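Applied to data shaped like the question's, the fix might look like the sketch below. The four-row `data` frame and two-entry `eligible_players` Series are small hypothetical stand-ins (the real ones are much larger), and the chained `|` comparison is included only to show that the `isin` mask selects the same rows as writing out every "or" by hand.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: a few rows only.
data = pd.DataFrame({
    "Name": ["Dez Bryant", "Antonio Brown", "Adam Thielen", "Antonio Brown"],
    "Pts": [25, 25, 15, 12],
})

# Point totals indexed by player name, mirroring the shape of eligible_players.
eligible_players = pd.Series(
    [378, 309],
    index=pd.Index(["Antonio Brown", "Dez Bryant"], name="Name"),
    name="Pts",
)

# Passing the Series compares names against the point totals.
wrong = data["Name"].isin(eligible_players).sum()

# Passing the index compares names against names.
mask = data["Name"].isin(eligible_players.index)
selected = data[mask]

# The same mask written as explicit comparisons chained with |.
chained = (data["Name"] == "Antonio Brown") | (data["Name"] == "Dez Bryant")

print(wrong, mask.sum(), mask.equals(chained))
print(selected)
```

Indexing the frame with the boolean mask (`data[mask]`) keeps only the rows for eligible players, which is the row selection the question was after.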

