Python Pandas: How to filter a Series
Source: http://stackoverflow.com/questions/28272137/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Kiem Nguyen
I have a Series like this after calling groupby('name') and applying mean() to another column:
name
383 3.000000
663 1.000000
726 1.000000
737 9.000000
833 8.166667
Could anyone please show me how to filter out the rows with 1.000000 mean values? Thank you and I greatly appreciate your help.
Accepted answer by Andrew
In [5]:
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s = s[s != 1]
s
Out[5]:
383 3.000000
737 9.000000
833 8.166667
dtype: float64
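The accepted answer builds the Series by hand. As a minimal sketch of the full path from the question (groupby, mean, then filter), the example below uses a made-up DataFrame; the column name `score` and the sample values are assumptions, not taken from the original post. Because means are floating-point values, a tolerance-based comparison with `numpy.isclose` is shown as an optional variant.

```python
import numpy as np
import pandas as pd

# Hypothetical input; the 'score' column and its values are illustrative only.
df = pd.DataFrame({
    "name": [383, 383, 663, 726, 737, 833, 833],
    "score": [2.0, 4.0, 1.0, 1.0, 9.0, 8.0, 9.0],
})

means = df.groupby("name")["score"].mean()  # Series indexed by name

# Boolean mask: keep only the rows whose mean is not exactly 1.
filtered = means[means != 1]

# Variant: treat values within floating-point tolerance of 1 as equal to 1.
filtered_tol = means[~np.isclose(means.to_numpy(), 1.0)]

print(filtered)
```

The exact comparison is enough when the means are computed from integers like here; the tolerance variant may be safer when rounding error is possible.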
Answer by Kamil Sindi
Another way is to first convert to a DataFrame and use the query method (assuming you have numexpr installed):
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s.to_frame(name='x').query("x != 1")
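Note that the expression above returns a DataFrame rather than a Series. A small sketch of getting a Series back, assuming the temporary column name `x` from the snippet above, is to select that column after the query:

```python
import pandas as pd

s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

# query() filters the one-column DataFrame; ["x"] selects the column
# again so that the final result is a Series (named "x").
result = s.to_frame(name="x").query("x != 1")["x"]

print(type(result).__name__)
print(result)
```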
Answer by DACW
As of pandas 0.18, a Series can also be filtered as follows:
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
pd.Series(test).where(lambda x: x != 1).dropna()
See: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements
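To make the two steps of this approach visible, here is a short sketch that keeps the intermediate result: `where` preserves the index and replaces non-matching values with NaN, and `dropna` then removes those rows. One caveat worth keeping in mind is that `dropna` would also remove any NaN values that were already in the Series before filtering.

```python
import pandas as pd

s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

# Step 1: same length and index; values equal to 1 become NaN.
masked = s.where(lambda x: x != 1)

# Step 2: drop the NaN rows, leaving only the values that passed the test.
filtered = masked.dropna()

print(masked)
print(filtered)
```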
Answer by Gordon Bean
As DACW pointed out, there are method-chaining improvements in pandas 0.18.1 that do what you are looking for very nicely.
Rather than using .where, you can pass your function to either the .loc indexer or the Series indexer [] and avoid the call to .dropna:
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.loc[lambda x : x!=1]
test[lambda x: x!=1]
Similar behavior is supported on the DataFrame and NDFrame classes.
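As a sketch of why callable indexing helps in a chain, the example below filters the result of groupby and mean without storing it in a variable first, and then applies the same idea to a DataFrame. The DataFrame contents and the `score` column name are hypothetical.

```python
import pandas as pd

# Hypothetical input; column names and values are illustrative only.
df = pd.DataFrame({
    "name": [383, 383, 663, 726, 737],
    "score": [2.0, 4.0, 1.0, 1.0, 9.0],
})

# The callable receives the intermediate Series produced by .mean().
result = (
    df.groupby("name")["score"]
      .mean()
      .loc[lambda s: s != 1]
)

# The same pattern on a DataFrame: the callable receives the DataFrame.
rows = df.loc[lambda d: d["score"] > 1]

print(result)
print(rows)
```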
Answer by Psidom
If you prefer a chained operation, you can also use the compress function:
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.compress(lambda x: x != 1)
# 383 3.000000
# 737 9.000000
# 833 8.166667
# dtype: float64
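As far as I know, Series.compress was deprecated in pandas 0.24 and is no longer available from pandas 1.0 onward, so the snippet above may raise an AttributeError on a current installation. A defensive sketch that falls back to callable indexing with .loc when compress is missing:

```python
import pandas as pd

test = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

if hasattr(pd.Series, "compress"):
    # Older pandas versions that still ship Series.compress.
    result = test.compress(lambda x: x != 1)
else:
    # Newer pandas: callable indexing gives the same filtered Series.
    result = test.loc[lambda x: x != 1]

print(result)
```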
Answer by piRSquared
Answer by The Red Pea
In my case I had a pandas Series where the values are tuples of characters:
Out[67]
0 (H, H, H, H)
1 (H, H, H, T)
2 (H, H, T, H)
3 (H, H, T, T)
4 (H, T, H, H)
Therefore I could use indexing to filter the series, but to create the index I needed apply. My condition is "find all tuples which have exactly one 'H'".
series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]
I admit it is not "chainable" (i.e. notice I repeat series_of_tuples twice; you must store any temporary series in a variable so you can call apply(...) on it).
There may also be other methods (besides .apply(...)) which can operate elementwise to produce a Boolean index.
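Two such elementwise alternatives are Series.map and a plain list comprehension. The sketch below uses a small made-up Series of tuples (not the data from the post) and builds the same Boolean mask both ways:

```python
import pandas as pd

# Illustrative data; two of the four tuples contain exactly one 'H'.
series_of_tuples = pd.Series([
    ("H", "H", "H", "H"),
    ("H", "T", "T", "T"),
    ("T", "T", "H", "T"),
    ("H", "H", "T", "T"),
])

# Series.map calls the function once per element (here, once per tuple).
mask_map = series_of_tuples.map(lambda t: t.count("H") == 1)

# A list comprehension produces an equivalent list of booleans.
mask_list = [t.count("H") == 1 for t in series_of_tuples]

result_map = series_of_tuples[mask_map]
result_list = series_of_tuples[mask_list]

print(result_map)
```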
Many other answers (including the accepted answer) use chainable functions such as:
.compress()
.where()
.loc[]
[]
These accept callables (lambdas) which are applied to the Series as a whole, not to the individual values in that Series!
Therefore my Series of tuples behaved strangely when I tried to use my above condition / callable / lambda with any of the chainable functions, like .loc[]:
series_of_tuples.loc[lambda x: x.count('H')==1]
Produces the error:
KeyError: 'Level H must be same as name (None)'
I was very confused, but it seems to be calling the Series.count function, i.e. series_of_tuples.count(...), which is not what I wanted.
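One way to keep the chainable form and still test each tuple is to call apply inside the callable, so that the outer lambda receives the Series and the inner lambda receives one tuple at a time. This is a sketch on made-up data, not the original poster's code:

```python
import pandas as pd

# Illustrative data; two of the four tuples contain exactly one 'H'.
series_of_tuples = pd.Series([
    ("H", "H", "H", "H"),
    ("H", "T", "T", "T"),
    ("T", "T", "H", "T"),
    ("H", "H", "T", "T"),
])

# Outer lambda: gets the whole Series (s).
# Inner lambda: gets each tuple (t), so t.count is tuple.count, not Series.count.
result = series_of_tuples.loc[
    lambda s: s.apply(lambda t: t.count("H") == 1)
]

print(result)
```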
I admit that an alternative data structure may be better:
- A Category datatype?
- A DataFrame (each element of the tuple becomes a column)
- A Series of strings (just concatenate the tuples together):
This creates a series of strings (i.e. by concatenating the tuple; joining the characters in the tuple into a single string)
series_of_tuples.apply(''.join)
So I can then use the chainable Series.str.count:
series_of_tuples.apply(''.join).str.count('H')==1
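The expression above only produces a Boolean Series. A sketch of using it to actually filter, again on made-up data, is shown below in two forms: a fully chained one that returns the joined strings, and one that uses the mask to select the original tuples.

```python
import pandas as pd

# Illustrative data; two of the four tuples contain exactly one 'H'.
series_of_tuples = pd.Series([
    ("H", "H", "H", "H"),
    ("H", "T", "T", "T"),
    ("T", "T", "H", "T"),
    ("H", "H", "T", "T"),
])

# Fully chained: the result holds the joined strings, not the tuples.
strings = (
    series_of_tuples
    .apply("".join)
    .loc[lambda s: s.str.count("H") == 1]
)

# Using the mask on the original Series keeps the tuples themselves.
mask = series_of_tuples.apply("".join).str.count("H") == 1
tuples = series_of_tuples[mask]

print(strings)
print(tuples)
```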