Pandas: How to filter a Series

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/28272137/

Date: 2020-08-19 03:01:53  Source: igfitidea

Tags: python, pandas

Asked by Kiem Nguyen

After doing groupby('name') and applying the mean() function to another column, I have a Series like this:

name
383      3.000000
663      1.000000
726      1.000000
737      9.000000
833      8.166667

Could anyone please show me how to filter out the rows with a mean value of 1.000000? Thank you, I greatly appreciate your help.

Accepted answer by Andrew

In [5]:

import pandas as pd

test = {
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
}

s = pd.Series(test)
s = s[s != 1]  # boolean mask: keep only the values that are not equal to 1
s
Out[5]:
383    3.000000
737    9.000000
833    8.166667
dtype: float64
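
For context, here is a minimal sketch of how such a Series might come out of the groupby described in the question and then be filtered with the same boolean mask. The DataFrame, the 'score' column name, and all values are hypothetical, made up for illustration.

```python
import pandas as pd

# Hypothetical data: the 'score' column and its values are invented for this sketch.
df = pd.DataFrame({
    "name": [383, 383, 663, 726, 737, 737],
    "score": [2.0, 4.0, 1.0, 1.0, 8.0, 10.0],
})

# groupby + mean returns a Series indexed by 'name'.
means = df.groupby("name")["score"].mean()

# The comparison builds a boolean Series; indexing with it keeps the True rows.
filtered = means[means != 1]
print(filtered)
```

If the means come from floating-point arithmetic, an exact comparison with 1 may miss values that are only approximately 1; a tolerance-based test such as numpy.isclose is one option in that case.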

Answer by Kamil Sindi

Another way is to first convert to a DataFrame and use the query method (assuming you have numexpr installed):

import pandas as pd

test = {
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
}

s = pd.Series(test)
s.to_frame(name='x').query("x != 1")
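
Note that query returns a DataFrame rather than a Series. A minimal sketch of getting back to a Series, assuming the column name 'x' used above:

```python
import pandas as pd

s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

# query works on DataFrames, so the result is a one-column DataFrame...
frame = s.to_frame(name="x").query("x != 1")

# ...and selecting that column turns it back into a Series.
result = frame["x"]
print(result)
```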

Answer by DACW

From pandas version 0.18+, filtering a Series can also be done as follows:

import pandas as pd

test = {
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
}

pd.Series(test).where(lambda x : x!=1).dropna()

See: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements
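
To make the two steps of this approach visible, here is a small sketch that splits the chain apart. The variable names masked and filtered are only for illustration.

```python
import pandas as pd

s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

# where keeps the original shape and replaces non-matching values with NaN.
masked = s.where(lambda x: x != 1)

# dropna then removes those NaN rows, leaving only the matching values.
filtered = masked.dropna()
print(filtered)
```

One thing to keep in mind with this pattern: dropna also removes any NaN values that were already present in the Series before filtering.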

Answer by Gordon Bean

As DACW pointed out, there are method-chaining improvements in pandas 0.18.1 that do what you are looking for very nicely.

Rather than using .where, you can pass your function to either the .loc indexer or the Series indexer [] and avoid the call to .dropna:

import pandas as pd

test = pd.Series({
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
})

test.loc[lambda x : x!=1]

test[lambda x: x!=1]

Similar behavior is supported on the DataFrame and NDFrame classes.
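
As a rough illustration of why this helps with chaining, the sketch below filters the result of a groupby without a temporary variable, and then applies a callable to a DataFrame. The data and the 'score' column are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for this sketch.
df = pd.DataFrame({
    "name": [383, 383, 663, 726, 737, 737],
    "score": [2.0, 4.0, 1.0, 1.0, 8.0, 10.0],
})

# The callable receives the intermediate Series, so no temporary variable is needed.
filtered = (
    df.groupby("name")["score"]
      .mean()
      .loc[lambda s: s != 1]
)

# DataFrame.loc accepts a callable too; here it receives the whole DataFrame.
rows = df.loc[lambda d: d["score"] != 1]
print(filtered)
print(rows)
```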

Answer by Psidom

If you like chained operations, you can also use the compress function:

import pandas as pd

test = pd.Series({
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
})

test.compress(lambda x: x != 1)

# 383    3.000000
# 737    9.000000
# 833    8.166667
# dtype: float64
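
Note that Series.compress was deprecated in pandas 0.24 and removed in pandas 1.0, so on recent versions the call above raises an AttributeError. Here is a minimal sketch of a version-tolerant variant, assuming .loc with a callable is an acceptable substitute:

```python
import pandas as pd

test = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

# Use compress only where it still exists; otherwise fall back to .loc with a callable.
if hasattr(test, "compress"):
    filtered = test.compress(lambda x: x != 1)
else:
    filtered = test.loc[lambda x: x != 1]

print(filtered)
```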

Answer by piRSquared

A fast way of doing this is to reconstruct the Series, using numpy to slice the underlying arrays. See timings below.

mask = s.values != 1
pd.Series(s.values[mask], s.index[mask])

0
383    3.000000
737    9.000000
833    8.166667
dtype: float64

[Figure: naive timing]
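
A rough way to run such a comparison yourself is sketched below. The benchmark data, the helper names with_boolean_index and with_numpy_rebuild, and the repeat count are all assumptions made for this sketch. No particular result is implied, since timings depend on the pandas version and the data size.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 100,000 floats drawn from {1.0, 2.0, 3.0}.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(1, 4, size=100_000).astype(float))

def with_boolean_index(s):
    # Plain pandas boolean indexing.
    return s[s != 1]

def with_numpy_rebuild(s):
    # Slice the underlying arrays, then rebuild the Series.
    mask = s.values != 1
    return pd.Series(s.values[mask], s.index[mask])

# Both approaches should yield the same Series.
assert with_boolean_index(s).equals(with_numpy_rebuild(s))

for fn in (with_boolean_index, with_numpy_rebuild):
    seconds = timeit.timeit(lambda: fn(s), number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e6:.1f} us per call")
```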

Answer by The Red Pea

In my case I had a pandas Series where the values are tuples of characters:

Out[67]
0    (H, H, H, H)
1    (H, H, H, T)
2    (H, H, T, H)
3    (H, H, T, T)
4    (H, T, H, H)

Therefore I could use indexing to filter the series, but to create the boolean index I needed apply. My condition is "find all tuples which have exactly one 'H'".

series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]

I admit it is not "chainable" (i.e. notice that I repeat series_of_tuples twice; you must store any temporary series in a variable so you can call apply(...) on it).

There may also be other methods (besides .apply(...)) which can operate elementwise to produce a Boolean index.
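
Two such alternatives are sketched below: Series.map and a plain list comprehension. The sample tuples are illustrative stand-ins, not the original data.

```python
import pandas as pd

# A small stand-in for the Series of tuples (values are illustrative).
series_of_tuples = pd.Series([
    ("H", "H", "H", "H"),
    ("H", "T", "T", "T"),
    ("T", "T", "H", "T"),
    ("H", "H", "T", "T"),
])

# Series.map applies the function to each element, as apply does here.
mask_map = series_of_tuples.map(lambda t: t.count("H") == 1)

# A plain list comprehension also yields a positional boolean mask.
mask_list = [t.count("H") == 1 for t in series_of_tuples]

filtered = series_of_tuples[mask_map]
print(filtered)
```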

Many other answers (including the accepted answer) use chainable functions like:

  • .compress()
  • .where()
  • .loc[]
  • []

These accept callables (lambdas) which are applied to the Series, not to the individual values in those series!

Therefore my Series of tuples behaved strangely when I tried to use my above condition / callable / lambda with any of the chainable functions, like .loc[]:

series_of_tuples.loc[lambda x: x.count('H')==1]

Produces the error:

KeyError: 'Level H must be same as name (None)'

I was very confused, but it seems to be using the Series.count function, i.e. series_of_tuples.count(...), which is not what I wanted.
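
The behavior can be checked with a small sketch that records what the callable actually receives. It also suggests one possible way to keep the tuple condition chainable, namely nesting the elementwise apply inside the callable. The sample tuples and the names condition and received are illustrative assumptions.

```python
import pandas as pd

# A small stand-in for the Series of tuples (values are illustrative).
series_of_tuples = pd.Series([
    ("H", "H", "H", "H"),
    ("H", "T", "T", "T"),
    ("T", "T", "H", "T"),
    ("H", "H", "T", "T"),
])

received = []

def condition(x):
    # Record what pandas passes in: the whole Series, not one tuple.
    received.append(type(x))
    # Apply the per-tuple test elementwise to get a boolean Series.
    return x.apply(lambda t: t.count("H") == 1)

filtered = series_of_tuples.loc[condition]
print(received)
print(filtered)
```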

I admit that an alternative data structure may be better:

  • A Category datatype?
  • A DataFrame (each element of the tuple becomes a column)
  • A Series of strings (just concatenate the tuples together):

This creates a Series of strings (i.e. by concatenating each tuple, joining its characters into a single string):

series_of_tuples.apply(''.join)

So I can then use the chainable Series.str.count:

series_of_tuples.apply(''.join).str.count('H')==1
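
The expression above yields a boolean Series rather than the filtered data. A minimal sketch of using it as a mask, and of a fully chained variant that returns the joined strings, is shown below. The sample tuples are illustrative.

```python
import pandas as pd

# A small stand-in for the Series of tuples (values are illustrative).
series_of_tuples = pd.Series([
    ("H", "H", "H", "H"),
    ("H", "T", "T", "T"),
    ("T", "T", "H", "T"),
    ("H", "H", "T", "T"),
])

# Build the boolean mask from the joined strings, then index the original tuples.
mask = series_of_tuples.apply("".join).str.count("H") == 1
filtered = series_of_tuples[mask]

# Fully chained variant: filters the joined strings without a temporary variable.
chained = series_of_tuples.apply("".join).loc[lambda s: s.str.count("H") == 1]

print(filtered)
print(chained)
```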