Python Pandas: how to filter a Series

Question

Asked by Kiem Nguyen

I have a Series like this after doing groupby('name') and using the mean() function on another column:

name
383      3.000000
663      1.000000
726      1.000000
737      9.000000
833      8.166667

Could anyone please show me how to filter out the rows with a mean value of 1.000000? Thank you, I greatly appreciate your help.

Answer 1

Accepted answer by Andrew

In [5]:

import pandas as pd

test = {
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
}

s = pd.Series(test)
s = s[s != 1]
s
Out[5]:
383    3.000000
737    9.000000
833    8.166667
dtype: float64
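Because the values come from mean(), they are floats, and an exact s != 1 test can miss a mean that is only approximately 1 due to rounding. A minimal sketch of a tolerance-based filter, assuming numpy's default np.isclose tolerances are acceptable (the perturbed value at 726 is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 726 holds a mean that is 1 up to floating-point error.
s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0 - 1e-12, 737: 9.0, 833: 8.166667})

# np.isclose returns a Boolean array; ~ keeps values NOT close to 1.
filtered = s[~np.isclose(s, 1.0)]
print(filtered)

# An exact comparison would keep 726, since 1.0 - 1e-12 != 1.
print(s[s != 1].index.tolist())
```

The rtol and atol arguments of np.isclose can be tightened or loosened if the defaults do not fit the data.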

Answer 2

Answer by Kamil Sindi

Another way is to first convert the Series to a DataFrame and use the query method (query uses numexpr if it is installed and otherwise falls back to the Python engine):

import pandas as pd

test = {
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
}

s = pd.Series(test)
s.to_frame(name='x').query("x != 1")
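query returns a DataFrame, so selecting the column gets a Series back. A small sketch, assuming the excluded value lives in a local variable (the name excluded is hypothetical) that the query string references with @:

```python
import pandas as pd

s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

excluded = 1  # hypothetical local variable, referenced as @excluded below
# query filters the one-column DataFrame; ['x'] turns the result back into a Series.
result = s.to_frame(name='x').query("x != @excluded")['x']
print(result)
```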

Answer 3

Answer by DACW

From pandas version 0.18 onward, a Series can also be filtered as follows:

import pandas as pd

test = {
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
}

pd.Series(test).where(lambda x: x != 1).dropna()
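One side effect worth knowing: where(...).dropna() also removes NaN values that were already in the data, while plain Boolean indexing keeps them (NaN != 1 evaluates to True). A small sketch with a hypothetical NaN at index 700:

```python
import numpy as np
import pandas as pd

s = pd.Series({383: 3.0, 663: 1.0, 700: np.nan, 737: 9.0})

# where() turns values failing the condition into NaN; dropna() then drops
# them together with any NaN that was present before.
via_where = s.where(lambda x: x != 1).dropna()

# Boolean indexing drops only rows where the mask is False.
via_mask = s[s != 1]

print(via_where.index.tolist())  # [383, 737]
print(via_mask.index.tolist())   # [383, 700, 737]
```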

See: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements

Answer 4

Answer by Gordon Bean

As DACW pointed out, the method-chaining improvements in pandas 0.18.1 do what you are looking for very nicely.

Rather than using .where, you can pass your function to either the .loc indexer or the Series indexer [] and avoid the call to .dropna:

import pandas as pd

test = pd.Series({
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
})

# Either form works; the callable receives the Series itself
test.loc[lambda x: x != 1]

test[lambda x: x != 1]

Similar behavior is supported on the DataFrame and NDFrame classes.
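Applied to the situation in the question, the callable lets the filter sit at the end of the groupby chain without a temporary variable. A sketch with hypothetical raw data; the question only says "other column", so the column name score is an assumption:

```python
import pandas as pd

# Hypothetical raw data; 'score' stands in for the unnamed "other column".
df = pd.DataFrame({
    'name': [383, 383, 663, 726, 737, 833, 833, 833],
    'score': [2.0, 4.0, 1.0, 1.0, 9.0, 8.0, 8.0, 8.5],
})

# The lambda receives the intermediate Series produced by mean().
means = (
    df.groupby('name')['score']
      .mean()
      .loc[lambda x: x != 1]
)
print(means)

# On a DataFrame, the lambda receives the DataFrame itself.
rows = df.loc[lambda d: d['score'] > 5]
print(rows)
```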

Answer 5

Answer by Psidom

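If you like chained operations, you can also use the compress method (note: Series.compress was deprecated in pandas 0.24 and removed in pandas 1.0, so this only works on older versions):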

import pandas as pd

test = pd.Series({
383:    3.000000,
663:    1.000000,
726:    1.000000,
737:    9.000000,
833:    8.166667
})

# Only available on pandas versions before 1.0
test.compress(lambda x: x != 1)

# 383    3.000000
# 737    9.000000
# 833    8.166667
# dtype: float64
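On current pandas, a chainable equivalent can be sketched with Series.pipe, which passes the whole Series to a function that applies a Boolean mask. This is one possible substitute, not a drop-in alias documented for compress:

```python
import pandas as pd

test = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

# pipe hands the Series to the lambda, which filters it with a mask.
result = test.pipe(lambda x: x[x != 1])
print(result)
```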

Answer 6

Answer by piRSquared

A fast way of doing this is to reconstruct the Series using numpy to slice the underlying arrays. See timings below.

# s is the Series from the question
mask = s.values != 1
pd.Series(s.values[mask], s.index[mask])

383    3.000000
737    9.000000
833    8.166667
dtype: float64

[Figure: naive timing]
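The same idea can be wrapped in a small reusable function. A sketch only: the helper name filter_not_equal is hypothetical, it uses to_numpy() instead of .values, and it also carries over the Series name, which the one-liner above drops:

```python
import pandas as pd

def filter_not_equal(s: pd.Series, value) -> pd.Series:
    """Hypothetical helper: drop entries equal to `value` by slicing raw arrays."""
    values = s.to_numpy()
    mask = values != value
    # Rebuild the Series from the sliced values and index, keeping the name.
    return pd.Series(values[mask], index=s.index[mask], name=s.name)

s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667}, name='mean')
out = filter_not_equal(s, 1)
print(out)
```

Whether this is actually faster than s[s != 1] depends on the Series size and pandas version, so it is worth timing on your own data.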

Answer 7

Answer by The Red Pea

In my case I had a pandas Series whose values are tuples of characters:

Out[67]
0    (H, H, H, H)
1    (H, H, H, T)
2    (H, H, T, H)
3    (H, H, T, T)
4    (H, T, H, H)

I could therefore use Boolean indexing to filter the Series, but I needed apply to build the Boolean index. My condition is "find all tuples that contain exactly one 'H'".

series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]

I admit it is not "chainable" (notice that I repeat series_of_tuples twice; you must store any temporary Series in a variable so you can call apply(...) on it).

There may also be other methods (besides .apply(...)) that operate elementwise to produce a Boolean index.
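One elementwise alternative to .apply(...) is a plain list comprehension: iterating over a Series yields its values, and a list of booleans works as an indexer. A sketch with hypothetical tuples shaped like the ones above:

```python
import pandas as pd

# Hypothetical data shaped like the Series shown above.
series_of_tuples = pd.Series([
    ('H', 'H', 'H', 'H'),
    ('H', 'H', 'H', 'T'),
    ('H', 'T', 'T', 'T'),
    ('T', 'T', 'H', 'T'),
])

# Each t is one tuple, so t.count is the ordinary tuple.count method.
mask = [t.count('H') == 1 for t in series_of_tuples]

# A list of booleans is accepted as a Boolean indexer.
print(series_of_tuples[mask])
```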

Many other answers (including the accepted answer) use chainable functions such as:

.compress()
.where()
.loc[]
[]

These accept callables (lambdas) that are applied to the whole Series, not to the individual values in that Series!

Therefore my Series of tuples behaved strangely when I tried to use the above condition/callable/lambda with any of the chainable functions, such as .loc[]:

series_of_tuples.loc[lambda x: x.count('H')==1]

This produces the error:

KeyError: 'Level H must be same as name (None)'

I was very confused, but it seems to be calling the Series.count method (series_of_tuples.count(...)) rather than tuple.count, which is not what I wanted.
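The filter can stay chainable if the callable maps over the Series it receives, so that tuple.count runs on each element instead of Series.count running on the whole Series. A sketch with hypothetical tuples:

```python
import pandas as pd

# Hypothetical data shaped like the Series shown above.
series_of_tuples = pd.Series([
    ('H', 'H', 'H', 'H'),
    ('H', 'H', 'H', 'T'),
    ('H', 'T', 'T', 'T'),
    ('T', 'T', 'H', 'T'),
])

# x is the whole Series; x.map applies the inner lambda to each tuple t.
result = series_of_tuples.loc[lambda x: x.map(lambda t: t.count('H') == 1)]
print(result)
```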

I admit that an alternative data structure may be better:

A categorical dtype?
A DataFrame (each element of the tuple becomes a column)
A Series of strings (just concatenate the tuples together):

This creates a Series of strings by joining the characters of each tuple into a single string:

series_of_tuples.apply(''.join)

So I can then use the chainable Series.str.count:

series_of_tuples.apply(''.join).str.count('H')==1
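The line above produces the Boolean mask; to get the filtered Series in one chain, the mask can go inside a .loc callable. The DataFrame option from the list can be sketched the same way, one column per tuple position. Both use hypothetical tuples:

```python
import pandas as pd

# Hypothetical data shaped like the Series shown above.
series_of_tuples = pd.Series([
    ('H', 'H', 'H', 'H'),
    ('H', 'H', 'H', 'T'),
    ('H', 'T', 'T', 'T'),
    ('T', 'T', 'H', 'T'),
])

# String route: join each tuple, then use the vectorized .str.count.
by_string = series_of_tuples.loc[
    lambda x: x.apply(''.join).str.count('H') == 1
]

# DataFrame route: count 'H' across the columns of each row.
as_frame = pd.DataFrame(series_of_tuples.tolist(), index=series_of_tuples.index)
by_frame = series_of_tuples[(as_frame == 'H').sum(axis=1) == 1]

print(by_string.index.tolist())  # [2, 3]
print(by_frame.index.tolist())   # [2, 3]
```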
