Original question: http://stackoverflow.com/questions/37362984/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Select from pandas dataframe using boolean series/array
Asked by Osora
I have a dataframe:
             High    Low  Close
Date
2009-02-11  30.20  29.41  29.87
2009-02-12  30.28  29.32  30.24
2009-02-13  30.45  29.96  30.10
2009-02-17  29.35  28.74  28.90
2009-02-18  29.35  28.56  28.92
and a boolean series:
   bools
1   True
2  False
3  False
4   True
5  False
How can I select from the dataframe using the boolean array to obtain a result like this:
             High
Date
2009-02-11  30.20
2009-02-17  29.35
Answered by mfitzp
For label-based boolean indexing to work between two pandas objects, they must have comparable indexes. In this case it won't work because the boolean Series has an integer index, while the DataFrame is indexed by dates.
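To see the mismatch concretely, here is a minimal sketch. It assumes the boolean Series is labelled 1 to 5 as printed in the question and uses a cut-down, one-column version of the DataFrame; the variable name `mismatch_error` is only for illustration. The exact exception class and message depend on the pandas version, so the sketch catches a generic `Exception`.

```python
import pandas as pd

# One-column version of the question's DataFrame, indexed by date strings
df = pd.DataFrame(
    {"High": [30.20, 30.28, 30.45, 29.35, 29.35]},
    index=["2009-02-11", "2009-02-12", "2009-02-13", "2009-02-17", "2009-02-18"],
)

# Boolean Series labelled 1..5 (assumed from the question's printout)
s = pd.Series([True, False, False, True, False], index=[1, 2, 3, 4, 5], name="bools")

try:
    df[s]  # pandas tries to align s.index with df.index and cannot
    mismatch_error = None
except Exception as exc:  # an indexing error; exact class varies by version
    mismatch_error = exc

print(type(mismatch_error).__name__)
```

pandas may also emit a warning about reindexing the boolean key before the error is raised.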
However, as you say, you can filter using a bool array. You can access the underlying array of a Series via .values. Because a plain array carries no index, it is applied by position, so it can be used as a filter as follows:
df # pandas.DataFrame
s # pandas.Series
df[s.values] # df, filtered by the bool array in s
For example, with your data:
import pandas as pd

df = pd.DataFrame(
    [
        [30.20, 29.41, 29.87],
        [30.28, 29.32, 30.24],
        [30.45, 29.96, 30.10],
        [29.35, 28.74, 28.90],
        [29.35, 28.56, 28.92],
    ],
    columns=['High', 'Low', 'Close'],
    index=['2009-02-11', '2009-02-12', '2009-02-13', '2009-02-17', '2009-02-18'],
)
s = pd.Series([True, False, False, True, False], name='bools')

df[s.values]
Returns the following:
             High    Low  Close
2009-02-11  30.20  29.41  29.87
2009-02-17  29.35  28.74  28.90
If you just want the High column, you can select it as normal (either before or after applying the bool filter):
df['High'][s.values]
# Or: df[s.values]['High']
This gives your target output (as a Series):
2009-02-11    30.20
2009-02-17    29.35
Name: High, dtype: float64
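As a possible variation on the approach above (not part of the original answer), the row filter and the column selection can be combined in a single .loc call, which avoids chained indexing. The sketch below also shows a second option, assuming you would rather keep a Series mask: rebuild it on the DataFrame's own index so the labels line up. The names `mask`, `aligned`, `high`, and `high_aligned` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [30.20, 29.41, 29.87],
        [30.28, 29.32, 30.24],
        [30.45, 29.96, 30.10],
        [29.35, 28.74, 28.90],
        [29.35, 28.56, 28.92],
    ],
    columns=["High", "Low", "Close"],
    index=["2009-02-11", "2009-02-12", "2009-02-13", "2009-02-17", "2009-02-18"],
)
s = pd.Series([True, False, False, True, False], name="bools")

# Option 1: positional mask. A NumPy bool array has no index to align.
mask = s.to_numpy()
high = df.loc[mask, "High"]

# Option 2: label-aligned mask. Reuse the values but adopt df's index.
aligned = pd.Series(s.to_numpy(), index=df.index)
high_aligned = df.loc[aligned, "High"]

print(high)
```

Both options select the same two rows here. The positional form relies on the mask being in the same row order as the DataFrame, whereas the aligned form matches rows by label once the indexes agree.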