Python: select from a pandas DataFrame using a boolean series/array

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/37362984/

Time: 2020-08-19 19:17:17  Source: igfitidea

Select from pandas dataframe using boolean series/array

python dataframe

Asked by Osora

I have a dataframe:

             High    Low  Close
Date                           
2009-02-11  30.20  29.41  29.87
2009-02-12  30.28  29.32  30.24
2009-02-13  30.45  29.96  30.10
2009-02-17  29.35  28.74  28.90
2009-02-18  29.35  28.56  28.92

and a boolean series:

     bools
1    True
2    False
3    False
4    True
5    False

How can I select from the dataframe using the boolean array to obtain a result like:

             High   
Date                           
2009-02-11  30.20  
2009-02-17  29.35  

Answered by mfitzp

For indexing to work between two pandas objects, they must have comparable indexes. In this case it won't work, because the boolean series has an integer index while the DataFrame is indexed by dates.
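To make the index mismatch concrete, here is a minimal sketch (the two-row data and the variable names are illustrative, not taken from the question). When a boolean Series is passed directly, pandas tries to align it to the DataFrame's index by label; with no labels in common it raises an error. The exact exception class and message can differ between pandas versions, so the sketch catches a generic `Exception`.

```python
import pandas as pd

# Illustrative data: a DataFrame indexed by date strings
df = pd.DataFrame(
    {"High": [30.20, 30.28], "Low": [29.41, 29.32]},
    index=["2009-02-11", "2009-02-12"],
)

# A boolean Series with the default integer index (0, 1)
s = pd.Series([True, False], name="bools")

try:
    df[s]  # pandas aligns s to df.index by label; no labels match
    aligned = True
except Exception as exc:  # exception type may differ across pandas versions
    aligned = False
    print(type(exc).__name__)

# Passing the underlying array skips label alignment and filters by position
filtered = df[s.values]
print(filtered)
```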

However, as you say, you can filter using a bool array. You can access the underlying array of a Series via .values, and then apply it as a filter as follows:

df # pandas.DataFrame
s  # pandas.Series 

df[s.values] # df, filtered by the bool array in s
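As a hedged alternative to `.values`, the sketch below shows two other ways to apply the mask, assuming the Series has the same length and row order as the DataFrame. `Series.to_numpy()` returns the same kind of NumPy array and is the accessor that recent pandas documentation recommends. The mask can also be rebuilt with the DataFrame's own index so that label alignment succeeds. The three-row data and the names `by_position`, `mask`, and `by_label` are illustrative only.

```python
import pandas as pd

# Illustrative data, not the question's full table
df = pd.DataFrame(
    {"High": [30.20, 30.28, 30.45]},
    index=["2009-02-11", "2009-02-12", "2009-02-13"],
)
s = pd.Series([True, False, True], name="bools")

# Option 1: positional filtering with a plain NumPy boolean array
by_position = df[s.to_numpy()]

# Option 2: rebuild the mask with df's index so the labels align
mask = pd.Series(s.to_numpy(), index=df.index)
by_label = df[mask]

print(by_label)
```

Both options select the same rows; the second keeps the mask usable in other label-aligned operations on `df`.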

For example, with your data:

import pandas as pd

df = pd.DataFrame([
            [30.20,  29.41,  29.87],
            [30.28,  29.32,  30.24],
            [30.45,  29.96,  30.10],
            [29.35,  28.74,  28.90],
            [29.35,  28.56,  28.92],
        ],
        columns=['High','Low','Close'], 
        index=['2009-02-11','2009-02-12','2009-02-13','2009-02-17','2009-02-18']
        )

s = pd.Series([True, False, False, True, False], name='bools')

df[s.values]

Returns the following:

            High    Low     Close
2009-02-11  30.20   29.41   29.87
2009-02-17  29.35   28.74   28.90

If you just want the High column, you can select it as normal (before or after applying the bool filter):

df['High'][s.values]
# Or: df[s.values]['High']

To get your target output (as a Series):

 2009-02-11    30.20
 2009-02-17    29.35
 Name: High, dtype: float64
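The question's target output is a one-column table rather than a Series. One possible way to get that, sketched here with the same data, is to select rows and columns in a single `.loc` call. Passing the column as a list (`['High']`) keeps the result a DataFrame, while passing a bare label (`'High'`) gives a Series. The names `high_series` and `high_frame` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [30.20, 29.41, 29.87],
        [30.28, 29.32, 30.24],
        [30.45, 29.96, 30.10],
        [29.35, 28.74, 28.90],
        [29.35, 28.56, 28.92],
    ],
    columns=["High", "Low", "Close"],
    index=["2009-02-11", "2009-02-12", "2009-02-13", "2009-02-17", "2009-02-18"],
)
s = pd.Series([True, False, False, True, False], name="bools")

# Row mask and column label in one step: returns a Series
high_series = df.loc[s.values, "High"]

# Column given as a list: returns a one-column DataFrame
high_frame = df.loc[s.values, ["High"]]

print(high_frame)
```

A single `.loc` call also avoids chained indexing such as `df['High'][s.values]`, which matters mainly when assigning values rather than reading them.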