Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/26897536/

Time: 2020-09-13 22:39:51  Source: igfitidea

Python Pandas drop columns based on max value of column

python numpy pandas

Asked by professorDante

I'm just getting started with Pandas as a tool for munging two-dimensional arrays of data. It's super overwhelming, even after reading the docs. You can do so much that I can't figure out how to do anything, if that makes any sense.

My dataframe (simplified):

Date       Stock1  Stock2   Stock3
2014.10.10  74.75  NaN     NaN
2014.9.9    NaN    100.95  NaN 
2010.8.8    NaN    NaN     120.45

So each column only has one value.

I want to remove all columns whose max value is less than x. For example, if x = 80, I want a new DataFrame:

Date        Stock2   Stock3
2014.10.10   NaN     NaN
2014.9.9     100.95  NaN 
2010.8.8     NaN     120.45

How can this be achieved? I've looked at dataframe.max(), which gives me a Series. Can I use that, or somehow use a lambda function in select()?

Answered by Adam Hughes

Use the result of df.max() to index with.

In [19]: import numpy as np
    ...: from pandas import DataFrame

In [23]: df = DataFrame(np.random.randn(3,3), columns=['a','b','c'])

In [36]: df
Out[36]: 
          a         b         c
0 -0.928912  0.220573  1.948065
1 -0.310504  0.847638 -0.541496
2 -0.743000 -1.099226 -1.183567


In [24]: df.max()
Out[24]: 
a   -0.310504
b    0.847638
c    1.948065
dtype: float64

Next, we make a boolean expression out of this:

In [31]: df.max() > 0
Out[31]: 
a    False
b     True
c     True
dtype: bool
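
One detail matters for the question's data, which is mostly NaN: df.max() skips NaN by default, and a column that is entirely NaN has a max of NaN, which compares False against any cutoff. A minimal sketch, assuming made-up values and a hypothetical frame named df_nan (neither comes from the session above):

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; not the data from the session above.
df_nan = pd.DataFrame({
    "a": [np.nan, 1.5, np.nan],     # one real value among NaNs
    "b": [np.nan, np.nan, np.nan],  # entirely NaN
    "c": [-2.0, np.nan, -0.5],      # real values, all below the cutoff
})

col_max = df_nan.max()  # NaN entries are skipped (skipna=True is the default)
mask = col_max > 0      # the NaN max of column "b" compares False

print(col_max)
print(mask)
```

So with this approach an all-NaN column ends up in the "drop" group, the same as a column whose max is below the cutoff.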

Next, you can index df.columns by this (this is called boolean indexing):

In [34]: df.columns[df.max() > 0]
Out[34]: Index([u'b', u'c'], dtype='object')

Finally, you can pass that to the DataFrame:

In [35]: df[df.columns[df.max() > 0]]
Out[35]: 
          b         c
0  0.220573  1.948065
1  0.847638 -0.541496
2 -1.099226 -1.183567
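
The same selection can probably be written a couple of other ways. The sketch below rebuilds the frame from the values printed above (instead of random numbers, so it is reproducible) and compares the df.columns approach with .loc and with drop(); the names keep, via_columns, via_loc and via_drop are only illustrative:

```python
import pandas as pd

# Same values as the frame printed in the session above.
df = pd.DataFrame({
    "a": [-0.928912, -0.310504, -0.743000],
    "b": [0.220573, 0.847638, -1.099226],
    "c": [1.948065, -0.541496, -1.183567],
})

keep = df.max() > 0  # boolean Series indexed by column name

via_columns = df[df.columns[keep]]             # the approach shown above
via_loc = df.loc[:, keep]                      # boolean mask on the column axis
via_drop = df.drop(columns=df.columns[~keep])  # name the columns to remove instead

print(via_loc)
```

All three should give the same two-column result here; .loc with a boolean mask is arguably the shortest to read.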

Of course, instead of 0, you can use any value you want as the cutoff for dropping.
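
Applied to the data from the question with x = 80, this might look like the sketch below. It assumes Date is the index rather than a regular column (the question does not say), and drop_columns_below is a hypothetical helper name, not a pandas function. Since the question asks to remove columns whose max is less than x, the comparison keeps columns with max >= x:

```python
import numpy as np
import pandas as pd

def drop_columns_below(df, cutoff):
    """Return df without the columns whose maximum is below cutoff.

    Hypothetical helper for illustration. Columns that are entirely NaN
    are dropped too, because NaN >= cutoff evaluates to False.
    """
    return df.loc[:, df.max() >= cutoff]

# Data from the question; treating Date as the index is an assumption.
stocks = pd.DataFrame(
    {
        "Stock1": [74.75, np.nan, np.nan],
        "Stock2": [np.nan, 100.95, np.nan],
        "Stock3": [np.nan, np.nan, 120.45],
    },
    index=pd.Index(["2014.10.10", "2014.9.9", "2010.8.8"], name="Date"),
)

result = drop_columns_below(stocks, 80)
print(result)
```

If Date were an ordinary string column instead, it would need to be set aside first (for example with set_index("Date")), since taking the max over mixed string and numeric columns is not meaningful here.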
