pandas: Delete a column in a pandas DataFrame if its sum is less than x

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/33990495/

Time: 2020-09-14 00:17:23  Source: igfitidea

Delete a column in a pandas DataFrame if its sum is less than x

python, python-2.7, pandas

Asked by Joey Allen

I am trying to write a program that deletes a column from a pandas DataFrame if the column's sum is less than 10.

I currently have the following solution, but I was curious if there is a more pythonic way to do this.

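import pandas

df = pandas.DataFrame(AllData)  # AllData: the input data (not shown in the question)
col_sums = df.sum(axis=0)  # axis=0 sums each column; axis=1 would sum each row
badCols = list()
for index in range(len(col_sums)):
    if col_sums.iloc[index] < 10:  # .iloc: look up the sum by column position
        badCols.append(index)
df = df.drop(df.columns[badCols], axis=1)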

In my approach, I build a list of the positions of the columns whose sums are less than 10, then drop those columns. Is there a better way to do this?

Answered by EdChum

You can call sum to generate a Series that gives the sum of each column, then compare that Series against the threshold to get a boolean mask over the columns, and use the mask to filter the df. DF generation code borrowed from @Alexander:

In [2]:
df = pd.DataFrame({'a': [1, 10], 'b': [1, 1], 'c': [20, 30]})
df

Out[2]:
    a  b   c
0   1  1  20
1  10  1  30

In [3]:    
df.sum()

Out[3]:
a    11
b     2
c    50
dtype: int64

In [6]:
df[df.columns[df.sum() >= 10]]

Out[6]:
    a   c
0   1  20
1  10  30
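
The mask-based filtering described above can also be written with .loc, which takes the boolean Series directly on the column axis. This is only a minimal sketch: the extra column 'd' is a hypothetical addition (not in the original answer) to show the boundary case where a column's sum is exactly 10, and the comparison uses >= so that only sums strictly below 10 are dropped, as the question asks.

```python
import pandas as pd

# Example frame from the answer, plus a hypothetical column 'd' whose sum is exactly 10.
df = pd.DataFrame({'a': [1, 10], 'b': [1, 1], 'c': [20, 30], 'd': [4, 6]})

# Boolean Series indexed by column name: True where the column sum is at least 10.
keep = df.sum() >= 10

# .loc with a boolean mask on the column axis keeps only the True columns.
filtered = df.loc[:, keep]
print(filtered)
```

Here df itself is left unchanged and filtered holds the surviving columns, which may be preferable when the original frame is still needed.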

Answered by Alexander

You can do this in one line by using a list comprehension and items (iteritems in older pandas code; it was removed in pandas 2.0) to identify the columns whose sums fall below the threshold.

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 10], 'b': [1, 1], 'c': [20, 30]})
>>> df
    a  b   c
0   1  1  20
1  10  1  30

>>> df.drop([col for col, val in df.sum().items() if val < 10], axis=1, inplace=True)

>>> df
    a   c
0   1  20
1  10  30
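
If mutating df in place is not wanted, the same list comprehension can be wrapped in a small helper that returns a new frame instead. This is a sketch under assumptions: the function name drop_low_sum_columns and its threshold parameter are hypothetical, and it relies on Series.items() and the columns= keyword of DataFrame.drop, so it assumes a pandas version that provides both.

```python
import pandas as pd


def drop_low_sum_columns(frame, threshold):
    """Return a new frame without the columns whose sum is below threshold."""
    # Collect the labels of the columns to remove (hypothetical helper, not from the answer).
    low = [col for col, total in frame.sum().items() if total < threshold]
    # drop() without inplace=True returns a new DataFrame and leaves frame untouched.
    return frame.drop(columns=low)


df = pd.DataFrame({'a': [1, 10], 'b': [1, 1], 'c': [20, 30]})
result = drop_low_sum_columns(df, 10)
print(result)
```

Returning a new frame keeps the call free of side effects, so the threshold can be varied without rebuilding df each time.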