Original question: http://stackoverflow.com/questions/33990495/
Source: StackOverflow. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep the same license.
Delete a column in a pandas DataFrame if its sum is less than x
Asked by Joey Allen
I am trying to write a program that deletes a column from a pandas DataFrame if the column's sum is less than 10.
I currently have the following solution, but I am curious whether there is a more Pythonic way to do this.
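import pandas

df = pandas.DataFrame(AllData)  # AllData is the asker's input data (not shown)
col_sums = df.sum(axis=0)  # axis=0 sums each column; axis=1 would sum each row
badCols = list()
for index in range(len(col_sums)):
    if col_sums.iloc[index] < 10:
        badCols.append(index)
df = df.drop(df.columns[badCols], axis=1)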
In my approach, I build a list of the positions of the columns whose sums are less than 10, then drop those columns. Is there a better way to do this?
Answered by EdChum
You can call df.sum() to generate a Series that gives the sum of each column, use it to build a boolean mask over the columns array, and then use that mask to filter the df. DataFrame generation code borrowed from @Alexander:
In [2]:
df = pd.DataFrame({'a': [1, 10], 'b': [1, 1], 'c': [20, 30]})
df
Out[2]:
    a  b   c
0   1  1  20
1  10  1  30
In [3]:
df.sum()
Out[3]:
a    11
b     2
c    50
dtype: int64
In [6]:
df[df.columns[df.sum()>10]]
Out[6]:
    a   c
0   1  20
1  10  30
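As a minimal sketch of the same mask-based idea, the mask can also be passed straight to .loc on the column axis. The names threshold, keep, and filtered are illustrative and not part of the original answer. Note that this sketch uses >= so that a column summing to exactly 10 is kept, matching the question's "less than 10" rule, whereas the session above uses a strict > comparison.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 10], 'b': [1, 1], 'c': [20, 30]})

threshold = 10  # illustrative name for the cutoff from the question

# Boolean Series indexed by column name: True where the column sum is >= threshold.
keep = df.sum() >= threshold

# .loc with a boolean mask on the column axis keeps only the True columns.
filtered = df.loc[:, keep]

print(list(filtered.columns))  # ['a', 'c']
```

This returns a new DataFrame and leaves df unchanged.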
Answered by Alexander
You can accomplish your objective in one line by using a list comprehension and items() (called iteritems() in pandas versions before 2.0) to identify all columns that meet your criteria.
>>> df = pd.DataFrame({'a': [1, 10], 'b': [1, 1], 'c': [20, 30]})
>>> df
    a  b   c
0   1  1  20
1  10  1  30
>>> # Series.iteritems() was removed in pandas 2.0; Series.items() is the equivalent.
>>> df.drop([col for col, val in df.sum().items() if val < 10], axis=1, inplace=True)
>>> df
    a   c
0   1  20
1  10  30
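If the same filter is needed in more than one place, the drop-based approach could be wrapped in a small helper. This is only a sketch: the function name drop_low_sum_columns and its parameters are hypothetical, and it assumes that non-numeric columns should be left alone, which the original answers do not discuss. It returns a new DataFrame instead of modifying the input in place.

```python
import pandas as pd


def drop_low_sum_columns(frame, min_sum):
    """Return a new DataFrame without the numeric columns whose sum is below min_sum."""
    # numeric_only=True skips non-numeric columns, so they are never candidates for dropping.
    sums = frame.sum(numeric_only=True)
    # Index of the column labels whose sum falls below the cutoff.
    low = sums[sums < min_sum].index
    return frame.drop(columns=low)


df = pd.DataFrame({'a': [1, 10], 'b': [1, 1], 'c': [20, 30]})
result = drop_low_sum_columns(df, 10)
print(result.columns.tolist())  # ['a', 'c']
```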