Python: How to delete columns that contain only zeros in Pandas?

Note: this page is a Chinese-English translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original URL: http://stackoverflow.com/questions/21164910/

Time: 2020-08-18 22:12:31  Source: igfitidea

How do I delete a column that contains only zeros in Pandas?

python pandas

Asked by user2587593

I currently have a dataframe whose columns contain 1's and 0's as values. I would like to iterate through the columns and delete the ones that are made up of only 0's. Here's what I have tried so far:

ones = []
zeros = []
for year in years:
    for i in range(0,599):
        if year[str(i)].values.any() == 1:
            ones.append(i)
        if year[str(i)].values.all() == 0:
            zeros.append(i)
    for j in ones:
        if j in zeros:
            zeros.remove(j)
    for q in zeros:
        del year[str(q)]

Here, years is a list of dataframes for the various years I am analyzing, ones is a list of columns with a one in them, and zeros is a list of columns containing all zeros. Is there a better way to delete a column based on a condition? For some reason I also have to check whether the columns in ones appear in the zeros list and remove them from it to obtain a list of all the zero columns.

Accepted answer by unutbu

df.loc[:, (df != 0).any(axis=0)]


Here is a break-down of how it works:

In [74]: import pandas as pd

In [75]: df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])

In [76]: df
Out[76]: 
   0  1  2  3
0  1  0  0  0
1  0  0  1  0

[2 rows x 4 columns]

df != 0 creates a boolean DataFrame which is True where df is nonzero:

In [77]: df != 0
Out[77]: 
       0      1      2      3
0   True  False  False  False
1  False  False   True  False

[2 rows x 4 columns]

(df != 0).any(axis=0) returns a boolean Series indicating which columns have nonzero entries. (The any operation aggregates values along the 0-axis -- i.e. along the rows -- into a single boolean value. Hence the result is one boolean value for each column.)

In [78]: (df != 0).any(axis=0)
Out[78]: 
0     True
1    False
2     True
3    False
dtype: bool

And df.loc can be used to select those columns:

In [79]: df.loc[:, (df != 0).any(axis=0)]
Out[79]: 
   0  2
0  1  0
1  0  1

[2 rows x 2 columns]


To "delete" the zero-columns, reassign df:

df = df.loc[:, (df != 0).any(axis=0)]
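
For the setup described in the question, where years is a list of DataFrames, the same selection can be applied to each frame in turn. The following is only a minimal sketch on made-up sample data; the helper name drop_zero_columns is hypothetical and not part of pandas.

```python
import pandas as pd

def drop_zero_columns(df):
    """Return df restricted to the columns that have at least one nonzero value."""
    # (df != 0).any(axis=0) is True for every column with a nonzero entry
    return df.loc[:, (df != 0).any(axis=0)]

# Made-up stand-in for the question's list of per-year DataFrames
years = [
    pd.DataFrame({"0": [1, 0], "1": [0, 0], "2": [0, 1]}),
    pd.DataFrame({"0": [0, 0], "1": [0, 1], "2": [0, 0]}),
]

# Rebuild the list, because the selection returns new objects
# rather than modifying each DataFrame in place
years = [drop_zero_columns(year) for year in years]

for year in years:
    print(list(year.columns))
```

Rebuilding the list mirrors the reassignment above: nothing is deleted in place, each entry is simply replaced by the filtered frame.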

Answer by Jeremy Z

Here is an alternative way:

import numpy as np
df.replace(0,np.nan).dropna(axis=1,how="all")

Compared with unutbu's solution, this approach is noticeably slower (roughly 2.7x in the timings below):

%timeit df.loc[:, (df != 0).any(axis=0)]
652 μs ± 5.7 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.replace(0,np.nan).dropna(axis=1,how="all")
1.75 ms ± 9.49 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
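
Speed aside, the two approaches do not behave identically when the data contains missing values: NaN != 0 evaluates to True, so the df.loc version keeps a column holding only zeros and NaN, while the replace/dropna version drops it. A small sketch on made-up data (the column names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data: "b" is all zeros, "c" mixes a zero with a missing value
df = pd.DataFrame({
    "a": [1.0, 0.0],
    "b": [0.0, 0.0],
    "c": [0.0, np.nan],
})

# Keeps every column with at least one entry that is != 0 (NaN counts as != 0)
kept_by_loc = df.loc[:, (df != 0).any(axis=0)]

# Turns zeros into NaN, then drops columns that are entirely NaN
kept_by_dropna = df.replace(0, np.nan).dropna(axis=1, how="all")

print(list(kept_by_loc.columns))
print(list(kept_by_dropna.columns))
```

Note also that the replace/dropna result contains NaN where the zeros used to be, so it is not the original data with columns removed.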

Answer by mork

In case you'd like a more expressive way of getting the zero-column names, so you can print / log them and drop them in-place by their names:

zero_cols = [ col for col, is_zero in ((df == 0).sum() == df.shape[0]).items() if is_zero ]
df.drop(zero_cols, axis=1, inplace=True)

A breakdown:

# a pandas Series with {col: is_zero} items
# is_zero is True when the number of zero items in that column == num_all_rows
the_series = (df == 0).sum() == df.shape[0]

# a list comprehension of zero_col_names is built from the_series
[ col for col, is_zero in the_series.items() if is_zero ]
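
Putting the pieces together, a self-contained sketch might look like the following. The variable names follow the answer; the sample DataFrame and its column names are assumptions made up for illustration.

```python
import pandas as pd

# Made-up sample data: columns "b" and "d" contain only zeros
df = pd.DataFrame({"a": [1, 0], "b": [0, 0], "c": [0, 1], "d": [0, 0]})

# Boolean Series: True where the count of zeros equals the number of rows
the_series = (df == 0).sum() == df.shape[0]

# Collect the names of the all-zero columns so they can be logged
zero_cols = [col for col, is_zero in the_series.items() if is_zero]
print("dropping all-zero columns:", zero_cols)

# Drop them by name, modifying df in place
df.drop(zero_cols, axis=1, inplace=True)
print(list(df.columns))
```

The same boolean Series can also be obtained more directly with (df == 0).all(), which may read more clearly than comparing a sum against the row count.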