pandas: filter out rows/columns with zero values in a MultiIndex DataFrame

Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/35996768/

Date: 2020-09-14 00:52:13  Source: igfitidea

python pandas

Question by Dov

I have the following pandas MultiIndex DataFrame in Python:

             0         1         2         3 
bar one  0.000000 -0.929631  0.688818 -1.264180
    two  1.130977  0.063277  0.161366  0.598538
baz one  1.420532  0.052530 -0.701400  0.678847
    two -1.197097  0.314381  0.269551  1.115699
foo one -0.077463  0.437145 -0.202377  0.260864
    two -0.815926 -0.508988 -1.238619  0.899013
qux one -0.347863 -0.999990 -1.428958 -1.488556
    two  1.218567 -0.593987  0.099003  0.800736

My question: how can I filter out

  1. columns that contain zero values, i.e. column 0 in the example above?
  2. rows that contain zeros? How can I filter out the row (bar, one) on its own, and how can I filter out both (bar, one) and (bar, two)?

    (Apologies for my non-native English ;)

Answer by Julien Spronck

To filter out columns that contain zero values, you can use

df2 = df.loc[:, (df != 0).all(axis=0)]
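A minimal sketch of the column filter, assuming a small made-up MultiIndex frame (the values are hypothetical, not the question's data). `(df != 0).all(axis=0)` yields one boolean per column, True only when that column has no zero, and `.loc[:, mask]` keeps just those columns:

```python
import pandas as pd

# Hypothetical 4x3 MultiIndex frame; only column 0 contains a zero
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")]
)
df = pd.DataFrame(
    {0: [0.0, 1.1, 1.4, -1.2], 1: [-0.9, 0.1, 0.1, 0.3], 2: [0.7, 0.2, -0.7, 0.3]},
    index=index,
)

# One boolean per column: True when every value in that column is non-zero
mask = (df != 0).all(axis=0)
df2 = df.loc[:, mask]

print(mask.tolist())         # [False, True, True]
print(df2.columns.tolist())  # [1, 2]
```

The rows (and the MultiIndex) are untouched; only the column axis is filtered.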

To filter out rows that contain zero values, you can use

df2 = df.loc[(df != 0).all(axis=1), :]
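The row filter is the same idea with `axis=1`: one boolean per row, True when the row has no zero. A minimal sketch on a hypothetical frame (made-up values) with zeros in ('bar', 'one') and ('baz', 'one'):

```python
import pandas as pd

# Hypothetical frame with zeros in rows ('bar', 'one') and ('baz', 'one')
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")]
)
df = pd.DataFrame(
    {0: [0.0, 1.1, 1.4, -1.2], 1: [-0.9, 0.1, 0.0, 0.3], 2: [0.7, 0.2, -0.7, 0.3]},
    index=index,
)

# One boolean per row: True when every value in that row is non-zero
row_mask = (df != 0).all(axis=1)
df2 = df.loc[row_mask, :]

print(row_mask.tolist())   # [False, True, False, True]
print(df2.index.tolist())  # [('bar', 'two'), ('baz', 'two')]
```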

To drop specific rows by their index labels, you can use

df2 = df.drop('bar') ## drops both 'bar one' and 'bar two'
df2 = df.drop(('baz', 'two')) ## drops only 'baz two'
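A small sketch of the three ways to address rows in `drop` on a two-level index, using a hypothetical one-column frame: a full `(outer, inner)` tuple removes one row, a list of tuples removes exactly those rows, and an outer label alone removes every row under it. For the question's second case, dropping both ('bar', 'one') and ('bar', 'two') explicitly gives the same index as `drop('bar')`:

```python
import pandas as pd

# Hypothetical single-column frame; only the index matters here
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")]
)
df = pd.DataFrame({0: [0.0, 1.1, 1.4, -1.2]}, index=index)

only_bar_one = df.drop(("bar", "one"))                       # one (outer, inner) row
bar_rows_listed = df.drop([("bar", "one"), ("bar", "two")])  # explicit list of rows
all_bar = df.drop("bar")                                     # every row under 'bar'

print(only_bar_one.index.tolist())  # [('bar', 'two'), ('baz', 'one'), ('baz', 'two')]
print(bar_rows_listed.index.equals(all_bar.index))  # True
```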

For example,

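import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df.loc[('bar', 'one'), 2] = 0  ## .ix was removed in pandas 1.0; set the value with .loc
df = df.loc[:, (df != 0).all(axis=0)]  ## drops column 2, which now contains a 0
df = df.drop('bar')  ## drops both 'bar one' and 'bar two'
df = df.drop(('baz', 'two'))  ## drops only 'baz two'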

#                 0         1         3
# baz one  0.686969  0.410614  0.841630
# foo one  1.522938  0.555734 -1.585507
#     two -0.975976  0.522571 -0.041386
# qux one -0.991787  0.154645  0.179536
#     two -0.725685  0.809784  0.394708

If your DataFrame has no NaN values, another way is to turn the 0s into NaN and then drop the columns or rows that contain NaN:

df[df != 0.].dropna(axis=1) # to remove the columns with 0
df[df != 0.].dropna(axis=0) # to remove the rows with 0
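A sketch of why the "no NaN" condition matters, on a hypothetical frame with made-up values. `df[df != 0]` keeps the non-zero values and turns the zeros into NaN, and `dropna` then removes any column or row holding a NaN. On clean data this matches the boolean-mask approach; once a NaN is already present, it is treated like a zero (while `NaN != 0` is True, so the mask approach keeps it):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: zeros at ('bar', 'one') col 0 and ('baz', 'one') col 2
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")]
)
df = pd.DataFrame(
    {0: [0.0, 1.1, 1.4, -1.2], 1: [-0.9, 0.1, 0.5, 0.3], 2: [0.7, 0.2, 0.0, 0.3]},
    index=index,
)

# df[df != 0] masks the zeros to NaN; dropna then removes columns/rows with NaN
no_zero_cols = df[df != 0].dropna(axis=1)
no_zero_rows = df[df != 0].dropna(axis=0)

# With no NaN in df, this matches the boolean-mask approach
assert no_zero_cols.equals(df.loc[:, (df != 0).all(axis=0)])
assert no_zero_rows.equals(df.loc[(df != 0).all(axis=1), :])

# Caveat: a NaN already in the data is treated like a zero
df_nan = df.copy()
df_nan.loc[("baz", "two"), 1] = np.nan
via_mask = df_nan.loc[:, (df_nan != 0).all(axis=0)]  # NaN != 0 is True, so column 1 stays
via_nan = df_nan[df_nan != 0].dropna(axis=1)         # column 1 is dropped as well

print(via_mask.columns.tolist())  # [1]
print(via_nan.columns.tolist())   # []
```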

Finally, if you want to drop a whole outer group such as 'bar' (both (bar, one) and (bar, two)) as soon as any of its rows contains a zero, you can do this:

## outer-level labels (e.g. 'bar') of the rows that contain a 0, each listed once
labels = df.loc[(df == 0).any(axis=1), :].index.get_level_values(0).unique()
df = df.drop(labels, level=0)  ## drops every row under those labels
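As an alternative sketch (a groupby-based technique swapped in here, not part of the original answer), the per-row "has a zero" flag can be spread to every row under the same outer label with `groupby(level=0).transform("any")`, so each group is kept or dropped in one vectorized step. The frame below is hypothetical; it deliberately has zeros in both 'bar' rows, so the 'bar' group must still be removed only once:

```python
import pandas as pd

# Hypothetical frame: both 'bar' rows and ('foo', 'two') contain a zero
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two"),
     ("foo", "one"), ("foo", "two")]
)
df = pd.DataFrame(
    {0: [0.0, 1.1, 1.4, -1.2, 0.3, 0.0], 1: [-0.9, 0.0, 0.5, 0.3, 0.8, 0.6]},
    index=index,
)

# True for each row that contains at least one 0
row_has_zero = (df == 0).any(axis=1)

# Broadcast the flag to every row that shares the same outer label
group_has_zero = row_has_zero.groupby(level=0).transform("any")

df2 = df[~group_has_zero]
print(df2.index.tolist())  # [('baz', 'one'), ('baz', 'two')]
```

Unlike dropping labels one by one, the mask never tries to drop the same outer label twice.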