pandas: Filter out rows/columns with zero values in a MultiIndex DataFrame
Original URL: http://stackoverflow.com/questions/35996768/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Dov
I have the following pandas MultiIndex DataFrame in Python:
                0         1         2         3
bar one  0.000000 -0.929631  0.688818 -1.264180
    two  1.130977  0.063277  0.161366  0.598538
baz one  1.420532  0.052530 -0.701400  0.678847
    two -1.197097  0.314381  0.269551  1.115699
foo one -0.077463  0.437145 -0.202377  0.260864
    two -0.815926 -0.508988 -1.238619  0.899013
qux one -0.347863 -0.999990 -1.428958 -1.488556
    two  1.218567 -0.593987  0.099003  0.800736
My question is, how can I filter out:
- Columns that contain zero values (column 0 in the above example).
With regard to row filtering: how can I filter out only the row that contains a zero, (bar, one), and how can I filter out both (bar, one) and (bar, two)?
(Apologies for my non-native English ;)
Answered by Julien Spronck
To filter out columns that contain zero values, you can use
df2 = df.loc[:, (df != 0).all(axis=0)]
To filter out rows that contain zero values, you can use
df2 = df.loc[(df != 0).all(axis=1), :]
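A minimal, self-contained sketch of both masks on a small deterministic frame may make the behavior easier to check. The index labels and values below are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical toy data: a single zero at row ('bar', 'one'), column 0.
index = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(
    [[0.0, 1.0, 2.0],
     [3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0]],
    index=index,
)

nonzero = df != 0  # boolean frame, True where the value is non-zero

# Keep only the columns in which every value is non-zero.
cols_kept = df.loc[:, nonzero.all(axis=0)]

# Keep only the rows in which every value is non-zero.
rows_kept = df.loc[nonzero.all(axis=1), :]

print(cols_kept.columns.tolist())  # [1, 2]
print(rows_kept.index.tolist())    # [('bar', 'two'), ('baz', 'one'), ('baz', 'two')]
```

Here `all(axis=0)` reduces down each column and `all(axis=1)` reduces across each row, which is why the first mask selects columns and the second selects rows.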
To drop specific rows by their index labels, you can use
df2 = df.drop('bar') ## drops both 'bar one' and 'bar two'
df2 = df.drop(('baz', 'two')) ## drops only 'baz two'
For example,
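import numpy as np
import pandas as pd

# Two-level row index: outer level (bar/baz/foo/qux), inner level (one/two)
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df.loc[('bar', 'one'), 2] = 0  ## put a single zero in column 2 (df.ix was removed in pandas 1.0)
df = df.loc[:, (df != 0).all(axis=0)]  ## drops column 2, the only column containing a zero
df = df.drop('bar')  ## drops both 'bar one' and 'bar two'
df = df.drop(('baz', 'two'))  ## drops only 'baz two'
# Resulting df (the values are random and will differ on each run):
#                 0         1         3
# baz one  0.686969  0.410614  0.841630
# foo one  1.522938  0.555734 -1.585507
#     two -0.975976  0.522571 -0.041386
# qux one -0.991787  0.154645  0.179536
#     two -0.725685  0.809784  0.394708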
Another way, if your dataframe has no NaN values to begin with, is to turn the 0s into NaN and then drop the columns or rows that contain NaN:
df[df != 0.].dropna(axis=1) # to remove the columns with 0
df[df != 0.].dropna(axis=0) # to remove the rows with 0
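The "no NaN values" condition matters because `dropna` cannot tell a masked zero from a value that was already missing. A small sketch with made-up data (one zero and one genuine NaN, both hypothetical) shows how the two approaches can diverge:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(
    [[0.0, 1.0, 2.0],
     [3.0, 4.0, 5.0],
     [6.0, 7.0, np.nan],  # a genuine missing value, not a zero
     [9.0, 10.0, 11.0]],
    index=index,
)

# Zeros become NaN; the pre-existing NaN stays NaN.
masked = df[df != 0]

# dropna removes the zero row AND the row that was already missing a value.
via_dropna = masked.dropna(axis=0)

# The boolean-mask approach removes only the row containing a zero.
via_mask = df.loc[(df != 0).all(axis=1), :]

print(via_dropna.index.tolist())  # [('bar', 'two'), ('baz', 'two')]
print(via_mask.index.tolist())    # [('bar', 'two'), ('baz', 'one'), ('baz', 'two')]
```

If the frame may contain missing values that should be kept, the boolean-mask form is probably the safer choice.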
Finally, if you want to drop every row of a first-level group (for example, both 'bar' rows) as soon as any one of its rows contains a zero, you can do this:
indices = df.loc[(df == 0).any(axis=1), :].index.tolist() ## multi-index values that contain 0
for label in {ind[0] for ind in indices}: ## unique first-level labels, so each one is dropped only once
    df = df.drop(label)
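The same idea can also be sketched without a loop, by collecting the first-level labels of the rows that contain a zero and keeping only the rows whose first-level label is not among them. The data and the names `has_zero`, `bad_groups`, and `keep` below are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Hypothetical toy data: 'bar' has zeros in both rows, 'foo' in one row.
index = pd.MultiIndex.from_product([["bar", "baz", "foo"], ["one", "two"]])
df = pd.DataFrame(
    [[0.0, 1.0],
     [2.0, 0.0],
     [3.0, 4.0],
     [5.0, 6.0],
     [7.0, 8.0],
     [0.0, 9.0]],
    index=index,
)

# True for every row that contains at least one zero.
has_zero = (df == 0).any(axis=1)

# First-level labels of those rows, de-duplicated.
bad_groups = has_zero[has_zero].index.get_level_values(0).unique()

# Keep rows whose first-level label is not in a "bad" group.
keep = ~df.index.get_level_values(0).isin(bad_groups)
result = df[keep]

print(result.index.tolist())  # [('baz', 'one'), ('baz', 'two')]
```

Because the labels are de-duplicated before filtering, a group with zeros in more than one row (like 'bar' here) is handled the same way as a group with a single zero.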