Original question: http://stackoverflow.com/questions/24328650/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverFlow.
Masking multiple columns on a pandas dataframe in python
Asked by Antihead
I am looking to apply multiple masks, one on each column of a pandas dataset (each according to that column's properties), in Python. In the next step I want to find the row(s) in the dataframe that satisfy all conditions. So far I have:
df
Out[27]:
DE FL GA IA ID
0 0 1 0 0 0
1 1 0 1 0 1
2 0 0 1 0 0
3 0 1 0 0 0
4 0 0 0 0 0
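import pandas as pa
# Build one boolean mask per column: even positions must be > 0, odd positions < 1.
mask_list = []
for i in range(0, 5):
    if i % 2 == 0:
        mask_list.append(df.iloc[:, [i]] > 0)
    else:
        mask_list.append(df.iloc[:, [i]] < 1)
# Glue the single-column masks back together side by side.
concat_frame = pa.DataFrame()
for mask in mask_list:
    concat_frame = pa.concat((concat_frame, mask), axis=1)
concat_frame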
Out[48]:
DE FL GA IA ID
0 False False False True False
1 True True True True True
2 False True True True False
3 False False False True False
4 False True False True False
[5 rows x 5 columns]
Update: expected outcome:
outcome
Out[60]:
DE FL GA IA ID
1 1 0 1 0 1
Here comes the question:
How can I apply the concatenated mask (concat_frame) to df so that I select only the rows in which all Boolean criteria are matched (are True)?
Answered by mgilbert
You can use the pandas all method and boolean logic. As EdChum commented, I am still a bit unclear on your exact example, but a similar example is:
# assumes: from pandas import DataFrame, concat
In [1]: df = DataFrame([[1,2],[-3,5]], index=[0,1], columns=['a','b'])
In [2]: df
Out [2]:
a b
0 1 2
1 -3 5
In [3]: msk = (df>1) & (df<5)
In [4]: msk
Out [4]:
a b
0 False True
1 False False
In [5]: msk.all(axis=1)
Out [5]:
0 False
1 False
dtype: bool
If you wanted to index the original dataframe by the mask, you could do:
In [6]: df[msk]
Out [6]:
a b
0 NaN 2
1 NaN NaN
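The NaN values appear because indexing with a boolean DataFrame keeps the frame's shape and blanks out every cell where the mask is False. A minimal sketch of that behaviour, assuming a recent pandas version and the same two-row frame as above; the names `masked` and `same_as_where` are only illustrative:

```python
import pandas as pd

# Same toy frame as in the answer above.
df = pd.DataFrame([[1, 2], [-3, 5]], index=[0, 1], columns=['a', 'b'])
msk = (df > 1) & (df < 5)

# Element-wise masking: the shape is preserved and cells where
# the mask is False are replaced by NaN.
masked = df[msk]

# DataFrame.where applies the same element-wise rule explicitly.
same_as_where = df.where(msk)

print(masked)
print(masked.equals(same_as_where))
```

This is different from selecting whole rows, which needs a one-dimensional boolean Series such as `msk.all(axis=1)`.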
Or, as you originally indicated, select the rows where all the columns are True:
In [7]: idx = msk.all(axis=1)
In [8]: df[idx]
Out [8]:
Empty DataFrame
Columns: [a,b]
Index: []
Or, if one row were True:
In [9]: idx[0] = True
In [10]: df[idx]
Out [10]:
a b
0 1 2
Edit: to address the original question after the clarification in the comments, where we want different filtering criteria for different columns:
In [11]: msk1 = df[['a']] < 0
In [12]: msk2 = df[['b']] > 3
In [13]: msk = concat((msk1, msk2), axis=1)
In [14]: slct = msk.all(axis=1)
In [15]: df.loc[slct]
Out [15]:
a b
1 -3 5
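For illustration, the same per-column approach might be applied to the frame from the question roughly as follows. This is only a sketch, assuming a recent pandas version and the even/odd rule from the asker's loop; the frame is rebuilt by hand from the printed output, and `masks` and `outcome` are illustrative names:

```python
import pandas as pd

# The frame from the question, rebuilt from its printed output.
df = pd.DataFrame(
    [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 1],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0]],
    columns=['DE', 'FL', 'GA', 'IA', 'ID'],
)

# One single-column boolean mask per column:
# even positions must be > 0, odd positions must be < 1.
masks = [
    df.iloc[:, [i]] > 0 if i % 2 == 0 else df.iloc[:, [i]] < 1
    for i in range(df.shape[1])
]
concat_frame = pd.concat(masks, axis=1)

# Keep only the rows where every column's condition holds.
outcome = df[concat_frame.all(axis=1)]
print(outcome)
```

Passing the whole list to a single `concat` call avoids growing an empty frame inside a loop.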
Answered by ely
df[df[['DE', 'GA', 'ID']].all(axis=1) & (1 - df[['FL', 'IA']]).all(axis=1)]
The hard part here is understanding why you're using even/odd column positions to determine the treatment. Based on your code, it looks like you want columns 0, 2, and 4 to actually be 1 minus their current values. However, based on what you give as the expected output, it actually seems like you want columns 1 and 3 to be 1 minus their current values.
My code above reflects the latter assumption. The general idea still works either way; just adjust it to whichever columns actually need to be replaced by 1 minus their value, once you have defined your desired output more rigorously.
That should probably be cleaned up first and turned into a proper helper function that explicitly shows which columns need to be replaced by 1 minus their value and which columns can be left alone.
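One possible shape for such a helper is sketched below. The function name `select_rows` and its parameters are hypothetical, not part of pandas or of the original answer, and the sketch assumes every listed column holds only 0 and 1:

```python
import pandas as pd

def select_rows(df, must_be_one, must_be_zero):
    """Hypothetical helper: keep rows where every column in `must_be_one`
    is 1 and every column in `must_be_zero` is 0 (0/1 data assumed)."""
    ones_ok = df[must_be_one].all(axis=1)
    # 1 - value flips a 0/1 column, so all() checks the originals are 0.
    zeros_ok = (1 - df[must_be_zero]).all(axis=1)
    return df[ones_ok & zeros_ok]

# The frame from the question, rebuilt from its printed output.
df = pd.DataFrame(
    [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 1],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0]],
    columns=['DE', 'FL', 'GA', 'IA', 'ID'],
)

result = select_rows(df, ['DE', 'GA', 'ID'], ['FL', 'IA'])
print(result)
```

Naming the two column groups at the call site makes the intent explicit instead of relying on even or odd column positions.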

