Original source: http://stackoverflow.com/questions/29566603/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
python pandas conditional count across columns
Asked by MJS
I have a dataframe (called panel[xyz]) containing only 1, 0 and -1. The dimensions are: rows 0:10 and columns a:j.
I would like to create another dataframe (df) which has the same vertical axis, but only 3 columns:
col_1 = count all non-zero values (1s and -1s)
col_2 = count all 1s
col_3 = count all -1s
I found this in searching SO:
df[col_1] = (pan[xyz]['a','b','c','d','e'] > 0).count(axis=1)
...and have tried many different iterations, but I cannot get the conditional (>0) to distinguish between the different values in pan[xyz]. The count is always = 5.
Any help would be much appreciated.
Edit:
pan[xyz] =
   a  b  c  d  e  f  g  h  i  j
0  1  0  0 -1  0  0 -1  0  1  0
1  0  1  0  0  0  1  0  0  0 -1
2  1  0  0  0  0 -1  0  0  0  0
3  0 -1  0  0  0  0  0  1  0  0
4  0  0  0  1  0  0 -1  0  0 -1
df should be =
   col_1  col_2  col_3
0      4      2      2
1      3      2      1
2      2      1      1
3      2      1      1
4      3      1      2
But this is what I get for col_1:
df = (panel[xyz] > 0).count(axis=1)
df
Out[129]:
0    10
1    10
2    10
3    10
4    10
dtype: int64
Answered by JohnE
I'm just doing this with a flat dataframe, but it's the same for a panel. You can do it one of two ways. The first way is what you did, just change the count() to sum():
( df > 0 ).sum(axis=1)
The underlying structure is boolean, and count() counts every non-null entry, so True and False both get counted. If you sum them instead, True is treated as 1 and False as 0, which is what you were expecting.
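To make the difference concrete, here is a minimal sketch on a small hypothetical frame (the values are made up for illustration and are not the question's data). It compares count() and sum() on the same boolean mask:

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
df = pd.DataFrame({"a": [1, 0], "b": [0, -1], "c": [-1, 1]})

mask = df > 0                  # boolean frame: True where the value is positive

counted = mask.count(axis=1)   # counts non-null cells, so False is counted too
summed = mask.sum(axis=1)      # True adds 1, False adds 0

print(counted.tolist())  # [3, 3] -> always the number of columns
print(summed.tolist())   # [1, 1] -> the number of positive values per row
```

This is why the count in the question always equals the number of columns selected.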
But a more standard way to do it would be like this:
df[ df > 0 ].count(axis=1)
While the former method was based on a dataframe of booleans, the latter looks like this:
df[ df > 0 ]
     a    b    c    d    e    f    g    h    i    j
0    1  NaN  NaN  NaN  NaN  NaN  NaN  NaN    1  NaN
1  NaN    1  NaN  NaN  NaN    1  NaN  NaN  NaN  NaN
2    1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN  NaN  NaN    1  NaN  NaN
4  NaN  NaN  NaN    1  NaN  NaN  NaN  NaN  NaN  NaN
In this case it doesn't really matter which method you use, but in general the latter is going to be better, because you can do more with it. For example, with the former method (which has binary outcomes by design), all you can really do is count, but in the latter method you can count, sum, multiply, etc.
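As a rough illustration of "you can do more with it", the sketch below uses a hypothetical frame whose values are not limited to 1, 0 and -1 (the numbers are invented for the example). Because the masked frame keeps the original values, it supports both a count and a sum:

```python
import pandas as pd

# Hypothetical frame with values other than 1, 0 and -1.
df = pd.DataFrame({"a": [3, 0], "b": [-2, 5], "c": [4, -1]})

positives = df[df > 0]              # non-positive cells become NaN

n_pos = positives.count(axis=1)     # how many positive values per row
total_pos = positives.sum(axis=1)   # sum of the positive values (NaN is skipped)

print(n_pos.tolist())      # [2, 1]
print(total_pos.tolist())  # [7.0, 5.0]
```

With the boolean frame `(df > 0)` only the first of these two results would be available.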
The potential usefulness of this may be more obvious for the case of df != 0, where there are more than two possible values:
df[ df != 0 ]
     a    b    c    d    e    f    g    h    i    j
0    1  NaN  NaN   -1  NaN  NaN   -1  NaN    1  NaN
1  NaN    1  NaN  NaN  NaN    1  NaN  NaN  NaN   -1
2    1  NaN  NaN  NaN  NaN   -1  NaN  NaN  NaN  NaN
3  NaN   -1  NaN  NaN  NaN  NaN  NaN    1  NaN  NaN
4  NaN  NaN  NaN    1  NaN  NaN   -1  NaN  NaN   -1
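Putting the pieces together, one possible way to build the three-column frame from the question is sketched below. The sample data is copied from the question; the names `panel` and `result` are placeholders chosen for this example, and a flat DataFrame is assumed in place of the original panel[xyz].

```python
import pandas as pd

# Sample data copied from the question (rows 0-4, columns a-j).
panel = pd.DataFrame(
    [
        [1, 0, 0, -1, 0, 0, -1, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0, 0, 0, -1],
        [1, 0, 0, 0, 0, -1, 0, 0, 0, 0],
        [0, -1, 0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 1, 0, 0, -1, 0, 0, -1],
    ],
    columns=list("abcdefghij"),
)

# Mask the frame with each condition, then count the non-NaN cells per row.
result = pd.DataFrame(
    {
        "col_1": panel[panel != 0].count(axis=1),   # all non-zero values
        "col_2": panel[panel == 1].count(axis=1),   # all 1s
        "col_3": panel[panel == -1].count(axis=1),  # all -1s
    }
)

print(result)
```

The same numbers could also be obtained with the boolean form, for example `(panel != 0).sum(axis=1)`; the masked form is used here only to match the second method described above.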

