Count number of columns with some values for each row in pandas
Original question: http://stackoverflow.com/questions/44717137/
Provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by jovicbg
I have a DataFrame like this, called data:
Site code  Col1  Col2  Col3
A5252        24    53   NaN
A5636        36   NaN   NaN
A4366       NaN   NaN   NaN
A7578        42   785    24
For each row, I want to count the number of columns that hold a value, that is, are not NaN. Desired output:
Site code  Col1  Col2  Col3  Count
A5252        24    53   NaN      2
A5636        36   NaN   NaN      1
A4366       NaN   NaN   NaN      0
A7578        42   785    24      3
Something opposite to this: df = data.isnull().sum(axis=1)
Answered by jezrael
# If the first column is not already the index, set it
data = data.set_index('Site code')
data['Count'] = data.notnull().sum(axis=1)
Or use the DataFrame.count function:
data = data.set_index('Site code')
data['Count'] = data.count(axis=1)
print(data)
           Col1   Col2  Col3  Count
Site code
A5252      24.0   53.0   NaN      2
A5636      36.0    NaN   NaN      1
A4366       NaN    NaN   NaN      0
A7578      42.0  785.0  24.0      3
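For readers who want to run the two variants above end to end, here is a minimal sketch that rebuilds the example frame by hand. The values are copied from the question; the names via_notnull, via_count and via_isnull are only illustrative. It also shows the complement of the isnull() expression from the question, which should give the same counts.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
data = pd.DataFrame(
    {
        "Site code": ["A5252", "A5636", "A4366", "A7578"],
        "Col1": [24, 36, np.nan, 42],
        "Col2": [53, np.nan, np.nan, 785],
        "Col3": [np.nan, np.nan, np.nan, 24],
    }
).set_index("Site code")

# Compute every variant before adding the new column,
# so that 'Count' itself is never included in a count.
via_notnull = data.notnull().sum(axis=1)                 # True counts as 1
via_count = data.count(axis=1)                           # non-NA cells per row
via_isnull = data.shape[1] - data.isnull().sum(axis=1)   # total columns minus NaNs

data["Count"] = via_count
print(data)
```

All three expressions count the non-NaN cells per row; count(axis=1) is simply the shortest way to write it.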
Another solution is to select the columns with loc (here Site code is a column, not the index):
print(data.loc[:, 'Col1':])
   Col1   Col2  Col3
0  24.0   53.0   NaN
1  36.0    NaN   NaN
2   NaN    NaN   NaN
3  42.0  785.0  24.0
data['Count'] = data.loc[:, 'Col1':].count(axis=1)
print(data)
  Site code  Col1   Col2  Col3  Count
0     A5252  24.0   53.0   NaN      2
1     A5636  36.0    NaN   NaN      1
2     A4366   NaN    NaN   NaN      0
3     A7578  42.0  785.0  24.0      3
Another nice idea, from Jon Clements, is to use filter:
data['Count'] = data.filter(regex="^Col").count(axis=1)
print(data)
  Site code  Col1   Col2  Col3  Count
0     A5252  24.0   53.0   NaN      2
1     A5636  36.0    NaN   NaN      1
2     A4366   NaN    NaN   NaN      0
3     A7578  42.0  785.0  24.0      3
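The column-selection variants can be compared side by side. This is a sketch under the assumption that Site code is a regular column; the explicit column list (by_list) is an extra variant added here for comparison and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example frame from the question, with 'Site code' kept as a regular column.
data = pd.DataFrame(
    {
        "Site code": ["A5252", "A5636", "A4366", "A7578"],
        "Col1": [24, 36, np.nan, 42],
        "Col2": [53, np.nan, np.nan, 785],
        "Col3": [np.nan, np.nan, np.nan, 24],
    }
)

# Three ways to restrict the count to the value columns.
by_loc = data.loc[:, "Col1":].count(axis=1)              # label slice from 'Col1' onward
by_filter = data.filter(regex="^Col").count(axis=1)      # column names starting with 'Col'
by_list = data[["Col1", "Col2", "Col3"]].count(axis=1)   # explicit list (illustrative)

data["Count"] = by_filter

# Once 'Count' exists, the open-ended label slice picks it up as well,
# so repeating the loc version now counts one extra non-null column per row.
rerun_loc = data.loc[:, "Col1":].count(axis=1)
print(rerun_loc.tolist())
```

The regex "^Col" does not match the name Count, so the filter version gives the same result when run again, while the open-ended loc slice does not.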
Answered by void
Simply use notnull():
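import pandas as pd
df = pd.read_csv("your_csv.csv")
# Leave out the 'Site code' identifier column: it is never null and would add 1 to every row
df['count'] = df.drop(columns='Site code').notnull().sum(axis=1)
print(df)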
Also, to add a column to a DataFrame, just use:
df['new_column_name'] = newcolumn
Output:
Site code  Col1  Col2  Col3  count
A5252        24    53   NaN      2
A5636        36   NaN   NaN      1
A4366       NaN   NaN   NaN      0
A7578        42   785    24      3
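To try the read_csv route without a file on disk, the sketch below feeds the same data through io.StringIO. The CSV text is an assumption about how your_csv.csv might look (comma-separated, with empty fields for missing values). It also shows what happens if the identifier column is counted along with the value columns.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for your_csv.csv; empty fields become NaN.
csv_text = """Site code,Col1,Col2,Col3
A5252,24,53,
A5636,36,,
A4366,,,
A7578,42,785,24
"""

df = pd.read_csv(io.StringIO(csv_text))

# Counting over the whole frame also counts 'Site code', which is never null.
whole_frame = df.notnull().sum(axis=1)

# Dropping the identifier column first gives the counts from the question.
values_only = df.drop(columns="Site code").notnull().sum(axis=1)

df["count"] = values_only
print(df)
```

An alternative under the same assumption is to pass index_col="Site code" to read_csv, so that the identifier becomes the index and is left out of the count automatically.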