pandas: Count non-empty cells in pandas dataframe rows and add counts as a column

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/48906828/

Time: 2020-09-14 05:13:01  Source: igfitidea

Count non-empty cells in pandas dataframe rows and add counts as a column

python pandas dataframe data-analysis

Asked by Atiqul Islam

Using Python, I want to count the number of cells in each row of a pandas data frame that have data in them, and record that count in the leftmost cell of the row.

[Image: frame showing the count column on the left, as requested]

Answered by Keith Dowd

To count the number of cells missing data in each row, you probably want to do something like this:

df.apply(lambda x: x.isnull().sum(), axis='columns')

Replace df with the name of your data frame.
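
As a minimal sketch of the per-row count, assuming a small made-up frame (the columns A, B, C and their values are illustrative, not the asker's data): the apply call from above counts missing cells, df.isnull().sum(axis='columns') is an equivalent vectorized form, and df.count(axis='columns') gives the number of non-missing cells, which is what the question literally asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and values are for illustration only.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan],
    "B": ["x", np.nan, np.nan],
    "C": [10, 20, 30],
})

# Missing cells per row, using the apply call from the answer.
missing_per_row = df.apply(lambda x: x.isnull().sum(), axis="columns")

# Equivalent vectorized form that avoids a Python-level call per row.
missing_vectorized = df.isnull().sum(axis="columns")

# Non-missing cells per row.
filled_per_row = df.count(axis="columns")

print(missing_per_row.tolist())  # [0, 1, 2]
print(filled_per_row.tolist())   # [3, 2, 1]
```

Both counts treat only NaN/None as missing; empty strings still count as data.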

You can create a new column and write the count to it using something like:

df['MISSING'] = df.apply(lambda x: x.isnull().sum(), axis='columns')

The column will be created at the end (rightmost) of your data frame.
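
A quick way to check this, sketched on a made-up two-row frame (the data is an assumption; only the MISSING column name comes from the line above):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "A": [1.0, np.nan],
    "B": [np.nan, np.nan],
    "C": ["x", "y"],
})

# Assigning to a new label appends the column after the existing ones.
df["MISSING"] = df.apply(lambda x: x.isnull().sum(), axis="columns")

print(list(df.columns))        # ['A', 'B', 'C', 'MISSING']
print(df["MISSING"].tolist())  # [1, 2]
```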

You can move your columns around like this:

df = df[['Count', 'M', 'A', 'B', 'C']]
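
Here is a small sketch of two ways to get the count column on the left, assuming a made-up frame with columns A, B, C. The first reorders by listing the columns, as above. The second uses DataFrame.insert to create the column directly at position 0; that is an alternative I'm suggesting, not something the original answer used.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
data = {"A": [1.0, np.nan], "B": [np.nan, np.nan], "C": ["x", "y"]}

# Option 1: append the column, then reorder by listing every column.
df = pd.DataFrame(data)
df["MISSING"] = df.isnull().sum(axis="columns")
df = df[["MISSING", "A", "B", "C"]]

# Option 2: insert the column at position 0 in one step (modifies df2 in place).
df2 = pd.DataFrame(data)
df2.insert(0, "MISSING", df2.isnull().sum(axis="columns"))

print(list(df.columns))   # ['MISSING', 'A', 'B', 'C']
print(list(df2.columns))  # ['MISSING', 'A', 'B', 'C']
```

Option 2 saves you from retyping every column name when the frame is wide.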

Update

I'm wondering if your missing cells are actually empty strings as opposed to NaN values. Can you confirm? I copied your screenshot into an Excel workbook. My full code is below:

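import pandas as pd

# Treat empty strings and single spaces as missing values when loading
df = pd.read_excel('count.xlsx', na_values=['', ' '])
df.head()  # You should see NaN for empty cells
# Count the missing cells in each row and store the result in column M
df['M'] = df.apply(lambda x: x.isnull().sum(), axis='columns')
df.head()  # Column M should report the values: first row: 0, second row: 1, third row: 2
# Put Count and M first
df = df[['Count', 'M', 'A', 'B', 'C']]
df.head()  # Column order should be Count, M, A, B, C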

Notice the na_values parameter in the pd.read_excel call.
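
To see what na_values changes without needing the workbook, here is a rough sketch that uses pd.read_csv on an in-memory string instead of pd.read_excel (read_csv accepts the same na_values parameter; the CSV text is made up). The last part shows one way to handle a frame that is already loaded with blank strings, using DataFrame.replace with a regex; that step is my own addition rather than part of the original answer.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text standing in for the workbook.
# The second data row has a single space in column B; the third has two empty cells.
csv_text = "A,B,C\n1,x,10\n2, ,20\n,,30\n"

# Default parsing: '' is read as NaN, but ' ' is kept as a string.
df_default = pd.read_csv(io.StringIO(csv_text))
missing_default = df_default.isnull().sum(axis="columns").tolist()

# With na_values, ' ' is read as NaN as well.
df_na = pd.read_csv(io.StringIO(csv_text), na_values=["", " "])
missing_na = df_na.isnull().sum(axis="columns").tolist()

# For a frame already in memory, turn blank or whitespace-only strings into NaN.
df_mem = pd.DataFrame({"A": ["1", "", " "], "B": ["x", "y", ""]})
df_clean = df_mem.replace(r"^\s*$", np.nan, regex=True)
missing_clean = df_clean.isnull().sum(axis="columns").tolist()

print(missing_default)  # [0, 0, 2]
print(missing_na)       # [0, 1, 2]
print(missing_clean)    # [0, 1, 2]
```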
