Pandas DataFrame - Replace NULL String with Blank and NULL Numeric with 0
Original URL: http://stackoverflow.com/questions/52873804/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Question by HMan06
I am working on a large dataset with many columns of different types. There is a mix of numeric values and strings, with some NULL values. I need to change each NULL value to blank or 0, depending on the column type.
1  John  2     Doe   3  Mike  4     Orange  5   Stuff
9  NULL  NULL  NULL  8  NULL  NULL  Lemon   12  NULL
I want it to look like this:
1  John  2  Doe  3  Mike  4  Orange  5   Stuff
9        0       8        0  Lemon   12
I can do this for each individual column, but since I am going to be pulling several extremely large datasets with hundreds of columns, I'd like a more general approach.
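A minimal sketch for reproducing the sample above, assuming the NULLs arrive as the literal text `NULL` in whitespace-separated data (the real source of the asker's data is not stated). `pd.read_csv` treats `NULL` as a missing value by default, so those cells become NaN, and each column is inferred as int64, float64, or object depending on its contents:

```python
import io

import pandas as pd

# The two sample rows from the question, with NULL written as literal text
# (assumption: the real data may come from a database instead)
raw = """1 John 2 Doe 3 Mike 4 Orange 5 Stuff
9 NULL NULL NULL 8 NULL NULL Lemon 12 NULL
"""

# "NULL" is in read_csv's default list of missing-value markers,
# so those cells are parsed as NaN
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(df)
print(df.dtypes)
```

The resulting frame has integer column labels 0 to 9, which matches the frame used in the answers below.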
Edit: dtypes from a smaller dataset:
Field1 object
Field2 object
Field3 object
Field4 object
Field5 object
Field6 object
Field7 object
Field8 object
Field9 object
Field10 float64
Field11 float64
Field12 float64
Field13 float64
Field14 float64
Field15 object
Field16 float64
Field17 object
Field18 object
Field19 float64
Field20 float64
Field21 int64
Answer by jezrael
Use DataFrame.select_dtypes to get the numeric columns, fill the missing values in that subset with 0, then replace the missing values in all other columns with an empty string:
print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9   NaN  NaN  NaN  8   NaN  NaN   Lemon  12    NaN
print (df.dtypes)
0 int64
1 object
2 float64
3 object
4 int64
5 object
6 float64
7 object
8 int64
9 object
dtype: object
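import numpy as np
# numeric columns (int, float, ...) get 0
c = df.select_dtypes(np.number).columns
df[c] = df[c].fillna(0)
# all remaining (non-numeric) columns get an empty string
df = df.fillna("")
print(df)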
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9        0.0       8        0.0   Lemon  12
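Since the same cleanup has to run on several large datasets, the approach above can be wrapped in a small function. This is a sketch: `fill_nulls_by_dtype` is a hypothetical name (not a pandas API), the sample frame is made up, and the function works on a copy so the input frame is left unchanged:

```python
import numpy as np
import pandas as pd


def fill_nulls_by_dtype(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaN -> 0 in numeric columns and NaN -> "" elsewhere.

    Hypothetical helper wrapping the select_dtypes approach.
    """
    out = df.copy()
    # int and float columns (anything under np.number)
    num_cols = out.select_dtypes(include=np.number).columns
    out[num_cols] = out[num_cols].fillna(0)
    # every NaN still left is in a non-numeric column
    return out.fillna("")


# Hypothetical sample frame standing in for one of the large datasets
sample = pd.DataFrame({
    "qty": [9, 8],
    "price": [2.0, np.nan],
    "name": ["John", np.nan],
})
result = fill_nulls_by_dtype(sample)
print(result)
```

With several datasets loaded into a list, say `frames`, this becomes `cleaned = [fill_nulls_by_dtype(f) for f in frames]`.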
Another solution is to create a dictionary that maps each column to its replacement value and pass it to fillna:
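import numpy as np
# map every numeric column to 0 and every other column to ""
num_cols = df.select_dtypes(np.number).columns
d1 = dict.fromkeys(num_cols, 0)
d2 = dict.fromkeys(df.columns.difference(num_cols), "")
d = {**d1, **d2}
print(d)
{0: 0, 2: 0, 4: 0, 6: 0, 8: 0, 1: '', 3: '', 5: '', 7: '', 9: ''}
# fillna accepts a {column: fill value} dictionary
df = df.fillna(d)
print(df)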
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9        0.0       8        0.0   Lemon  12
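The question's desired output shows `0` rather than `0.0`. Numeric columns that contained NaN are float64, so after filling they still print as floats. A minimal sketch that casts the filled numeric columns to int64, on a hypothetical two-column frame and assuming every value in those columns is a whole number (a fractional value would not survive the cast):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "a" is float64 only because it contains NaN
df = pd.DataFrame({"a": [2.0, np.nan], "b": ["Doe", np.nan]})

num_cols = df.select_dtypes(np.number).columns
df[num_cols] = df[num_cols].fillna(0)
# Assumption: all numeric values are whole numbers, so int64 loses nothing
df = df.astype({col: "int64" for col in num_cols})
df = df.fillna("")
print(df)
```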
Answer by Andrea
You could try this to substitute a different value for each column (A to C are numeric, while D is a string):
import pandas as pd
import numpy as np
df_pd = pd.DataFrame([[np.nan, 2, np.nan, '0'],
                      [3, 4, np.nan, '1'],
                      [np.nan, np.nan, np.nan, '5'],
                      [np.nan, 3, np.nan, np.nan]],
                     columns=list('ABCD'))
# fillna returns a new DataFrame, so assign the result back
df_pd = df_pd.fillna(value={'A': 0.0, 'B': 0.0, 'C': 0.0, 'D': ''})
print(df_pd)
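One caveat with a hand-written mapping: `fillna` with a dict only fills the columns named in it, so any column left out keeps its NaNs. A quick sketch using the same frame, with column `C` deliberately omitted from the dict:

```python
import numpy as np
import pandas as pd

df_pd = pd.DataFrame([[np.nan, 2, np.nan, '0'],
                      [3, 4, np.nan, '1'],
                      [np.nan, np.nan, np.nan, '5'],
                      [np.nan, 3, np.nan, np.nan]],
                     columns=list('ABCD'))

# Column 'C' is intentionally missing from the mapping
partial = df_pd.fillna(value={'A': 0.0, 'B': 0.0, 'D': ''})
print(partial)
# 'C' is still all NaN because fillna skips columns not in the dict
print(partial['C'].isna().all())
```

With hundreds of columns, building the dict from the dtypes, as in the dictionary-based answer above, avoids silently skipping a column.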