Python/Pandas Dataframe replace 0 with median value
Original question: http://stackoverflow.com/questions/37506488/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by jeangelj
I have a python pandas dataframe with several columns, and one column has 0 values. I want to replace the 0 values with the median or mean of this column.
data is my dataframe and artist_hotness is the column.
mean_artist_hotness = data['artist_hotness'].dropna().mean()
if len(data.artist_hotness[ data.artist_hotness.isnull() ]) > 0:
    data.artist_hotness.loc[ (data.artist_hotness.isnull()), 'artist_hotness'] = mean_artist_hotness
I tried this, but it is not working.
Accepted answer by jezrael
I think you can use mask, and pass skipna=True to mean instead of calling dropna (skipna=True is already the default). You also need to change the condition to data.artist_hotness == 0 if you need to replace 0 values, or to data.artist_hotness.isnull() if you need to replace NaN values:
import pandas as pd
import numpy as np
data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan]})
print (data)
   artist_hotness
0             0.0
1             1.0
2             5.0
3             NaN
mean_artist_hotness = data['artist_hotness'].mean(skipna=True)
print (mean_artist_hotness)
2.0
data['artist_hotness']=data.artist_hotness.mask(data.artist_hotness == 0,mean_artist_hotness)
print (data)
   artist_hotness
0             2.0
1             1.0
2             5.0
3             NaN
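The question also asks about the median, which the example above does not cover. Here is a minimal sketch of the same mask approach using the median instead. It assumes the zeros are placeholders and should not count toward the median, so they are filtered out first; the name median_nonzero is made up for this illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})

col = data['artist_hotness']

# Median of the non-zero values only; NaN is skipped by default.
median_nonzero = col[col != 0].median()
print(median_nonzero)  # 3.0

# Replace the zeros with that median; NaN rows are left untouched.
data['artist_hotness'] = col.mask(col == 0, median_nonzero)
print(data)
```

If the zeros are real measurements and should count, use col.median() instead of filtering first.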
Alternatively use loc, but omit the column name in front of it (data.loc, not data.artist_hotness.loc):
data.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness
print (data)
   artist_hotness
0             2.0
1             1.0
2             5.0
3             NaN
Calling loc on the column itself and also passing the column name raises an error, because a Series has only one axis to index:
data.artist_hotness.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness
print (data)
IndexingError: (0     True
1    False
2    False
3    False
Name: artist_hotness, dtype: bool, 'artist_hotness')
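Another solution is DataFrame.replace with the column specified, so other columns are left untouched:
# the output below is for a frame that also has a second column, aa
data = pd.DataFrame({'aa': [0,1,5,np.nan], 'artist_hotness': [0,1,5,np.nan]})
data=data.replace({'artist_hotness': {0: mean_artist_hotness}})
print (data)
    aa  artist_hotness
0  0.0             2.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN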
Or if you need to replace all 0 values in all columns:
import pandas as pd
import numpy as np
data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan], 'aa': [0,1,5,np.nan]})
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN
mean_artist_hotness = data['artist_hotness'].mean(skipna=True)
print (mean_artist_hotness)
2.0
data=data.replace(0,mean_artist_hotness)
print (data)
    aa  artist_hotness
0  2.0             2.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN
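Note that the call above puts the mean of artist_hotness into every column. If each column should get its own mean instead, a simple loop is one way to do it. This is only a sketch: the sample values in aa are changed here so the two means differ, and it assumes zeros should be excluded when computing each mean.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan],
                     'aa': [0, 2, 6, np.nan]})

for name in data.columns:
    col = data[name]
    # Mean of this column's non-zero values; NaN is skipped by default.
    col_mean = col[col != 0].mean()
    # Replace only the zeros in this column.
    data[name] = col.mask(col == 0, col_mean)

print(data)
```

Here the zero in artist_hotness becomes 3.0 and the zero in aa becomes 4.0, while the NaN rows stay as they are.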
If you need to replace NaN in all columns, use DataFrame.fillna:
data=data.fillna(mean_artist_hotness)
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  2.0             2.0
But if you need it only in some columns, use Series.fillna:
data['artist_hotness'] = data.artist_hotness.fillna(mean_artist_hotness)
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  NaN             2.0
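A related case is filling the NaN values in every column with that column's own mean. DataFrame.fillna accepts a Series indexed by column name, so passing data.mean() should do it in one call. A small sketch, with the aa values changed so the two column means differ:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan],
                     'aa': [0, 2, 7, np.nan]})

# data.mean() is a Series of per-column means (NaN skipped by default):
# artist_hotness -> 2.0, aa -> 3.0
filled = data.fillna(data.mean())
print(filled)
```

The zeros still count toward these means, so handle them first if they are placeholders too.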
Answer by shivsn
Use the pandas replace method:
df = pd.DataFrame({'a': [1,2,3,4,0,0,0,0], 'b': [2,3,4,6,0,5,3,8]})
df
   a  b
0  1  2
1  2  3
2  3  4
3  4  6
4  0  0
5  0  5
6  0  3
7  0  8
df['a']=df['a'].replace(0,df['a'].mean())
df
   a  b
0  1  2
1  2  3
2  3  4
3  4  6
4  1  0
5  1  5
6  1  3
7  1  8
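One thing to watch in this answer: df['a'].mean() counts the zeros too, so the mean of column a is 1.25 rather than the mean of the real values. A sketch of the same replace call with the zeros left out of the mean (the names mean_with_zeros and mean_nonzero are made up here for comparison):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 0, 0, 0, 0],
                   'b': [2, 3, 4, 6, 0, 5, 3, 8]})

mean_with_zeros = df['a'].mean()               # 10 / 8 = 1.25
mean_nonzero = df.loc[df['a'] != 0, 'a'].mean()  # 10 / 4 = 2.5

# Replacing integer zeros with a float turns the column into floats.
df['a'] = df['a'].replace(0, mean_nonzero)
print(df)
```

Which mean is right depends on whether the zeros are real values or stand-ins for missing data.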
Answer by Sailendra Pinupolu
data['artist_hotness'] = data['artist_hotness'].map( lambda x : data.artist_hotness.mean() if x == 0 else x)
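The lambda above calls mean() again for every row, which gets slow on large frames. A possible variant, offered only as a sketch, computes the mean once and uses Series.where, which keeps values where the condition is True and substitutes the given value elsewhere. The name mean_value is just for this example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})

# Compute the mean once (NaN skipped by default, zeros included as in the answer).
mean_value = data['artist_hotness'].mean()

# Keep every value that is not 0; put the mean where the value is 0.
data['artist_hotness'] = data['artist_hotness'].where(
    data['artist_hotness'] != 0, mean_value)
print(data)
```

NaN != 0 evaluates to True, so the NaN row is kept as NaN here, same as with the map version.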