Python/Pandas DataFrame: replace 0 with the median

Question

Asked by jeangelj

I have a Python pandas DataFrame with several columns, and one column has 0 values. I want to replace the 0 values with the median or mean of this column.

data is my DataFrame
artist_hotness is the column

mean_artist_hotness = data['artist_hotness'].dropna().mean()

if len(data.artist_hotness[ data.artist_hotness.isnull() ]) > 0:
    data.artist_hotness.loc[ (data.artist_hotness.isnull()), 'artist_hotness'] = mean_artist_hotness

I tried this, but it is not working.

Answer 1

Accepted answer by jezrael

I think you can use mask, and pass skipna=True to mean instead of calling dropna. You also need to change the condition to data.artist_hotness == 0 if you want to replace 0 values, or to data.artist_hotness.isnull() if you want to replace NaN values:

import pandas as pd
import numpy as np

data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan]})
print (data)
   artist_hotness
0             0.0
1             1.0
2             5.0
3             NaN

mean_artist_hotness = data['artist_hotness'].mean(skipna=True)
print (mean_artist_hotness)
2.0

data['artist_hotness']=data.artist_hotness.mask(data.artist_hotness == 0,mean_artist_hotness)
print (data)
   artist_hotness
0             2.0
1             1.0
2             5.0
3             NaN
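
The question also asks about the median. A minimal sketch of the same mask approach with a median, assuming the zeros are placeholders that should not count toward the statistic (that assumption is mine, not the answer's):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})

# Median of the non-zero entries only; NaN is skipped by default (skipna=True).
nonzero = data.loc[data['artist_hotness'] != 0, 'artist_hotness']
median_artist_hotness = nonzero.median()

# Replace zeros with that median; NaN rows are left untouched.
data['artist_hotness'] = data['artist_hotness'].mask(
    data['artist_hotness'] == 0, median_artist_hotness
)
print(data)
```

If the zeros are genuine measurements, compute the median on the whole column instead.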

Alternatively use loc, but call it on the DataFrame rather than on the column (omit the column name before .loc):

data.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness
print (data)
   artist_hotness
0             2.0
1             1.0
2             5.0
3             NaN

Calling loc on the column itself and still passing the column label fails:

data.artist_hotness.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness
print (data)

IndexingError: (0 True 1 False 2 False 3 False Name: artist_hotness, dtype: bool, 'artist_hotness')

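Another solution is DataFrame.replace, specifying the column (the output below is for a DataFrame that also has a second column aa, defined in the next example, which is left unchanged):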

data=data.replace({'artist_hotness': {0: mean_artist_hotness}}) 
print (data)
    aa  artist_hotness
0  0.0             2.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN

Or, if you need to replace all 0 values in all columns:

import pandas as pd
import numpy as np

data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan], 'aa': [0,1,5,np.nan]})
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN

mean_artist_hotness = data['artist_hotness'].mean(skipna=True)
print (mean_artist_hotness)
2.0

data=data.replace(0,mean_artist_hotness) 
print (data)
    aa  artist_hotness
0  2.0             2.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN
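
The call above fills every column with the mean of artist_hotness. If each column should instead get its own mean, one possible sketch is a loop over the columns. The values in aa are changed here purely for illustration so that the two means differ:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan],
                     'aa': [0, 4, 8, np.nan]})

filled = data.copy()
for col in filled.columns:
    # Column mean, NaN skipped; zeros are included, as in the answer above.
    col_mean = filled[col].mean()
    # Replace only the zeros of this column with this column's mean.
    filled[col] = filled[col].mask(filled[col] == 0, col_mean)

print(filled)
```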

If you need to replace NaN in all columns, use DataFrame.fillna:

data=data.fillna(mean_artist_hotness) 
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  2.0             2.0

But if you need it only in some columns, use Series.fillna:

data['artist_hotness'] = data.artist_hotness.fillna(mean_artist_hotness) 
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  NaN             2.0

Answer 2

Answer by shivsn

Use the pandas replace method:

df = pd.DataFrame({'a': [1,2,3,4,0,0,0,0], 'b': [2,3,4,6,0,5,3,8]}) 

df 
   a  b
0  1  2
1  2  3
2  3  4
3  4  6
4  0  0
5  0  5
6  0  3
7  0  8

df['a']=df['a'].replace(0,df['a'].mean())

df
   a  b
0  1  2
1  2  3
2  3  4
3  4  6
4  1  0
5  1  5
6  1  3
7  1  8
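
Note that df['a'].mean() averages over the zeros as well, which gives 1.25 for this column. A sketch of a variant that casts the column to float explicitly and averages only the non-zero entries, assuming the zeros stand for missing data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 0, 0, 0, 0], 'b': [2, 3, 4, 6, 0, 5, 3, 8]})

# Cast to float so a fractional mean can be stored without truncation.
a = df['a'].astype(float)

# Mean of the non-zero entries only: (1 + 2 + 3 + 4) / 4 = 2.5
nonzero_mean = a[a != 0].mean()

# Replace the zeros in column a; column b is not touched.
df['a'] = a.mask(a == 0, nonzero_mean)
print(df)
```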

Answer 3

Answer by Sailendra Pinupolu

data['artist_hotness'] = data['artist_hotness'].map( lambda x : data.artist_hotness.mean() if x == 0 else x)
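
The lambda above calls mean() again for every zero it encounters. A sketch of an equivalent vectorized form that computes the mean once, using Series.where (which keeps values where the condition holds and substitutes elsewhere):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})

# Compute the mean a single time; NaN is skipped by default.
mean_value = data['artist_hotness'].mean()

s = data['artist_hotness']
# Keep entries that are not 0 (NaN != 0 is True, so NaN is kept as is).
data['artist_hotness'] = s.where(s != 0, mean_value)
print(data)
```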

Answer 4

Answer by sijie.xiong

Found these very useful, although mask is really slow (not sure why).

I did this:

df.loc[ (df['artist_hotness'] == 0) | np.isnan(df['artist_hotness']), 'artist_hotness' ] = df['artist_hotness'].median()
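
In Python, | binds more tightly than ==, so each comparison in a combined condition needs its own parentheses. A sketch that wraps the idea in a small helper; replace_zero_and_nan is a hypothetical name, and taking the statistic over the valid entries only is my assumption:

```python
import numpy as np
import pandas as pd

def replace_zero_and_nan(series, how='median'):
    """Hypothetical helper: fill zeros and NaN with a statistic of the valid values."""
    # Parenthesize the comparison so it is evaluated before the OR.
    invalid = (series == 0) | series.isna()
    valid = series[~invalid]
    fill = valid.median() if how == 'median' else valid.mean()
    return series.mask(invalid, fill)

df = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan, 3]})
df['artist_hotness'] = replace_zero_and_nan(df['artist_hotness'])
print(df)
```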
