How to replace NaNs by preceding values in pandas DataFrame?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/27905295/
Asked by zegkljan
Suppose I have a DataFrame with some NaNs:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df
0 1 2
0 1 2 3
1 4 NaN NaN
2 NaN NaN 9
What I need to do is replace every NaN with the first non-NaN value in the same column above it. It is assumed that the first row will never contain a NaN. So for the previous example the result would be
0 1 2
0 1 2 3
1 4 2 3
2 4 2 9
I can just loop through the whole DataFrame column-by-column, element-by-element and set the values directly, but is there an easy (optimally a loop-free) way of achieving this?
Accepted answer by Alex Riley
You could use the fillna method on the DataFrame and specify the method as ffill (forward fill):
>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df.fillna(method='ffill')
0 1 2
0 1 2 3
1 4 2 3
2 4 2 9
This method "propagate[s] last valid observation forward to next valid".
To go the opposite way, there's also a bfill method.
This method doesn't modify the DataFrame in place; you'll need to rebind the returned DataFrame to a variable or else specify inplace=True:
df.fillna(method='ffill', inplace=True)
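The rebinding point can be sketched as follows. This is only an illustration: it uses DataFrame.ffill(), which another answer on this page mentions as a synonym, because recent pandas releases warn about the method= argument of fillna. The name filled is made up for the example.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# ffill() returns a new DataFrame; df itself is left unchanged
filled = df.ffill()

print(df.isna().sum().sum())      # 4 -> the original still has its NaNs
print(filled.isna().sum().sum())  # 0 -> the returned copy is filled
```

Rebinding (df = df.ffill()) keeps a single name if the unfilled data is no longer needed.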
Answer by Ffisegydd
You can use pandas.DataFrame.fillna with the method='ffill' option. 'ffill' stands for 'forward fill' and will propagate the last valid observation forward. The alternative is 'bfill', which works the same way, but backwards.
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df = df.fillna(method='ffill')
print(df)
# 0 1 2
#0 1 2 3
#1 4 2 3
#2 4 2 9
There is also a direct synonym function for this, pandas.DataFrame.ffill, to make things simpler.
Answer by jjs
One thing that I noticed when trying this solution is that if you have N/A at the start or the end of the array, neither ffill nor bfill alone quite works. You need both.
In [224]: df = pd.DataFrame([None, 1, 2, 3, None, 4, 5, 6, None])
In [225]: df.ffill()
Out[225]:
0
0 NaN
1 1.0
...
7 6.0
8 6.0
In [226]: df.bfill()
Out[226]:
0
0 1.0
1 1.0
...
7 6.0
8 NaN
In [227]: df.bfill().ffill()
Out[227]:
0
0 1.0
1 1.0
...
7 6.0
8 6.0
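One detail worth checking when chaining the two: the order decides which neighbour fills the interior gaps. A small sketch on the same values, using a Series for brevity (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series([None, 1, 2, 3, None, 4, 5, 6, None], dtype=float)

# Forward fill first: interior gaps take the value above them,
# then bfill only has the leading NaN left to handle.
forward_first = s.ffill().bfill()

# Backward fill first: interior gaps take the value below them,
# then ffill only has the trailing NaN left to handle.
backward_first = s.bfill().ffill()

print(forward_first.tolist())   # [1.0, 1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 6.0]
print(backward_first.tolist())  # [1.0, 1.0, 2.0, 3.0, 4.0, 4.0, 5.0, 6.0, 6.0]
```

The two results differ only at index 4, the interior gap.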
Answer by piRSquared
ffill now has its own method, pd.DataFrame.ffill:
df.ffill()
0 1 2
0 1.0 2.0 3.0
1 4.0 2.0 3.0
2 4.0 2.0 9.0
Answer by ErnestScribbler
The accepted answer is perfect. I had a related but slightly different situation where I had to fill forward, but only within groups. In case someone has the same need, know that fillna works on a DataFrameGroupBy object.
>>> from numpy import nan
>>> example = pd.DataFrame({'number':[0,1,2,nan,4,nan,6,7,8,9],'name':list('aaabbbcccc')})
>>> example
name number
0 a 0.0
1 a 1.0
2 a 2.0
3 b NaN
4 b 4.0
5 b NaN
6 c 6.0
7 c 7.0
8 c 8.0
9 c 9.0
>>> example.groupby('name')['number'].fillna(method='ffill') # fill in row 5 but not row 3
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
5 4.0
6 6.0
7 7.0
8 8.0
9 9.0
Name: number, dtype: float64
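The same group-wise fill can also be sketched with the groupby ffill() method, which avoids the method= argument that recent pandas releases warn about. The data is the one from this answer; the name filled is illustrative.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({
    'number': [0, 1, 2, np.nan, 4, np.nan, 6, 7, 8, 9],
    'name': list('aaabbbcccc'),
})

# Forward fill within each 'name' group only: row 5 is filled from row 4,
# while row 3 stays NaN because it is the first row of group 'b'.
filled = example.groupby('name')['number'].ffill()

print(filled.tolist())
```

The result keeps the original index, so it can be assigned straight back with example['number'] = filled.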
Answer by Anton Shelin
In my case, we have time series from different devices, but some devices could not send any value during some periods. So we should create NA values for every device and time period, and after that do fillna.
df = pd.DataFrame([["device1", 1, 'first val of device1'], ["device2", 2, 'first val of device2'], ["device3", 3, 'first val of device3']])
df.pivot(index=1, columns=0, values=2).fillna(method='ffill').unstack().reset_index(name='value')
Result:
0 1 value
0 device1 1 first val of device1
1 device1 2 first val of device1
2 device1 3 first val of device1
3 device2 1 None
4 device2 2 first val of device2
5 device2 3 first val of device2
6 device3 1 None
7 device3 2 None
8 device3 3 first val of device3
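An alternative sketch of the same idea, assuming named columns (device, time and value are hypothetical names, not from the answer): build the full device-by-time grid with reindex, then forward fill within each device.

```python
import pandas as pd

readings = pd.DataFrame({
    'device': ['device1', 'device2', 'device3'],
    'time': [1, 2, 3],
    'value': ['first val of device1', 'first val of device2', 'first val of device3'],
})

# Every (device, time) combination, including the ones with no reading
full_index = pd.MultiIndex.from_product(
    [readings['device'].unique(), sorted(readings['time'].unique())],
    names=['device', 'time'],
)

# Missing combinations become NaN rows after reindexing
grid = readings.set_index(['device', 'time']).reindex(full_index)

# Forward fill inside each device so values never leak between devices
grid['value'] = grid.groupby(level='device')['value'].ffill()

result = grid.reset_index()
print(result)
```

Periods before a device's first reading stay missing, as in the result above.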
Answer by DeveScie
Single-column version
- Fill NaN with the last valid value
df[column_name].fillna(method='ffill', inplace=True)
- Fill NaN with the next valid value
df[column_name].fillna(method='backfill', inplace=True)
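Calling an inplace method on a selected column may not update the parent DataFrame in newer pandas versions (notably with copy-on-write enabled), so a more defensive sketch assigns the filled column back. The sample data and column_name are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, None, 3.0], 'b': [None, 5.0, None]})
column_name = 'b'  # hypothetical column to fill

# Fill NaN with the last valid value and assign the result back
df[column_name] = df[column_name].ffill()

# Fill NaN with the next valid value in another column
df['a'] = df['a'].bfill()

print(df)
```

The leading NaN in column b remains, since there is no earlier value to carry forward.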
Answer by Suvo
I agree with the ffill method, but one extra piece of information is that you can limit the forward fill with the keyword argument limit.
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [None, None, 6], [None, None, 9]])
>>> df
0 1 2
0 1.0 2.0 3
1 NaN NaN 6
2 NaN NaN 9
>>> df[1].fillna(method='ffill', inplace=True)
>>> df
0 1 2
0 1.0 2.0 3
1 NaN 2.0 6
2 NaN 2.0 9
Now with the limit keyword argument:
>>> df[0].fillna(method='ffill', limit=1, inplace=True)
>>> df
0 1 2
0 1.0 2.0 3
1 1.0 2.0 6
2 NaN 2.0 9
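A compact sketch of how limit behaves on a longer run of NaNs, written with ffill(limit=...) and made-up values: limit is the maximum number of consecutive NaNs filled after each valid value.

```python
import pandas as pd

s = pd.Series([1.0, None, None, None, 5.0])

# At most two consecutive NaNs are filled; the third one stays NaN
limited = s.ffill(limit=2)

print(limited.tolist())  # [1.0, 1.0, 1.0, nan, 5.0]
```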
Answer by Md Jewele Islam
You can use fillna to remove or replace NaN values.
NaN Remove
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df.fillna(method='ffill')
0 1 2
0 1.0 2.0 3.0
1 4.0 2.0 3.0
2 4.0 2.0 9.0
NaN Replace
df.fillna(0) # 0 is the value substituted for each NaN
0 1 2
0 1.0 2.0 3.0
1 4.0 0.0 0.0
2 0.0 0.0 9.0
Reference pandas.DataFrame.fillna