Python: How to replace NaN with the preceding value in a pandas DataFrame?

Question

Asked by zegkljan

Suppose I have a DataFrame with some NaNs:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df
    0   1   2
0   1   2   3
1   4 NaN NaN
2 NaN NaN   9

What I need to do is replace every NaN with the first non-NaN value above it in the same column. It is assumed that the first row will never contain a NaN. So for the previous example the result would be:

   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

I can just loop through the whole DataFrame column-by-column, element-by-element and set the values directly, but is there an easy (optimally a loop-free) way of achieving this?

Answer 1

Accepted answer by Alex Riley

You could use the fillna method on the DataFrame and specify the method as ffill (forward fill):

>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df.fillna(method='ffill')
   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

This method "propagate[s] last valid observation forward to next valid".

To go the opposite way, there's also a bfill method.

This method doesn't modify the DataFrame in place; you'll need to rebind the returned DataFrame to a variable or else specify inplace=True:

df.fillna(method='ffill', inplace=True)
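
The rebinding point above can be checked with a small sketch. It uses df.ffill() rather than fillna(method='ffill'), on the assumption that you are on a recent pandas release (as far as I know, the method= keyword of fillna has emitted a deprecation warning since pandas 2.1); the variable name filled is only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# ffill() returns a new DataFrame; the original is left untouched.
filled = df.ffill()

# Count the NaNs in each frame to confirm which one changed.
print(df.isna().sum().sum())      # 4
print(filled.isna().sum().sum())  # 0

# Rebind the name if you want to keep working with the filled data.
df = filled
```

Rebinding keeps the original frame available until you decide to drop it, which can be handy while debugging.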

Answer 2

Answered by Ffisegydd

You can use pandas.DataFrame.fillna with the method='ffill' option. 'ffill' stands for 'forward fill' and propagates the last valid observation forward. The alternative is 'bfill', which works the same way but backwards.

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df = df.fillna(method='ffill')

print(df)
#   0  1  2
#0  1  2  3
#1  4  2  3
#2  4  2  9

There is also a direct synonym function for this, pandas.DataFrame.ffill, to make things simpler.
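
As a rough illustration of the synonym methods, the sketch below runs DataFrame.ffill and DataFrame.bfill on the frame from the question. The names forward and backward are made up for this example.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Forward fill: each NaN takes the nearest valid value above it.
forward = df.ffill()

# Backward fill: each NaN takes the nearest valid value below it.
# NaNs with nothing valid below them stay NaN.
backward = df.bfill()

print(forward)
print(backward)
```

With this data, backward still has NaNs at the bottom of columns 0 and 1, because there is no later value to pull from.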

Answer 3

Answered by jjs

One thing that I noticed when trying this solution is that if you have N/A at the start or the end of the array, ffill or bfill alone doesn't quite work: ffill leaves a leading NaN and bfill leaves a trailing one. You need both.

In [224]: df = pd.DataFrame([None, 1, 2, 3, None, 4, 5, 6, None])

In [225]: df.ffill()
Out[225]:
     0
0  NaN
1  1.0
...
7  6.0
8  6.0

In [226]: df.bfill()
Out[226]:
     0
0  1.0
1  1.0
...
7  6.0
8  NaN

In [227]: df.bfill().ffill()
Out[227]:
     0
0  1.0
1  1.0
...
7  6.0
8  6.0
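
One detail worth keeping in mind when chaining the two: the order decides which neighbour fills an interior gap. A minimal sketch on a made-up Series (the data and variable names are illustrative only):

```python
import pandas as pd

s = pd.Series([None, 1, None, 2, None])

# ffill first: the interior gap takes the previous value (1),
# then bfill only has the leading NaN left to fix.
forward_first = s.ffill().bfill()

# bfill first: the interior gap takes the next value (2),
# then ffill only has the trailing NaN left to fix.
backward_first = s.bfill().ffill()

print(forward_first.tolist())   # [1.0, 1.0, 1.0, 2.0, 2.0]
print(backward_first.tolist())  # [1.0, 1.0, 2.0, 2.0, 2.0]
```

If the goal is "previous value wherever possible", ffill().bfill() is probably the order you want.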

Answer 4

Answered by piRSquared

ffill now has its own method, pd.DataFrame.ffill:

df.ffill()

     0    1    2
0  1.0  2.0  3.0
1  4.0  2.0  3.0
2  4.0  2.0  9.0

Answer 5

Answered by ErnestScribbler

The accepted answer is perfect. I had a related but slightly different situation where I had to fill forward, but only within groups. In case someone has the same need, know that fillna works on a DataFrameGroupBy object.

>>> from numpy import nan
>>> example = pd.DataFrame({'name': list('aaabbbcccc'), 'number': [0, 1, 2, nan, 4, nan, 6, 7, 8, 9]})
>>> example
  name  number
0    a     0.0
1    a     1.0
2    a     2.0
3    b     NaN
4    b     4.0
5    b     NaN
6    c     6.0
7    c     7.0
8    c     8.0
9    c     9.0
>>> example.groupby('name')['number'].fillna(method='ffill') # fill in row 5 but not row 3
0    0.0
1    1.0
2    2.0
3    NaN
4    4.0
5    4.0
6    6.0
7    7.0
8    8.0
9    9.0
Name: number, dtype: float64
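
The same group-wise fill can be sketched with the ffill method of the groupby object, which avoids the method= keyword. The column name filled is an assumption added here so the result can sit next to the original values.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({
    'name': list('aaabbbcccc'),
    'number': [0, 1, 2, np.nan, 4, np.nan, 6, 7, 8, 9],
})

# Forward fill inside each 'name' group; values never cross group boundaries.
example['filled'] = example.groupby('name')['number'].ffill()

print(example)
```

Row 3 stays NaN because it is the first row of group b and has nothing above it within that group, while row 5 picks up 4.0 from row 4.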

Answer 6

Answered by Anton Shelin

In my case, we have time series from different devices, but some devices did not send any value during some periods. So we first create NA values for every device and time period, and then apply fillna.

df = pd.DataFrame([["device1", 1, 'first val of device1'], ["device2", 2, 'first val of device2'], ["device3", 3, 'first val of device3']])
df.pivot(index=1, columns=0, values=2).fillna(method='ffill').unstack().reset_index(name='value')

Result:

         0  1                 value
0  device1  1  first val of device1
1  device1  2  first val of device1
2  device1  3  first val of device1
3  device2  1                  None
4  device2  2  first val of device2
5  device2  3  first val of device2
6  device3  1                  None
7  device3  2                  None
8  device3  3  first val of device3
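
An alternative way to get the same shape, sketched here under the assumption that the data is in long format with named columns (device, period and value are hypothetical names, not from the answer): build every (device, period) combination with MultiIndex.from_product, reindex to it, and forward fill per device.

```python
import pandas as pd

readings = pd.DataFrame({
    'device': ['device1', 'device2', 'device3'],
    'period': [1, 2, 3],
    'value': ['first val of device1', 'first val of device2', 'first val of device3'],
})

# Every (device, period) combination, including the ones with no reading.
full_index = pd.MultiIndex.from_product(
    [readings['device'].unique(), sorted(readings['period'].unique())],
    names=['device', 'period'],
)

# Reindexing creates missing values for the combinations that were absent.
complete = readings.set_index(['device', 'period']).reindex(full_index)

# Forward fill within each device so values never leak between devices.
complete['value'] = complete.groupby(level='device')['value'].ffill()

result = complete.reset_index()
print(result)
```

Periods before a device's first reading stay missing, matching the unfilled rows in the result above.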

Answer 7

Answered by DeveScie

A single-column version.

Fill NaN with the last valid value:

df[column_name].fillna(method='ffill', inplace=True)

Fill NaN with the next valid value:

df[column_name].fillna(method='backfill', inplace=True)
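
In newer pandas releases, calling an inplace method on a column selection such as df[column_name] may warn or leave the parent DataFrame unchanged, as I understand the Copy-on-Write changes. A sketch of the assignment form, which should behave the same across versions (the frame and column names here are made up):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, None, 3], 'b': [None, 5, None]})
column_name = 'a'

# Fill with the last valid value and assign the result back to the column.
df[column_name] = df[column_name].ffill()

# Fill another column with the next valid value instead.
df['b'] = df['b'].bfill()

print(df)
```

Column b keeps its final NaN because there is no later value to fill from.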

Answer 8

Answered by Suvo

I agree with the ffill method; one extra point is that you can limit the forward fill with the limit keyword argument.

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [None, None, 6], [None, None, 9]])

>>> df
     0    1   2
0  1.0  2.0   3
1  NaN  NaN   6
2  NaN  NaN   9

>>> df[1].fillna(method='ffill', inplace=True)
>>> df
     0    1    2
0  1.0  2.0    3
1  NaN  2.0    6
2  NaN  2.0    9

Now with the limit keyword argument:

>>> df[0].fillna(method='ffill', limit=1, inplace=True)

>>> df
     0    1  2
0  1.0  2.0  3
1  1.0  2.0  6
2  NaN  2.0  9
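
The limit argument is also accepted by DataFrame.ffill, so the whole frame can be handled in one call. A minimal sketch on the same data (the name limited is illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [None, None, 6], [None, None, 9]])

# Fill at most one consecutive NaN per gap, in every column.
limited = df.ffill(limit=1)

print(limited)
```

Here the second row is filled from the first, but the third row of columns 0 and 1 stays NaN because the gap is two rows long.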

Answer 9

Answered by Md Jewele Islam

You can use fillna to fill in or replace NaN values.

Removing NaN by forward fill

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

df.fillna(method='ffill')
     0    1    2
0  1.0  2.0  3.0
1  4.0  2.0  3.0
2  4.0  2.0  9.0

Replacing NaN with a fixed value

df.fillna(0)  # 0 is the value that replaces each NaN
     0    1    2
0  1.0  2.0  3.0
1  4.0  0.0  0.0
2  0.0  0.0  9.0
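
fillna also accepts a dict that maps column labels to replacement values, which helps when one constant does not suit every column. A small sketch; the particular replacement values are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# A different replacement per column: 0 for column 0, -1 for column 1,
# and the column mean for column 2.
filled = df.fillna({0: 0, 1: -1, 2: df[2].mean()})

print(filled)
```

Columns that are not listed in the dict are left as they are.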

Reference: pandas.DataFrame.fillna
