Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/35178117/

Date: 2020-09-14 00:36:45  Source: igfitidea

Python pandas fillna only one row with specific value

Tags: python, pandas, nan, fill

Asked by ragesz

EDITED:

I have a (not very simple) DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])

     att1
0  1.0000
1  2.0000
2     nan
3     nan
4     nan
5     nan
6  3.0000
7  4.0000
8     nan
9     nan
10    nan
11 5.0000

I want to fill the NaN values with the previous non-NaN value, except for the last NaN value in each run of consecutive NaNs, which should still be NaN after filling. How can I do that?

I want this result:

     att1
0  1.0000
1  2.0000
2  2.0000
3  2.0000
4  2.0000
5     nan
6  3.0000
7  4.0000
8  4.0000
9  4.0000
10    nan
11 5.0000

I tried this:

df = df.fillna(value='missing', method='bfill', limit=1)
df = df.fillna(method='ffill')

But the first line raises this error:

ValueError: cannot specify both a fill method and value

Why does pandas 0.17.1 / Python 3.5 have this limitation? Thank you!

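The ValueError says that fillna takes either a fill value or a fill method in a single call, so the idea behind the attempt has to be split into separate steps. A minimal sketch of that, assuming a recent pandas where Series.bfill, Series.ffill and Series.mask are available (none of these appear in the question): bfill(limit=1) only reaches the last NaN of each run, which marks exactly the positions that should stay NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])

s = df['att1']
# bfill(limit=1) copies the next valid value backwards by at most one row,
# so within each run of NaNs only the last one becomes non-null here.
last_nan = s.isna() & s.bfill(limit=1).notna()

# Forward-fill everything, then put NaN back at the marked positions.
df['att1'] = s.ffill().mask(last_nan)
print(df)
```

A trailing run of NaNs with no value after it is never marked by bfill, so this sketch forward-fills it completely; adjust the mask if that case matters.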

Accepted answer by jezrael

You can count the NaN values in df['att1'], subtract 1, and then pass the result as the limit parameter of fillna:

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3] , columns=['att1'])
print(df)
   att1
0     1
1     2
2   NaN
3   NaN
4   NaN
5   NaN
6     3

s = df['att1'].isnull().sum() - 1
df['att1'] = df['att1'].fillna('missing', limit=s)
print(df)
      att1
0        1
1        2
2  missing
3  missing
4  missing
5      NaN
6        3
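A quick check of why the EDIT that follows is needed, as a sketch reusing the answer's own formula on the two-run frame from the edited question: without a method, limit caps the number of NaNs filled along the whole column, so only the very last NaN (row 10) is left, and row 5, the end of the first run, is filled as well.

```python
import numpy as np
import pandas as pd

# The two-run frame from the edited question.
df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])

# One global limit: total NaN count minus 1 (7 - 1 = 6 here).
s = int(df['att1'].isnull().sum()) - 1
result = df['att1'].fillna('missing', limit=s)
print(result)
```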

EDIT:

Now it is more complicated.

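So first create a helper column count that labels each run of consecutive values in column att1 (a new label starts whenever the value switches between null and non-null), using isnull, shift, astype and cumsum. Then group by this count column and apply fillna to each group: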

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4
    , np.nan, np.nan, np.nan, 5], columns=['att1'])
print(df)

df['count'] = (df['att1'].isnull() != df['att1'].isnull().shift()).astype(int).cumsum()
print(df)
    att1  count
0      1      1
1      2      1
2    NaN      2
3    NaN      2
4    NaN      2
5    NaN      2
6      3      3
7      4      3
8    NaN      4
9    NaN      4
10   NaN      4
11     5      5
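
def f(x):
    # fill all but the last NaN of an all-NaN run; a run with a single NaN
    # is returned unchanged because fillna does not accept limit=0
    att = x['att1'].isnull()
    if att.all() and att.sum() > 1:
        return x['att1'].fillna('missing', limit=att.sum() - 1)
    else:
        return x['att1']

print(df.groupby(['count']).apply(f).reset_index(drop=True))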

0           1
1           2
2     missing
3     missing
4     missing
5         NaN
6           3
7           4
8     missing
9     missing
10        NaN
11          5
Name: att1, dtype: object
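A variant of f, sketched with groupby(...).transform instead of apply (a swap-in, not the answerer's code; fill_all_but_last is a hypothetical name): transform returns values aligned to the original index, so no reset_index is needed. It also returns a run holding a single NaN unchanged, since fillna raises a ValueError when limit is 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])
# Run labels, built exactly as in the answer.
df['count'] = (df['att1'].isnull() != df['att1'].isnull().shift()).astype(int).cumsum()

def fill_all_but_last(x):
    # x holds the 'att1' values of one run.
    n = int(x.isnull().sum())
    if x.isnull().all() and n > 1:
        # Fill every NaN in the run except the last one.
        return x.fillna('missing', limit=n - 1)
    # Non-null runs and single-NaN runs are left as they are.
    return x

# transform keeps the original index order, so no reset_index is needed.
result = df.groupby('count')['att1'].transform(fill_all_but_last)
print(result)
```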

Explanation of the count column:

print (df['att1'].isnull() != df['att1'].isnull().shift())
0      True
1     False
2      True
3     False
4     False
5     False
6      True
7     False
8      True
9     False
10    False
11     True
Name: att1, dtype: bool
print((df['att1'].isnull() != df['att1'].isnull().shift()).astype(int))
0     1
1     0
2     1
3     0
4     0
5     0
6     1
7     0
8     1
9     0
10    0
11    1
Name: att1, dtype: int32
print((df['att1'].isnull() != df['att1'].isnull().shift()).astype(int).cumsum())
0     1
1     1
2     2
3     2
4     2
5     2
6     3
7     3
8     4
9     4
10    4
11    5
Name: att1, dtype: int32
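The count labels can also produce the forward-filled result the question actually asks for, instead of the 'missing' placeholder. A sketch, assuming GroupBy.cumcount with ascending=False (not used in the answer), which numbers the rows of each group from the end so that 0 marks the last row of every run:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])
is_na = df['att1'].isnull()
df['count'] = (is_na != is_na.shift()).astype(int).cumsum()

# Position counted from the end of each run; 0 is the run's last row.
last_in_run = df.groupby('count').cumcount(ascending=False) == 0
last_nan = is_na & last_in_run

# Forward-fill, then restore NaN only at the last NaN of each run.
df['att1'] = df['att1'].ffill().mask(last_nan)
print(df)
```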

Answer by Woody Pride

An alternative, perhaps slightly less complex, method is to create a list of the index points where you expect NaNs to remain: for every point that is not null but whose preceding point is null, record that preceding point. Then forward-fill your data and reinsert the NaNs using the list you created.

import pandas as pd
import numpy as np
from numpy import nan as NA
df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4
    , np.nan, np.nan, np.nan, 5], columns=['att1'])

# create a list of index points where you want NaNs to be
Nan_ind = [x - 1 for x in range(1, df.index[-1] + 1)
                if pd.notnull(df.loc[x, 'att1'])
                and pd.isnull(df.loc[x-1, 'att1'])]

# forward fill
df['att1'] = df['att1'].ffill()

# reinsert NaNs using your list of index points
df.loc[Nan_ind, 'att1'] = NA
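The list comprehension can also be expressed without a Python-level loop. A sketch of the same rule using a vectorized shift(-1) (a swap-in, not the answerer's code): a point keeps its NaN when it is null and the next point is not null.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])

# Null here, non-null at the next row: the last NaN of a run.
nan_mask = df['att1'].isnull() & df['att1'].shift(-1).notnull()
nan_ind = df.index[nan_mask.to_numpy()].tolist()

df['att1'] = df['att1'].ffill()
df.loc[nan_ind, 'att1'] = np.nan
print(nan_ind)
print(df)
```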

Answer by Arthur Zennig

Fill all NaN values with "missing" using fillna, then replace the last "missing" with NaN.

import numpy as np

df['att1'] = df['att1'].fillna("missing")
df.iloc[[-2]] = df.iloc[[-2]].replace("missing", np.nan)

A negative position makes iloc count from the end: -2 selects the second-to-last row of the 'att1' column, which in this example is the last NaN of the final run.

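Because iloc[[-2]] touches a single row, only the last run gets its NaN back; with the question's data, row 5 stays "missing". A sketch that restores NaN at the last position of every run instead, assuming the same null-then-non-null rule used in the previous answer (a generalization, not Arthur Zennig's code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])

# Last NaN of every run: null here, non-null at the next row.
last_nan = df['att1'].isnull() & df['att1'].shift(-1).notnull()

filled = df['att1'].fillna("missing")
# mask() puts NaN back at every marked position, not only at iloc[-2].
df['att1'] = filled.mask(last_nan)
print(df)
```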