Python pandas: fill only one row with a specific value

Question

Asked by ragesz

EDITED:

I have a (not very simple) dataframe:

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4
    , np.nan, np.nan, np.nan, 5], columns=['att1'])

     att1
0  1.0000
1  2.0000
2     nan
3     nan
4     nan
5     nan
6  3.0000
7  4.0000
8     nan
9     nan
10    nan
11 5.0000

I want to fill the NaN values with the previous non-NaN value, except for the last NaN in each run of consecutive NaNs. I want that last NaN to still be NaN after filling. How can I do that?

I want this result:

I tried this:

df = df.fillna(value='missing', method='bfill', limit=1)
df = df.fillna(method='ffill')

But the first line raises this error:

ValueError: cannot specify both a fill method and value

Why is there this limitation in pandas 0.17.1 / Python 3.5? Thank you!

Answer 1

Accepted answer by jezrael

You can count the NaN values in df['att1'], subtract 1, and then pass the result as the limit parameter to fillna:

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3] , columns=['att1'])
print(df)
   att1
0     1
1     2
2   NaN
3   NaN
4   NaN
5   NaN
6     3

s = df['att1'].isnull().sum() - 1
df['att1'] = df['att1'].fillna('missing', limit=s)
print(df)
      att1
0        1
1        2
2  missing
3  missing
4  missing
5      NaN
6        3

EDIT:

Now it is more complicated.

So first create a helper column count that labels each run of consecutive values in column att1, using isnull, shift, astype and cumsum. Then groupby this count column and apply fillna within each group:

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4
    , np.nan, np.nan, np.nan, 5], columns=['att1'])
print(df)

df['count'] = (df['att1'].isnull() != df['att1'].isnull().shift()).astype(int).cumsum()
print(df)
    att1  count
0      1      1
1      2      1
2    NaN      2
3    NaN      2
4    NaN      2
5    NaN      2
6      3      3
7      4      3
8    NaN      4
9    NaN      4
10   NaN      4
11     5      5

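def f(x):
    # Boolean mask of the missing values within this group
    att = x['att1'].isnull()
    # Only all-NaN groups with more than one row need filling;
    # a lone NaN is already the last of its run, so it is left alone
    if att.all() and att.sum() > 1:
        # Fill every NaN in the run except the last one
        return x['att1'].fillna('missing', limit=att.sum() - 1)
    else:
        return x['att1']

print(df.groupby(['count']).apply(f).reset_index(drop=True))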

0           1
1           2
2     missing
3     missing
4     missing
5         NaN
6           3
7           4
8     missing
9     missing
10        NaN
11          5
Name: att1, dtype: object

How the count column is built, step by step:

print (df['att1'].isnull() != df['att1'].isnull().shift())
0      True
1     False
2      True
3     False
4     False
5     False
6      True
7     False
8      True
9     False
10    False
11     True
Name: att1, dtype: bool

print((df['att1'].isnull() != df['att1'].isnull().shift()).astype(int))
0     1
1     0
2     1
3     0
4     0
5     0
6     1
7     0
8     1
9     0
10    0
11    1
Name: att1, dtype: int32

print((df['att1'].isnull() != df['att1'].isnull().shift()).astype(int).cumsum())
0     1
1     1
2     2
3     2
4     2
5     2
6     3
7     3
8     4
9     4
10    4
11    5
Name: att1, dtype: int32
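
The three steps above can be wrapped into one small helper. This is only a sketch of the same isnull / shift / cumsum idea; the function name label_runs is made up for illustration and is not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd

def label_runs(s):
    """Give each run of consecutive null / non-null values its own integer id."""
    is_null = s.isnull()
    # True wherever the null-state differs from the previous row.
    # The first row is compared against the NaN shifted in, so it is always True.
    starts = is_null != is_null.shift()
    # Each True starts a new run, so the running total is the run id
    return starts.astype(int).cumsum()

s = pd.Series([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
               np.nan, np.nan, np.nan, 5], name='att1')
print(label_runs(s).tolist())
# [1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5]
```

The ids match the count column shown earlier, so label_runs(df['att1']) could be used as the groupby key directly.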

Answer 2

Answer by Woody Pride

An alternative that is perhaps a little less complex is to build a list of the index positions where you expect the NaNs to remain (the position just before each index whose value is not null while the preceding value is null). Then you forward fill the data and reinsert the NaNs using that list.

import pandas as pd
import numpy as np
from numpy import nan as NA
df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4
    , np.nan, np.nan, np.nan, 5], columns=['att1'])

# create list of index points where you want the NaNs to remain
# (this relies on the default integer index 0..n-1)
Nan_ind = [x - 1 for x in range(1, df.index[-1] + 1)
                if pd.notnull(df.loc[x, 'att1'])
                and pd.isnull(df.loc[x-1, 'att1'])]

# forward fill
df['att1'] = df['att1'].ffill()

# reinsert NaNs using your list of index points
df.loc[Nan_ind, 'att1'] = NA
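
The same idea can probably be expressed without an explicit loop. The following is a sketch rather than part of the original answer: it marks a NaN as "last in its run" when the next row holds a value, forward fills, and then masks the marked rows again. The variable name last_in_run is illustrative. As with the loop version, a run of NaNs at the very end of the column has no following value and is therefore filled completely.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'att1': [1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                            np.nan, np.nan, np.nan, 5]})

s = df['att1']
# A NaN is the last of its run when the next row holds a value
last_in_run = s.isnull() & s.shift(-1).notnull()
# Forward fill everything, then set the marked rows back to NaN
df['att1'] = s.ffill().mask(last_in_run)
print(df['att1'].tolist())
# [1.0, 2.0, 2.0, 2.0, 2.0, nan, 3.0, 4.0, 4.0, 4.0, nan, 5.0]
```

Because it only uses shift, ffill and mask, this variant does not depend on the index being 0..n-1.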

Answer 3

Answer by Arthur Zennig

Fill all NaN values with "missing". You can then replace the last "missing" with NaN.

df['att1'] = df['att1'].fillna("missing")
# put NaN back into the second-to-last row of 'att1'
df.loc[df.index[-2], 'att1'] = np.nan

Negative positions count backwards from the end, so -2 refers to the second-to-last element of the 'att1' column. Note that this restores only the NaN in that one row; earlier runs of NaN stay fully filled with "missing".
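
To apply the "missing" marker to every run at once, one option is to mark each NaN that is followed by another NaN. This is a sketch under the assumption that the last NaN of every run (including a run at the very end of the column) should stay NaN; the names is_nan, next_is_nan and not_last are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
               np.nan, np.nan, np.nan, 5], name='att1')

is_nan = s.isnull()
# True where the following row is also NaN; the last row has no follower
next_is_nan = is_nan.shift(-1, fill_value=False)
# Every NaN except the last one of its run
not_last = is_nan & next_is_nan
# Cast to object first so the column can hold both numbers and strings
marked = s.astype(object).mask(not_last, 'missing')
print(marked.tolist())
# [1.0, 2.0, 'missing', 'missing', 'missing', nan, 3.0, 4.0, 'missing', 'missing', nan, 5.0]
```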
