Original question: http://stackoverflow.com/questions/35178117/
License: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Python pandas fillna only one row with specific value
Asked by ragesz
EDITED:
I have a (not very simple) DataFrame:
df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])
att1
0 1.0000
1 2.0000
2 nan
3 nan
4 nan
5 nan
6 3.0000
7 4.0000
8 nan
9 nan
10 nan
11 5.0000
I want to fill the NaN values with the previous non-NaN value, except for the last NaN in each run. The last NaN of each run should still be NaN after filling. How can I do that?
I want this result:
att1
0 1.0000
1 2.0000
2 2.0000
3 2.0000
4 2.0000
5 nan
6 3.0000
7 4.0000
8 4.0000
9 4.0000
10 nan
11 5.0000
I tried this:
df = df.fillna(value='missing', method='bfill', limit=1)
df = df.fillna(method='ffill')
But the first line raises this error:
ValueError: cannot specify both a fill method and value
Why is there this limitation in pandas 0.17.1 / Python 3.5? Thank you!
Accepted answer by jezrael
You can count the NaN values in df['att1'], subtract 1, and then pass the result as the limit parameter of fillna:
import pandas as pd
import numpy as np
df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3] , columns=['att1'])
print(df)
att1
0 1
1 2
2 NaN
3 NaN
4 NaN
5 NaN
6 3
s = df['att1'].isnull().sum() - 1
df['att1'] = df['att1'].fillna('missing', limit=s)
print(df)
att1
0 1
1 2
2 missing
3 missing
4 missing
5 NaN
6 3
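The same counting idea can be combined with a forward fill, which is what the question actually asks for. The following is a minimal sketch, not part of the original answer: it swaps fillna('missing', ...) for ffill(limit=...), and the names n_fill and filled are only illustrative. It assumes the column holds a single run of NaNs with at least two NaNs in it, because limit must be greater than 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3], columns=['att1'])

# Number of NaNs minus one: every NaN except the last one may be filled.
n_fill = int(df['att1'].isnull().sum()) - 1

# ffill(limit=k) fills at most k consecutive NaNs, so the last NaN of the run stays NaN.
filled = df['att1'].ffill(limit=n_fill)
print(filled)
```

With several runs of different lengths a single limit is not enough, since the limit applies to each run separately.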
EDIT:
Now it is more complicated.
So first create a helper column count that labels each run of consecutive NaN or non-NaN values in column att1, using isnull, shift, astype and cumsum. Then groupby this column count and apply fillna to each group:
import pandas as pd
import numpy as np
df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])
print(df)
df['count'] = (df['att1'].isnull() != df['att1'].isnull().shift()).astype(int).cumsum()
print(df)
att1 count
0 1 1
1 2 1
2 NaN 2
3 NaN 2
4 NaN 2
5 NaN 2
6 3 3
7 4 3
8 NaN 4
9 NaN 4
10 NaN 4
11 5 5
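def f(x):
    # x is one group: a run of consecutive NaN or non-NaN rows
    att = x['att1'].isnull()
    if att.all():
        # all-NaN run: fill every value except the last one
        return x['att1'].fillna('missing', limit=att.sum() - 1)
    else:
        return x['att1']

print(df.groupby(['count']).apply(f).reset_index(drop=True))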
0 1
1 2
2 missing
3 missing
4 missing
5 NaN
6 3
7 4
8 missing
9 missing
10 NaN
11 5
Name: att1, dtype: object
Explanation of the count column:
print (df['att1'].isnull() != df['att1'].isnull().shift())
0 True
1 False
2 True
3 False
4 False
5 False
6 True
7 False
8 True
9 False
10 False
11 True
Name: att1, dtype: bool
print((df['att1'].isnull() != df['att1'].isnull().shift()).astype(int))
0 1
1 0
2 1
3 0
4 0
5 0
6 1
7 0
8 1
9 0
10 0
11 1
Name: att1, dtype: int32
print((df['att1'].isnull() != df['att1'].isnull().shift()).astype(int).cumsum())
0 1
1 1
2 2
3 2
4 2
5 2
6 3
7 3
8 4
9 4
10 4
11 5
Name: att1, dtype: int32
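The run labels can also be used without apply to get the forward-filled result from the question. This is a hedged sketch rather than the answer's own code: the names run_id, run_len, pos_in_run and last_of_nan_run are illustrative, and it forward fills instead of inserting 'missing'.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
               np.nan, np.nan, np.nan, 5], name='att1')

is_nan = s.isnull()
# Same labelling as the count column: a new id starts whenever NaN-ness changes.
run_id = (is_nan != is_nan.shift()).astype(int).cumsum()

# Length of the run each row belongs to, and the row's position inside that run.
run_len = is_nan.groupby(run_id).transform('size')
pos_in_run = is_nan.groupby(run_id).cumcount()

# True only for the last row of each NaN run.
last_of_nan_run = is_nan & (pos_in_run == run_len - 1)

# Forward fill everything, then put NaN back at the end of each NaN run.
result = s.ffill().mask(last_of_nan_run)
print(result)
```

Unlike the apply version, the column keeps its float dtype because no string is inserted.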
Answer by Woody Pride
An alternative that is maybe a little less complex is to create a list of the index points where you expect NaNs to remain, that is, each position just before a point where the value is not null but the previous value is null. Then you forward fill your data and reinsert the NaNs using the list you created.
import pandas as pd
import numpy as np
from numpy import nan as NA
df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])
# create list of index points where you want NaNs to be
# (assumes the default integer index 0..n-1; use xrange instead of range on Python 2)
Nan_ind = [x - 1 for x in range(1, df.index[-1] + 1)
           if pd.notnull(df.loc[x, 'att1'])
           and pd.isnull(df.loc[x - 1, 'att1'])]
# forward fill
df['att1'] = df['att1'].ffill()
# reinsert NaNs using your list of index points
df.loc[Nan_ind, 'att1'] = NA
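The same idea can be written without an explicit loop. The sketch below is an illustrative variant, not the answer's code: it builds a boolean mask with shift(-1) instead of a list of index points, and the names keep_nan and result are assumptions. Like the list version, it only marks a NaN that is directly followed by a non-null value, so a run of NaNs at the very end of the column would be filled completely.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
               np.nan, np.nan, np.nan, 5], name='att1')

# A NaN whose next value is not null is the last NaN of its run.
keep_nan = s.isnull() & s.shift(-1).notnull()

# Forward fill, then restore NaN at the marked positions.
result = s.ffill().mask(keep_nan)
print(result)
```

Because the mask is positional, this does not depend on the index being the default 0..n-1 range.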
Answer by Arthur Zennig
Fill all NaN values with "missing", then replace the last "missing" with NaN.
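df['att1'] = df['att1'].fillna("missing")
# assumes the last "missing" sits in the second-to-last row, as in the single-run example
df.iloc[-2, df.columns.get_loc('att1')] = np.nan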
A negative value makes iloc index from the end: -2 selects the second-to-last element of the 'att1' column.