pandas: forward fill (ffill) based on group and previous row
Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/48092427/
Asked by A H
I have a large dataframe (400,000+ rows), that looks like this:
import numpy as np
import pandas as pd

# Use a plain list of rows: wrapping mixed values in np.array would coerce
# every cell (including np.nan) to a string.
data = [
    [1949, '01/01/2018', np.nan, 17, '30/11/2017'],
    [1949, '01/01/2018', np.nan, 19, np.nan],
    [1811, '01/01/2018', 16, np.nan, '31/11/2017'],
    [1949, '01/01/2018', 15, 21, '01/12/2017'],
    [1949, '01/01/2018', np.nan, 20, np.nan],
    [3212, '01/01/2018', 21, 17, '31/11/2017'],
]
columns = ['id', 'ReceivedDate', 'PropertyType', 'MeterType', 'VisitDate']
df = pd.DataFrame(data, columns=columns)
Resultant df:
id ReceivedDate PropertyType MeterType VisitDate
0 1949 01/01/2018 NaN 17 30/11/2017
1 1949 01/01/2018 NaN 19 NaN
2 1811 01/01/2018 16 NaN 31/11/2017
3 1949 01/01/2018 15 21 01/12/2017
4 1949 01/01/2018 NaN 20 NaN
5 3212 01/01/2018 21 17 31/11/2017
I want to forward fill based on groupby (id & received date) - ONLY IF they come next in order in the index (i.e. only forward fill index positions 1 and 4).
I am thinking to have a column that says if it should be ffilled or not based on the criteria, but how can I check the row above?
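One way to check the row above is to compare each row's key columns with a shifted copy of themselves. The following is a minimal sketch, assuming the default integer index; the column name `can_ffill` and the variable names are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1949, 1949, 1811, 1949, 1949, 3212],
    "ReceivedDate": ["01/01/2018"] * 6,
    "VisitDate": ["30/11/2017", np.nan, "31/11/2017",
                  "01/12/2017", np.nan, "31/11/2017"],
})

keys = ["id", "ReceivedDate"]

# shift() moves every row down by one, so row n is compared with row n-1.
# The first row has nothing above it and therefore compares as False.
same_as_above = df[keys].eq(df[keys].shift()).all(axis=1)

# Hypothetical flag column: True where the row directly above has the same keys.
df["can_ffill"] = same_as_above
print(df)
```

With the sample data, only index positions 1 and 4 are flagged, which matches the rows the question wants filled.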
(I plan on using a solution along the lines of this answer, "pandas fill forward performance issue":
df.isnull().astype(int)).groupby(level=0).cumsum().applymap(lambda x: None if x == 0 else 1)
because x = df.groupby(['id','ReceivedDate']).ffill()
is very slow.)
Desired df:
id ReceivedDate PropertyType MeterType VisitDate
0 1949 01/01/2018 NaN 17 30/11/2017
1 1949 01/01/2018 NaN 19 30/11/2017
2 1811 01/01/2018 16 NaN 31/11/2017
3 1949 01/01/2018 15 21 01/12/2017
4 1949 01/01/2018 15 20 01/12/2017
5 3212 01/01/2018 21 17 31/11/2017
Answered by cs95
groupby and ffill with limit=1
df.groupby(['id', 'ReceivedDate']).ffill(limit=1)
id ReceivedDate PropertyType MeterType VisitDate
0 1949 01/01/2018 NaN 17 30/11/2017
1 1949 01/01/2018 NaN 19 30/11/2017
2 1811 01/01/2018 16 NaN 31/11/2017
3 1949 01/01/2018 15 21 01/12/2017
4 1949 01/01/2018 15 20 01/12/2017
5 3212 01/01/2018 21 17 31/11/2017
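In recent pandas versions, `groupby(...).ffill()` may return only the non-key columns, so the printed result might not include `id` and `ReceivedDate` as shown above. A minimal sketch that should behave the same either way is to select the value columns explicitly and assign the result back; `keys`, `value_cols` and `filled` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1949, 1949, 1811, 1949, 1949, 3212],
    "ReceivedDate": ["01/01/2018"] * 6,
    "PropertyType": [np.nan, np.nan, 16, 15, np.nan, 21],
    "MeterType": [17, 19, np.nan, 21, 20, 17],
    "VisitDate": ["30/11/2017", np.nan, "31/11/2017",
                  "01/12/2017", np.nan, "31/11/2017"],
})

keys = ["id", "ReceivedDate"]
value_cols = ["PropertyType", "MeterType", "VisitDate"]

filled = df.copy()
# limit=1 fills at most one consecutive NaN per column within each group.
filled[value_cols] = df.groupby(keys)[value_cols].ffill(limit=1)
print(filled)
```

Note that this groups by key values, not by position: rows with the same keys are treated as one group even when other rows sit between them in the index.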
groupby with masking and shift
Try filling NaNs with groupby, mask, and shift:
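i = df[['id', 'ReceivedDate']]
# label runs of consecutive rows that share the same id and ReceivedDate
j = i.ne(i.shift().values).any(axis=1).cumsum()
# where the running NaN count within a run equals 1, take the value from the row above
df.mask(df.isnull().astype(int).groupby(j).cumsum().eq(1), df.groupby(j).shift())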
Or,
df.where(df.isnull().astype(int).groupby(j).cumsum().ne(1), df.groupby(j).shift())
id ReceivedDate PropertyType MeterType VisitDate
0 1949 01/01/2018 NaN 17 30/11/2017
1 1949 01/01/2018 NaN 19 30/11/2017
2 1811 01/01/2018 16 NaN 31/11/2017
3 1949 01/01/2018 15 21 01/12/2017
4 1949 01/01/2018 15 20 01/12/2017
5 3212 01/01/2018 21 17 31/11/2017
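The run labels `j` can also be combined with `ffill(limit=1)` directly. The sketch below uses small hypothetical data (not the question's) to show the difference between grouping by key values and grouping by runs of adjacent rows; `run_id`, `by_key` and `by_run` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 and 2 share the same keys but are not adjacent.
df = pd.DataFrame({
    "id": [1, 2, 1, 1],
    "ReceivedDate": ["01/01/2018"] * 4,
    "VisitDate": ["30/11/2017", "01/12/2017", np.nan, np.nan],
})
keys = ["id", "ReceivedDate"]

# A new run starts whenever the keys differ from the row above.
changed = df[keys].ne(df[keys].shift()).any(axis=1)
run_id = changed.cumsum()

# Grouping by key values: row 2 is filled from row 0, although row 1 lies between them.
by_key = df.groupby(keys)["VisitDate"].ffill(limit=1)

# Grouping by run: row 2 starts a new run, so nothing above it qualifies.
by_run = df.groupby(run_id)["VisitDate"].ffill(limit=1)

print(pd.DataFrame({"by_key": by_key, "by_run": by_run}))
```

As far as I can tell, the run-based `ffill(limit=1)` also sidesteps an edge case in the masking variant: `cumsum().eq(1)` stays true for non-null rows that follow the first NaN of a run, so a real value after a gap would be overwritten by the shifted NaN.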
Answered by A H
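cols_to_ffill = ['PropertyType', 'VisitDate']
i = df.copy()  # assumes the default 0..n-1 integer index
newdata = pd.DataFrame(['placeholder'])  # non-empty sentinel so the loop runs at least once
# Stop once a pass copies nothing useful; this also avoids looping forever
# when a row and the row above it are both entirely empty.
while not newdata.dropna(how='all').empty:
    # id and ReceivedDate of the row directly above
    RowAboveid = i.id.shift()
    RowAboveRD = i.ReceivedDate.shift()
    # rows where every column to fill is empty
    rows_with_cols_to_ffill_all_empty = i.loc[:, cols_to_ffill].isnull().all(axis=1)
    rows_to_ffill = (i.ReceivedDate == RowAboveRD) & (i.id == RowAboveid) & (rows_with_cols_to_ffill_all_empty)
    # take the values from the row above and relabel them with the target index
    rows_used_to_fill = i[rows_to_ffill].index - 1
    newdata = i.loc[rows_used_to_fill, cols_to_ffill]
    newdata.index += 1
    i.loc[rows_to_ffill, cols_to_ffill] = newdata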
The loop repeats until there are no more matching rows to fill (i.e. all columns are forward filled).