pandas: fill missing values with the previous non-missing value, grouped by key

Question

Asked by ChrisB

I am dealing with pandas DataFrames like this:

I would like to replace each NaN 'x' with the previous non-NaN 'x' from a row with the same 'id' value:

Is there some slick way to do this without manually looping over rows?

Answer 1

Answered by unutbu

You could perform a groupby/forward-fill operation on each group:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1,1,2,2,1,2,1,1], 'x':[10,20,100,200,np.nan,np.nan,300,np.nan]})
df['x'] = df.groupby(['id'])['x'].ffill()
print(df)

yields

   id      x
0   1   10.0
1   1   20.0
2   2  100.0
3   2  200.0
4   1   20.0
5   2  200.0
6   1  300.0
7   1  300.0
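
One thing worth knowing about the groupby/ffill approach: a group that starts with NaN has no earlier value to copy, so that leading gap stays NaN. A minimal sketch on made-up data (the frame `demo` and the column `x_filled` are illustrative names, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of group 2 is missing.
demo = pd.DataFrame({
    'id': [1, 2, 1, 2, 2],
    'x': [10.0, np.nan, np.nan, 5.0, np.nan],
})

# Forward-fill within each 'id' group; the original row order is kept.
demo['x_filled'] = demo.groupby('id')['x'].ffill()
print(demo)
```

Row 1 (the first row of group 2) remains NaN, while rows 2 and 4 pick up the previous value from their own group.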

Answer 2

Answered by S_Ymln

df
   id   val
0   1   23.0
1   1   NaN
2   1   NaN
3   2   NaN
4   2   34.0
5   2   NaN
6   3   2.0
7   3   NaN
8   3   NaN

df.sort_values(['id','val']).groupby('id').ffill()

    id  val
0   1   23.0
1   1   23.0
2   1   23.0
4   2   34.0
3   2   34.0
5   2   34.0
6   3   2.0
7   3   2.0
8   3   2.0

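Use sort_values, groupby and ffill together: sort_values places NaN last within each id by default, so a group whose first value (or first several values) is NaN still gets filled. Note that this reorders the rows, so each gap is filled with the group's largest non-missing val rather than with the value that preceded it in the original order.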
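
If the original row order matters, one alternative is to forward-fill and then back-fill within each group instead of sorting. This is a sketch of that idea rather than the method from the answer above; `df2` and `val_filled` are illustrative names, and the data mirrors the example frame.

```python
import numpy as np
import pandas as pd

# Same shape of data as the example: group 2 starts with a NaN.
df2 = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'val': [23.0, np.nan, np.nan, np.nan, 34.0, np.nan, 2.0, np.nan, np.nan],
})

# Step 1: forward-fill within each group (leading NaNs are left alone).
df2['val_filled'] = df2.groupby('id')['val'].ffill()

# Step 2: back-fill within each group to cover the leading NaNs.
df2['val_filled'] = df2.groupby('id')['val_filled'].bfill()
print(df2)
```

Rows keep their original positions here, and row 3 (the leading NaN of group 2) is filled from the next value in its own group.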

Answer 3

Answered by Renel Chesak

Solution for multi-key problem:

In this example, the data has the key [date, region, type]. Date is the index on the original dataframe.

import os
import pandas as pd

#sort to make indexing faster
df.sort_values(by=['date','region','type'], inplace=True)

#collect all possible regions and types
regions = list(set(df['region']))
types = list(set(df['type']))

#record column names
df_cols = df.columns

#delete ffill_df.csv so we can begin anew
try:
    os.remove('ffill_df.csv')
except FileNotFoundError:
    pass

# steps:
# 1) grab rows with a particular region and type
# 2) use forwardfill to fill nulls
# 3) use backwardfill to fill remaining nulls
# 4) append to file
for r in regions:
    for t in types:
        group_df = df[(df.region == r) & (df.type == t)].copy()
        # ffill()/bfill() replace fillna(method=...), which is deprecated in recent pandas
        group_df.ffill(inplace=True)
        group_df.bfill(inplace=True)
        group_df.to_csv('ffill_df.csv', mode='a', header=False, index=True)

Checking the result:

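#load in the ffill_df
ffill_df = pd.read_csv('ffill_df.csv', header=None, index_col=None)
#the first CSV column is the date index written by to_csv(index=True),
#followed by the original columns (assumes the index is named 'date')
ffill_df.columns = ['date'] + list(df_cols)
ffill_df.index = ffill_df.date
ffill_df.drop('date', axis=1, inplace=True)
ffill_df.head()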

#compare new and old dataframe
print(df.shape)
print(ffill_df.shape)
print()
print(pd.isnull(ffill_df).sum())
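
For comparison, the same per-(region, type) forward-fill then backward-fill can probably be done in memory by grouping on both keys, without the nested loops or the intermediate CSV. This is a sketch under assumptions: the frame `multi`, its `value` column, and the sample rows are made up for illustration, with date as the index as described above.

```python
import numpy as np
import pandas as pd

# Hypothetical data keyed by (date, region, type); date is the index.
multi = pd.DataFrame(
    {
        'region': ['east', 'west', 'east', 'west', 'east', 'west'],
        'type': ['a', 'a', 'a', 'a', 'a', 'a'],
        'value': [np.nan, 1.0, 5.0, np.nan, np.nan, 3.0],
    },
    index=pd.to_datetime([
        '2020-01-01', '2020-01-01',
        '2020-01-02', '2020-01-02',
        '2020-01-03', '2020-01-03',
    ]),
)
multi.index.name = 'date'

keys = ['region', 'type']

# Move date into a column so every row has a unique index label,
# then sort chronologically so "previous" means "earlier date".
flat = multi.reset_index().sort_values('date', kind='stable')

# Forward-fill within each (region, type) group, then back-fill what is left.
flat['value'] = flat.groupby(keys)['value'].ffill()
flat['value'] = flat.groupby(keys)['value'].bfill()

# Restore date as the index.
filled = flat.set_index('date')
print(filled)
```

Grouping on a list of columns treats each distinct (region, type) combination as its own group, so only combinations that actually occur in the data are processed.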
