pandas: how to interpolate missing values with groupby?

Question

Asked by John Stud

I cannot get missing values to interpolate correctly when I use the groupby function.

Here is a quick example of what I have tried:

import pandas as pd
import numpy as np

# Create data
state = pd.Series(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])
population = pd.Series([100, 150, np.nan, np.nan, 50, 125, np.nan, np.nan])
year = [2016, 2017, 2018, 2019, 2016, 2017, 2018, 2019]
data = {'state': state, 'population': population, 'year': year}  # avoid shadowing the builtin dict
df = pd.DataFrame(data)

# Interpolate population, grouped by states
df.groupby('state').apply(lambda x: x.interpolate(method='linear')) 

  state  population  year
0     A       100.0  2016
1     A       150.0  2017
2     A       150.0  2018
3     A       150.0  2019
4     B        50.0  2016
5     B       125.0  2017
6     B       125.0  2018
7     B       125.0  2019

As you can see, when grouping by state, it simply repeats each state's last known value instead of continuing the trend.
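A quick check, assuming a recent pandas version, suggests that groupby is not the cause here: linear interpolation only fills gaps between known values, and NaNs after the last known value are filled by repeating that value rather than by extrapolating. A single state's series, with no grouping at all, gives the same result:

```python
import numpy as np
import pandas as pd

# State A's population on its own, with no groupby involved
s = pd.Series([100, 150, np.nan, np.nan])

# Linear interpolation only draws lines between known points;
# the trailing NaNs are filled with the last known value (150.0)
filled = s.interpolate(method="linear")
print(filled.tolist())  # [100.0, 150.0, 150.0, 150.0]
```

So the grouped call is doing exactly what it does on each state separately; getting 200 and 250 for state A requires a method that extrapolates.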

Answer 1

Answered by YOBEN_S

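Depending on what you need, you can pass method="spline" instead. With order=1, the missing values in this example are continued along a straight line: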

df.groupby('state')['population'].apply(lambda x: x.interpolate(method="spline", order=1, limit_direction="both"))
0    100.0
1    150.0
2    200.0
3    250.0
4     50.0
5    125.0
6    200.0
7    275.0
Name: population, dtype: float64
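Two caveats, assuming a current pandas version: method="spline" is a wrapper around SciPy, so SciPy has to be installed, and since pandas 2.0 groupby(...).apply can add the state as an extra index level (unless group_keys=False is passed), so the printed index may differ from the output above. The sketch below is a swapped-in, SciPy-free alternative, not the answer's method: it uses transform, which returns one value per original row, and fills each state's gaps from a straight line fitted against year with numpy.polyfit. With exactly two known points per state the line passes through both, so it reproduces the numbers above up to floating-point rounding; with more known points it is a least-squares trend, not an interpolant. The helper fill_linear_trend and the column population_filled are hypothetical names.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "state": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "population": [100, 150, np.nan, np.nan, 50, 125, np.nan, np.nan],
    "year": [2016, 2017, 2018, 2019, 2016, 2017, 2018, 2019],
})


def fill_linear_trend(pop: pd.Series, years: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaNs in `pop` from a straight line fitted against `years`."""
    known = pop.notna()
    # Degree-1 least-squares fit; needs at least two known points per group
    slope, intercept = np.polyfit(years[known], pop[known], 1)
    trend = slope * years + intercept
    # Keep the observed values and use the fitted line only where data is missing
    return pop.fillna(trend)


# transform returns one value per original row, so the result aligns with df
df["population_filled"] = df.groupby("state")["population"].transform(
    lambda s: fill_linear_trend(s, df.loc[s.index, "year"])
)
print(df)
```

If you want the spline itself, the same transform pattern should also work with the answer's lambda, i.e. df.groupby('state')['population'].transform(lambda x: x.interpolate(method="spline", order=1, limit_direction="both")), keeping the result aligned with df's original index.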