pandas: How to interpolate missing values with groupby?

License: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/55718026/

Date: 2020-09-14 06:22:51  Source: igfitidea


Tags: python, pandas, pandas-groupby

Question by John Stud

I cannot get missing values to interpolate correctly when I use the groupby function.


Here is a quick example of what I have tried:


import pandas as pd
import numpy as np

# Create data
state = pd.Series(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])
population = pd.Series([100, 150, np.nan, np.nan, 50, 125, np.nan, np.nan])
year = [2016, 2017, 2018, 2019, 2016, 2017, 2018, 2019]
data = {'state': state, 'population': population, 'year': year}  # avoid shadowing the built-in dict
df = pd.DataFrame(data)

# Interpolate population, grouped by states
df.groupby('state').apply(lambda x: x.interpolate(method='linear')) 

  state  population  year
0     A       100.0  2016
1     A       150.0  2017
2     A       150.0  2018
3     A       150.0  2019
4     B        50.0  2016
5     B       125.0  2017
6     B       125.0  2018
7     B       125.0  2019

As you can see, when grouping by state, it simply repeats the last known value instead of continuing the trend.

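The same result appears on a single state's values with no groupby at all, which suggests the grouping is not the cause. The sketch below reuses state A's numbers from the question: `method='linear'` only fills between known points, and with the default `limit_direction='forward'` the trailing NaNs are filled with the last known value rather than extrapolated.

```python
import numpy as np
import pandas as pd

# State A's population values from the question (no groupby involved)
s = pd.Series([100, 150, np.nan, np.nan])

# 'linear' interpolates only between known points; trailing NaNs
# are filled with the last known value, not extrapolated
print(s.interpolate(method="linear").tolist())  # [100.0, 150.0, 150.0, 150.0]
```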

Answer by YOBEN_S

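Depending on what you need, pass method='spline' instead. Unlike 'linear', the spline method (which requires SciPy) extrapolates beyond the last known value: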


df.groupby('state')['population'].apply(lambda x: x.interpolate(method="spline", order=1, limit_direction="both"))
0    100.0
1    150.0
2    200.0
3    250.0
4     50.0
5    125.0
6    200.0
7    275.0
Name: population, dtype: float64
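If SciPy is not available, a similar per-group straight-line extrapolation can be sketched with `numpy.polyfit`. This is a swapped-in technique, not the answer's method. The helper `extrapolate_linear` is hypothetical, and the sketch assumes each state has at least two known values and uses the index values as x (as pandas' spline method does). Using `transform` instead of `apply` keeps the result aligned with the original rows, so it can be assigned straight back to the column.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "state": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "population": [100, 150, np.nan, np.nan, 50, 125, np.nan, np.nan],
    "year": [2016, 2017, 2018, 2019, 2016, 2017, 2018, 2019],
})


def extrapolate_linear(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaNs in one group with a fitted straight line.

    x is taken from the Series index, mirroring how method='spline'
    uses the index values.
    """
    mask = s.notna().to_numpy()
    if mask.sum() < 2:
        return s  # not enough known points to fit a line
    x = s.index.to_numpy(dtype=float)
    y = s.to_numpy(dtype=float)
    slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
    # Keep observed values; replace only the missing ones with the fitted line
    return pd.Series(np.where(mask, y, slope * x + intercept), index=s.index, name=s.name)


# transform returns a result aligned to the original row index
df["population"] = df.groupby("state")["population"].transform(extrapolate_linear)
print(df)
```

On this data it should give the same numbers as the spline output above, up to floating-point rounding.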