pandas: How to interpolate missing values with groupby?
原文地址: http://stackoverflow.com/questions/55718026/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Date: 2020-09-14 06:22:51 Source: igfitidea
Asked by John Stud
I cannot get missing values to interpolate correctly when I use the groupby function.
Here is a quick example of what I have tried:
import pandas as pd
import numpy as np
# Create data
state = pd.Series(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])
population = pd.Series([100, 150, np.nan, np.nan, 50, 125, np.nan, np.nan])
year = [2016, 2017, 2018, 2019, 2016, 2017, 2018, 2019]
data = {'state': state, 'population': population, 'year': year}  # 'data' avoids shadowing the built-in dict
df = pd.DataFrame(data)
# Interpolate population, grouped by states
df.groupby('state').apply(lambda x: x.interpolate(method='linear'))
state population year
0 A 100.0 2016
1 A 150.0 2017
2 A 150.0 2018
3 A 150.0 2019
4 B 50.0 2016
5 B 125.0 2017
6 B 125.0 2018
7 B 125.0 2019
As you can see, when grouping by state, it simply repeats the last value.
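A minimal check, using group A's values on their own, suggests the groupby itself is not the cause: with method='linear' and the default limit_direction='forward', pandas only interpolates between known points and fills NaNs after the last valid value with that value, so a plain Series behaves the same way:

```python
import numpy as np
import pandas as pd

# Group A's population from the question, without any groupby
s = pd.Series([100, 150, np.nan, np.nan])

# There is no known point after index 1, so linear interpolation cannot
# extrapolate; the trailing NaNs are filled with the last valid value
filled = s.interpolate(method="linear")
print(filled.tolist())  # [100.0, 150.0, 150.0, 150.0]
```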
Answer by YOBEN_S
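Depending on what you need, pass method="spline" instead (this requires SciPy). With order=1 (a linear spline) and limit_direction="both", the trailing NaNs are extrapolated rather than filled with the last known value: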
df.groupby('state')['population'].apply(lambda x: x.interpolate(method="spline", order=1, limit_direction="both"))
0 100.0
1 150.0
2 200.0
3 250.0
4 50.0
5 125.0
6 200.0
7 275.0
Name: population, dtype: float64
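To keep the filled values in df, one option is transform instead of apply, because transform returns a result aligned with the original index (with apply, newer pandas versions may add the state key to the index). The answer's spline call can be passed to transform unchanged, but it needs SciPy. The sketch below instead swaps in numpy.polyfit to fit an ordinary least-squares line per group, so it runs with NumPy and pandas alone. extrapolate_linear and population_filled are hypothetical names, and the sketch assumes, as in the question, that rows are ordered by year with one row per year, so the row index can serve as x.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "state": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "population": [100, 150, np.nan, np.nan, 50, 125, np.nan, np.nan],
    "year": [2016, 2017, 2018, 2019, 2016, 2017, 2018, 2019],
})


def extrapolate_linear(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaNs in one group with a least-squares line.

    Uses the row index as x (assumes evenly spaced, ordered rows) and keeps
    the known values unchanged. Needs at least two non-NaN values per group.
    """
    known = s.dropna()
    slope, intercept = np.polyfit(known.index.to_numpy(dtype=float), known.to_numpy(), deg=1)
    line = pd.Series(slope * s.index.to_numpy(dtype=float) + intercept, index=s.index)
    return s.fillna(line)


# transform keeps the original 0..7 index, so the result can be assigned back directly
df["population_filled"] = df.groupby("state")["population"].transform(extrapolate_linear)

# Equivalent assignment with the answer's spline method (requires SciPy):
# df["population_filled"] = df.groupby("state")["population"].transform(
#     lambda x: x.interpolate(method="spline", order=1, limit_direction="both")
# )

print(df)
```

With exactly two known points per group, the fitted line passes through both, so population_filled should match the answer's output above (100, 150, 200, 250 for A and 50, 125, 200, 275 for B) up to floating-point rounding. With more known points per group, a global least-squares line and a linear spline can give different results.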