Pandas: groupby forward fill with datetime index

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/38597253/

Time: 2020-09-14 01:40:38  Source: igfitidea

Tags: python, datetime, pandas, group-by, missing-data

Asked by sapo_cosmico

I have a dataset with two columns: company and value.
It has a datetime index, which contains duplicates (on the same day, different companies have different values). The values have missing data, so I want to forward fill the missing data with the previous datapoint from the same company.

However, I can't seem to find a good way to do this without running into odd groupby errors, suggesting that I'm doing something wrong.

Toy data:

import pandas as pd

a = pd.DataFrame({'a': [1, 2, None], 'b': [12, None, 14]})
a.index = pd.DatetimeIndex(['2010', '2011', '2012'])
a = a.unstack()  # Series indexed by (company, date)
a = a.reset_index().set_index('level_1')  # keep only the date as the index
a.columns = ['company', 'value']
a.sort_index(inplace=True)

Attempted solutions (didn't work: ValueError: cannot reindex from a duplicate axis):

a.groupby('company').ffill() 
a.groupby('company')['value'].ffill() 
a.groupby('company').fillna(method='ffill')

Hacky solution (that delivers the desired result, but is obviously just an ugly workaround):

a['value'] = a.reset_index().groupby(
    'company').fillna(method='ffill')['value'].values

There is probably a simple and elegant way to do this. How is it done in pandas?

Answered by Psidom

One way is to use the transform function to fill the value column after grouping by company:

import pandas as pd
a['value'] = a.groupby('company')['value'].transform(lambda v: v.ffill())

a
#          company  value
#level_1        
#2010-01-01      a    1.0
#2010-01-01      b   12.0
#2011-01-01      a    2.0
#2011-01-01      b   12.0
#2012-01-01      a    2.0
#2012-01-01      b   14.0

To compare, the original data frame looks like:

#            company    value
#level_1        
#2010-01-01        a      1.0
#2010-01-01        b     12.0
#2011-01-01        a      2.0
#2011-01-01        b      NaN
#2012-01-01        a      NaN
#2012-01-01        b     14.0
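
As a self-contained sketch of the transform approach, the toy data can be rebuilt directly in long format and filled per company. The names dates and filled are illustrative and do not come from the original post.

```python
import pandas as pd

# Toy data from the question, written out in long format.
# The datetime index deliberately contains duplicates.
dates = pd.to_datetime(['2010-01-01', '2010-01-01',
                        '2011-01-01', '2011-01-01',
                        '2012-01-01', '2012-01-01'])
a = pd.DataFrame(
    {'company': ['a', 'b', 'a', 'b', 'a', 'b'],
     'value': [1.0, 12.0, 2.0, float('nan'), float('nan'), 14.0]},
    index=dates,
)

# Forward fill inside each company; transform hands back a Series
# with one entry per original row, in the original row order.
filled = a.groupby('company')['value'].transform(lambda v: v.ffill())
a['value'] = filled
print(a)
```

Because the result has the same length and order as the input, it can be assigned straight back to the column even though the index is not unique.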

Answered by root

You can add 'company' to the index, making it unique, and do a simple ffill via groupby:

a = a.set_index('company', append=True)
a = a.groupby(level=1).ffill()

From here, you can use reset_index to revert the index back to just the date, if necessary. I'd recommend keeping 'company' as part of the index (or just adding it to the index to begin with), so your index remains unique:

a = a.reset_index(level=1)
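
The whole round trip can be sketched end to end as follows. This is a minimal example that assumes the date index is named 'date' (the original toy data calls it 'level_1') and refers to the index level by name rather than by position; the name b is illustrative.

```python
import pandas as pd

# Toy data in long format; the index name 'date' is an assumption.
dates = pd.DatetimeIndex(['2010-01-01', '2010-01-01',
                          '2011-01-01', '2011-01-01',
                          '2012-01-01', '2012-01-01'], name='date')
a = pd.DataFrame(
    {'company': ['a', 'b', 'a', 'b', 'a', 'b'],
     'value': [1.0, 12.0, 2.0, float('nan'), float('nan'), 14.0]},
    index=dates,
)

# Step 1: append 'company' to the index so (date, company) is unique.
b = a.set_index('company', append=True)
assert b.index.is_unique

# Step 2: forward fill within each company, grouping on the index level.
b = b.groupby(level='company').ffill()

# Step 3 (optional): move 'company' back out into a regular column.
b = b.reset_index(level='company')
print(b)
```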

Answered by piRSquared

I like to use stacking and unstacking. In this case, it requires that I append the index with 'company'.

a.set_index('company', append=True).unstack().ffill() \
                                   .stack().reset_index('company')
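
Broken into steps on the toy data, the same idea looks roughly like this. The index name 'date' and the intermediate names wide and result are assumptions made for the sketch.

```python
import pandas as pd

# Toy data in long format; the index name 'date' is an assumption.
dates = pd.DatetimeIndex(['2010-01-01', '2010-01-01',
                          '2011-01-01', '2011-01-01',
                          '2012-01-01', '2012-01-01'], name='date')
a = pd.DataFrame(
    {'company': ['a', 'b', 'a', 'b', 'a', 'b'],
     'value': [1.0, 12.0, 2.0, float('nan'), float('nan'), 14.0]},
    index=dates,
)

# Pivot to wide form: one row per date, one column per company.
wide = a.set_index('company', append=True).unstack()

# In wide form a plain ffill runs down each company column separately.
wide = wide.ffill()

# Back to long form, with 'company' as a regular column again.
result = wide.stack().reset_index('company')
print(result)
```

One caveat: a company whose first values are missing still has NaN after the fill, and depending on the pandas version stack may drop those rows, so the result can be shorter than the input in that case.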

Timing

Conclusion: @Psidom's solution works best under both scenarios.

toy data

[Image: timing results for the toy data]

bigger toy

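import numpy as np
import pandas as pd

np.random.seed([3,1415])
n = 10000
# Wide frame: one row per hour, one column per company.
a = pd.DataFrame(np.random.randn(n, 10),
                 pd.date_range('2014-01-01', periods=n, freq='H', name='Time'),
                 pd.Index(list('abcdefghij'), name='company'))

# Blank out roughly 40% of the values.
a *= np.random.choice((1, np.nan), (n, 10), p=(.6, .4))

# Back to long format, keeping the rows whose value is NaN.
a = a.stack(dropna=False).rename('value').reset_index('company')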
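
To reproduce a comparison like this locally, a rough timing harness could look like the sketch below. It is not the benchmark from the original answer: the data is built a different way (a smaller frame, a different random generator and seed), and the names big, with_transform, with_index_level and with_unstack are hypothetical. Timings vary by machine and pandas version, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: n dates x 10 companies, about 40% missing.
rng = np.random.default_rng(0)
n = 1000
companies = list('abcdefghij')
times = pd.date_range('2014-01-01', periods=n, freq='D', name='Time')
values = rng.standard_normal((n, len(companies)))
values[rng.random((n, len(companies))) < 0.4] = np.nan

# Long format: one row per (Time, company); the Time index has duplicates.
big = pd.DataFrame(
    {'company': np.tile(companies, n), 'value': values.ravel()},
    index=times.repeat(len(companies)),
)


def with_transform(df):
    # groupby + transform on the value column
    return df.groupby('company')['value'].transform(lambda v: v.ffill())


def with_index_level(df):
    # append company to the index, then group on that level
    out = df.set_index('company', append=True)
    return out.groupby(level='company').ffill()['value']


def with_unstack(df):
    # pivot to wide form and fill down each company column
    wide = df.set_index('company', append=True)['value'].unstack('company')
    return wide.ffill()


for name, fn in [('transform', with_transform),
                 ('index level', with_index_level),
                 ('unstack', with_unstack)]:
    # Best of three repeats, three calls each, reported per call.
    seconds = min(timeit.repeat(lambda: fn(big), number=3, repeat=3)) / 3
    print(f'{name}: {seconds * 1e3:.1f} ms per call')
```

Note that with_unstack stops at the wide result, so its timing leaves out the conversion back to long format.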

[Image: timing results for the bigger toy data]