pandas: Re-index dataframe by new range of dates

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/27421256/

Time: 2020-09-13 22:45:29  Source: igfitidea

Re-index dataframe by new range of dates

Tags: python, pandas, date-range, reindex

Asked by Gianluca

I have a data frame containing a number of observations:

date         colour     orders
2014-10-20   red        7
2014-10-21   red        10
2014-10-20   yellow     3

I would like to re-index the data frame and standardise the dates.

date         colour     orders
2014-10-20   red        7
2014-10-21   red        10
2014-10-22   red        NaN
2014-10-20   yellow     3
2014-10-21   yellow     NaN
2014-10-22   yellow     NaN

I thought I would sort the data frame by colour and date, and then try to re-index it.

index = pd.date_range('20/10/2014', '22/10/2014')
# DataFrame.sort was removed from pandas; sort_values is its replacement
test_df = df.sort_values(['colour', 'date'], ascending=[True, True])
ts = test_df.reindex(index)
ts

But it returns a new data frame with the right index but all NaN values.

date         colour     orders
2014-10-20   NaN        NaN
2014-10-21   NaN        NaN
2014-10-22   NaN        NaN
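
A minimal sketch of why this happens, assuming the frame was built with pandas' default integer index (the question does not show how df was created): reindex matches on index labels, and none of the labels 0, 1, 2 appear in the new date index, so every row comes back empty.

```python
import pandas as pd

# Sample data from the question; the default integer index is an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
})

new_index = pd.date_range("2014-10-20", "2014-10-22")

# reindex aligns on index labels, not on the 'date' column.
# The existing labels are 0, 1, 2, so no date in new_index matches a row.
result = df.reindex(new_index)

print(df.index.tolist())
print(result)
```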

Answered by joris

Starting from your example dataframe:

In [51]: df
Out[51]:
        date  colour  orders
0 2014-10-20     red       7
1 2014-10-21     red      10
2 2014-10-20  yellow       3

If you want to reindex on both 'date' and 'colour', one possibility is to set both as the index (a multi-index):

In [52]: df = df.set_index(['date', 'colour'])

In [53]: df
Out[53]:
                   orders
date       colour
2014-10-20 red          7
2014-10-21 red         10
2014-10-20 yellow       3

You can now reindex this dataframe, after constructing the desired index:

In [54]: index = pd.date_range('20/10/2014', '22/10/2014')

In [55]: multi_index = pd.MultiIndex.from_product([index, ['red', 'yellow']])

In [56]: df.reindex(multi_index)
Out[56]:
                   orders
2014-10-20 red          7
           yellow       3
2014-10-21 red         10
           yellow     NaN
2014-10-22 red        NaN
           yellow     NaN

To get the same output as your example, the index should be sorted on the second level (level=1, as it is 0-based):

In [60]: df2 = df.reindex(multi_index)

In [64]: df2.sort_index(level=1)  # sortlevel was removed from pandas; sort_index(level=...) replaces it
Out[64]:
                   orders
2014-10-20 red          7
2014-10-21 red         10
2014-10-22 red        NaN
2014-10-20 yellow       3
2014-10-21 yellow     NaN
2014-10-22 yellow     NaN
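
The steps above can be collected into one self-contained script. This is a sketch, not the answer's original code: it assumes a recent pandas version, uses ISO date strings, names the index levels so that reset_index can restore the flat date/colour/orders layout from the question, and sorts with sort_index.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
})

# Every (date, colour) combination we want in the result.
dates = pd.date_range("2014-10-20", "2014-10-22", freq="D")
full_index = pd.MultiIndex.from_product(
    [dates, ["red", "yellow"]], names=["date", "colour"]
)

result = (
    df.set_index(["date", "colour"])  # index on both keys
      .reindex(full_index)            # missing combinations become NaN
      .sort_index(level="colour")     # group rows by colour, then by date
      .reset_index()                  # back to flat columns
)
print(result)
```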

A possible way to generate the multi-index automatically would be (with your original frame):

pd.MultiIndex.from_product([pd.date_range(df['date'].min(), df['date'].max(), freq='D'), 
                            df['colour'].unique()])
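
As a sketch of how that automatically generated index could be used, the snippet below wraps it in a small helper. The function name complete_dates and its parameters are hypothetical, introduced only for illustration. Note that the date range is taken from the data itself, so with the sample frame it spans 2014-10-20 to 2014-10-21 only; 2014-10-22 never appears in the frame and is therefore not generated.

```python
import pandas as pd

def complete_dates(df, date_col="date", key_col="colour"):
    """Hypothetical helper: one row per (date, key) over the observed date span."""
    dates = pd.date_range(df[date_col].min(), df[date_col].max(), freq="D")
    full_index = pd.MultiIndex.from_product(
        [dates, df[key_col].unique()], names=[date_col, key_col]
    )
    return df.set_index([date_col, key_col]).reindex(full_index).reset_index()

# Sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
})

completed = complete_dates(df)
print(completed)
```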


Another way would be to use resample for each group of colours:

In [77]: df = df.set_index('date')

In [78]: df.groupby('colour')['orders'].resample('D').mean()  # recent pandas versions need an explicit aggregation

This is simpler, but this does not give you the full range of dates for each colour, only the range of dates that is available for that colour group.
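
To illustrate that limitation, here is a small sketch using slightly different data from the question (red observed on 2014-10-20 and 2014-10-22, yellow only on 2014-10-20; this data is an assumption chosen to show a gap). It also assumes a recent pandas version, where resample needs an explicit aggregation such as mean.

```python
import pandas as pd

# Illustrative data, not the question's: red has a gap on 2014-10-21.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-22", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
})

ts = df.set_index("date")

# Resample each colour group to daily frequency; empty days become NaN.
per_colour = ts.groupby("colour")["orders"].resample("D").mean()

# Each group only spans its own first-to-last date.
rows_per_colour = per_colour.groupby(level="colour").size().to_dict()
print(per_colour)
print(rows_per_colour)
```

Red is filled in for all three days, while yellow keeps its single day, because each group is resampled only over its own date range.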
