Pandas filling missing dates and values within group

Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/44978196/

Time: 2020-09-14 03:56:48  Source: igfitidea

Tags: python, pandas, dataframe

Question by broccoli

I have a DataFrame that looks like the following:

import pandas as pd
x = pd.DataFrame({'user': ['a','a','b','b'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-06'], 'val': [1,33,2,1]})

What I would like to do is find the minimum and maximum dates in the dt column, expand that column so that it contains every date in that range for each user, and fill in 0 for the val column. So the desired output is:

            dt user  val
0   2016-01-01    a    1
1   2016-01-02    a   33
2   2016-01-03    a    0
3   2016-01-04    a    0
4   2016-01-05    a    0
5   2016-01-06    a    0
6   2016-01-01    b    0
7   2016-01-02    b    0
8   2016-01-03    b    0
9   2016-01-04    b    0
10  2016-01-05    b    2
11  2016-01-06    b    1

I've tried the solutions mentioned here and here, but they aren't what I'm after. Any pointers are much appreciated.

Answer by ayhan

Initial DataFrame:

            dt  user    val
0   2016-01-01     a      1
1   2016-01-02     a     33
2   2016-01-05     b      2
3   2016-01-06     b      1

First, convert the dates to datetime:

x['dt'] = pd.to_datetime(x['dt'])
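
The conversion matters because resample, used in the next step, only accepts a datetime-like index. A minimal sketch, reusing the question's sample frame, that contrasts the raw string column with the converted one (the flag string_index_ok is just an illustrative name):

```python
import pandas as pd

# Same sample data as in the question.
x = pd.DataFrame({'user': ['a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 2, 1]})

# With plain strings the index is not a DatetimeIndex, so resample raises TypeError.
try:
    x.set_index('dt').resample('D')
    string_index_ok = True
except TypeError:
    string_index_ok = False

# After conversion the column has a datetime64 dtype and resample can use it.
x['dt'] = pd.to_datetime(x['dt'])
print(string_index_ok, x['dt'].dtype)
```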

Then, generate the dates and unique users:

dates = x.set_index('dt').resample('D').asfreq().index

>> DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06'],
              dtype='datetime64[ns]', name='dt', freq='D')

users = x['user'].unique()

>> array(['a', 'b'], dtype=object)
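
The resample trick relies on the dt values being unique once they become the index; if several users share a date, the index has duplicate labels and resample may raise an error. A sketch of an alternative (a swapped-in technique, not part of the original answer) that builds the same daily range straight from the column's minimum and maximum with pd.date_range:

```python
import pandas as pd

# Same sample data as in the question.
x = pd.DataFrame({'user': ['a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 2, 1]})
x['dt'] = pd.to_datetime(x['dt'])

# Every calendar day from the earliest to the latest date in the column.
# Only min() and max() are used, so repeated dates across users do not matter.
dates = pd.date_range(x['dt'].min(), x['dt'].max(), freq='D', name='dt')
users = x['user'].unique()

print(dates)
print(users)
```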

This will allow you to create a MultiIndex:

idx = pd.MultiIndex.from_product((dates, users), names=['dt', 'user'])

>> MultiIndex(levels=[[2016-01-01 00:00:00, 2016-01-02 00:00:00, 2016-01-03 00:00:00, 2016-01-04 00:00:00, 2016-01-05 00:00:00, 2016-01-06 00:00:00], ['a', 'b']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
           names=['dt', 'user'])
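
MultiIndex.from_product pairs every date with every user (a Cartesian product), which is exactly the set of rows the desired output needs. A small sketch with the dates and users hard-coded for illustration; note that recent pandas versions print codes= instead of labels= in the MultiIndex repr shown above:

```python
import pandas as pd

dates = pd.date_range('2016-01-01', '2016-01-06', freq='D')
users = ['a', 'b']

# Every (date, user) combination, dates varying slowest.
idx = pd.MultiIndex.from_product((dates, users), names=['dt', 'user'])

print(len(idx))  # 12, i.e. 6 dates x 2 users
print(idx[:4].tolist())
```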

You can use that to reindex your DataFrame:

x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index()
Out: 
           dt user  val
0  2016-01-01    a    1
1  2016-01-01    b    0
2  2016-01-02    a   33
3  2016-01-02    b    0
4  2016-01-03    a    0
5  2016-01-03    b    0
6  2016-01-04    a    0
7  2016-01-04    b    0
8  2016-01-05    a    0
9  2016-01-05    b    2
10 2016-01-06    a    0
11 2016-01-06    b    1
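
Putting the steps together, here is a self-contained sketch of the reindex approach. It assumes pd.date_range in place of the resample step (the result is the same for this data). fill_value=0 is what puts 0 into the new rows; without it they would be NaN and val would be upcast to float:

```python
import pandas as pd

# Same sample data as in the question.
x = pd.DataFrame({'user': ['a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 2, 1]})
x['dt'] = pd.to_datetime(x['dt'])

dates = pd.date_range(x['dt'].min(), x['dt'].max(), freq='D')
users = x['user'].unique()
idx = pd.MultiIndex.from_product((dates, users), names=['dt', 'user'])

# Existing (dt, user) pairs keep their val; missing pairs are inserted with 0,
# so val stays an integer column.
filled = x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index()
print(filled)
```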

The result can then be sorted by user:

x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index().sort_values(['user', 'dt'])
Out: 
           dt user  val
0  2016-01-01    a    1
2  2016-01-02    a   33
4  2016-01-03    a    0
6  2016-01-04    a    0
8  2016-01-05    a    0
10 2016-01-06    a    0
1  2016-01-01    b    0
3  2016-01-02    b    0
5  2016-01-03    b    0
7  2016-01-04    b    0
9  2016-01-05    b    2
11 2016-01-06    b    1
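
To reproduce the question's layout exactly, a sketch that sorts on both keys (sort_values uses a non-stable sort by default, so sorting on user alone does not guarantee date order within a user) and then renumbers the rows 0..11. The name out is just for illustration:

```python
import pandas as pd

# Same sample data as in the question.
x = pd.DataFrame({'user': ['a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 2, 1]})
x['dt'] = pd.to_datetime(x['dt'])
idx = pd.MultiIndex.from_product(
    (pd.date_range(x['dt'].min(), x['dt'].max(), freq='D'), x['user'].unique()),
    names=['dt', 'user'])

out = (x.set_index(['dt', 'user'])
         .reindex(idx, fill_value=0)
         .reset_index()
         # Sort by user first, then by date within each user.
         .sort_values(['user', 'dt'])
         # Renumber rows 0..n-1 as in the desired output.
         .reset_index(drop=True))
print(out)
```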

Answer by piRSquared

As @ayhan suggests, first convert dt to datetime:

x.dt = pd.to_datetime(x.dt)

A one-liner that mostly uses @ayhan's ideas while incorporating stack/unstack and fill_value:

x.set_index(
    ['dt', 'user']
).unstack(
    fill_value=0
).asfreq(
    'D', fill_value=0
).stack().sort_index(level=1).reset_index()

           dt user  val
0  2016-01-01    a    1
1  2016-01-02    a   33
2  2016-01-03    a    0
3  2016-01-04    a    0
4  2016-01-05    a    0
5  2016-01-06    a    0
6  2016-01-01    b    0
7  2016-01-02    b    0
8  2016-01-03    b    0
9  2016-01-04    b    0
10 2016-01-05    b    2
11 2016-01-06    b    1
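
To see what each link of the chain does, here is the same one-liner split into named steps (the names wide and result are only for illustration). Depending on your pandas version, stack() may print a FutureWarning about its new implementation; since there are no NaN values and the result is sorted afterwards, the output should be the same either way:

```python
import pandas as pd

# Same sample data as in the question.
x = pd.DataFrame({'user': ['a', 'a', 'b', 'b'],
                  'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06'],
                  'val': [1, 33, 2, 1]})
x['dt'] = pd.to_datetime(x['dt'])

# 1. One row per existing date, one column per user; absent pairs become 0.
wide = x.set_index(['dt', 'user']).unstack(fill_value=0)

# 2. asfreq('D') inserts the missing calendar days (2016-01-03 and 2016-01-04).
wide = wide.asfreq('D', fill_value=0)

# 3. stack moves users back into the rows, sort_index(level=1) groups by user
#    (dates stay sorted within each user), reset_index restores the columns.
result = wide.stack().sort_index(level=1).reset_index()
print(result)
```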