pandas: How to use pandas to find consecutive same data in time series

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/26911851/

Time: 2020-09-13 22:39:48  Source: igfitidea

How to use pandas to find consecutive same data in time series

python pandas apply

Asked by figo

Here is a time series like this; call it df:

      'No'       'Date'       'Value'
0     600000     1999-11-10    1
1     600000     1999-11-11    1
2     600000     1999-11-12    1
3     600000     1999-11-15    1
4     600000     1999-11-16    1
5     600000     1999-11-17    1
6     600000     1999-11-18    0
7     600000     1999-11-19    1
8     600000     1999-11-22    1
9     600000     1999-11-23    1
10    600000     1999-11-24    1
11    600000     1999-11-25    0
12    600001     1999-11-26    1
13    600001     1999-11-29    1
14    600001     1999-11-30    0
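
For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: the question does not state the column dtypes, so parsing Date to datetime and keeping No and Value as integers are assumptions.

```python
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: Date is a datetime column; No and Value are integers.
df = pd.DataFrame({
    'No': [600000] * 12 + [600001] * 3,
    'Date': pd.to_datetime([
        '1999-11-10', '1999-11-11', '1999-11-12', '1999-11-15', '1999-11-16',
        '1999-11-17', '1999-11-18', '1999-11-19', '1999-11-22', '1999-11-23',
        '1999-11-24', '1999-11-25', '1999-11-26', '1999-11-29', '1999-11-30',
    ]),
    'Value': [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0],
})

print(df)
```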

I want to get the date range of each run of consecutive rows where 'Value' is 1. How can I get the following final result?

   'No'     'BeginDate'    'EndDate'   'Consecutive'
0 600000    1999-11-10    1999-11-17    6
1 600000    1999-11-19    1999-11-24    4
2 600001    1999-11-26    1999-11-29    2

Answered by user1827356

This should do it:

df['value_grp'] = (df.Value.diff(1) != 0).astype('int').cumsum()
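
As a quick check of what that expression computes, here is a minimal sketch on a short made-up series. The names s, starts and labels are illustrative only and do not come from the answer.

```python
import pandas as pd

# Hypothetical short series, used only to show how the labels evolve.
s = pd.Series([1, 1, 0, 1, 1, 1])

# diff() is NaN on the first row and non-zero wherever the value changes,
# so this comparison is True at the start of every run.
starts = s.diff() != 0

# The cumulative sum turns the run starts into one integer label per run.
labels = starts.astype(int).cumsum()

print(labels.tolist())
```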

value_grp increases by one each time Value changes, so every run of identical values shares one label. The per-group results can then be extracted:

pd.DataFrame({'BeginDate' : df.groupby('value_grp').Date.first(), 
              'EndDate' : df.groupby('value_grp').Date.last(),
              'Consecutive' : df.groupby('value_grp').size(), 
              'No' : df.groupby('value_grp').No.first()}).reset_index(drop=True)
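
Note that grouping on value_grp alone also returns the runs of 0, and a run could in principle continue across two different No values. A sketch of one way to get exactly the table asked for, assuming that a run should break whenever No changes (the question does not say so explicitly) and that Date is a datetime column:

```python
import pandas as pd

# Sample frame from the question (dtypes are an assumption).
df = pd.DataFrame({
    'No': [600000] * 12 + [600001] * 3,
    'Date': pd.to_datetime([
        '1999-11-10', '1999-11-11', '1999-11-12', '1999-11-15', '1999-11-16',
        '1999-11-17', '1999-11-18', '1999-11-19', '1999-11-22', '1999-11-23',
        '1999-11-24', '1999-11-25', '1999-11-26', '1999-11-29', '1999-11-30',
    ]),
    'Value': [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0],
})

# Start a new run when Value changes or when No changes
# (the second condition is an assumption about the desired behaviour).
new_run = (df['Value'] != df['Value'].shift()) | (df['No'] != df['No'].shift())
df['value_grp'] = new_run.cumsum()

# Keep only the rows with Value == 1, then summarise each run.
result = (df[df['Value'] == 1]
          .groupby('value_grp')
          .agg(No=('No', 'first'),
               BeginDate=('Date', 'first'),
               EndDate=('Date', 'last'),
               Consecutive=('Date', 'size'))
          .reset_index(drop=True))

print(result)
```

The keyword form of agg used here is pandas' named aggregation, which requires pandas 0.25 or later.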

Answered by MaxU

Here is an alternative solution:


# Label each run of equal Value and store the run length on every row.
rslt = (df.assign(Consecutive=df.Value
                                .groupby((df.Value != df.Value.shift())
                                         .cumsum())
                                .transform('size'))
          .query('Consecutive > 1')
          .groupby('Consecutive')
          # Named aggregation (pandas >= 0.25); nested-dict renaming was removed in pandas 1.0.
          .agg(BeginDate=('Date', 'first'), EndDate=('Date', 'last'), No=('No', 'first'))
          .reset_index()
)
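
One caveat with this approach: the final groupby keys on the run length, so two separate runs that happen to have the same length would collapse into a single row. A sketch of a variant that keys on the run label instead is shown below. The data is made up for illustration, and run_id is a hypothetical helper column, not part of the answer.

```python
import pandas as pd

# Made-up data with two separate runs of 1s, both of length 2.
df = pd.DataFrame({
    'No': [1, 1, 1, 1, 1],
    'Date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03',
                            '2020-01-06', '2020-01-07']),
    'Value': [1, 1, 0, 1, 1],
})

# One integer label per run of equal Value.
run_id = (df['Value'] != df['Value'].shift()).cumsum()

rslt = (df.assign(run_id=run_id)
          .query('Value == 1')            # keep only the runs of 1s
          .groupby('run_id')              # one group per run, not per length
          .agg(No=('No', 'first'),
               BeginDate=('Date', 'first'),
               EndDate=('Date', 'last'),
               Consecutive=('Date', 'size'))
          .query('Consecutive > 1')       # drop isolated single rows
          .reset_index(drop=True))

print(rslt)
```

Grouping by run_id keeps both length-2 runs as separate rows, whereas grouping by the length would report only one.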

Demo:


In [225]: %paste
rslt = (df.assign(Consecutive=df.Value
                                .groupby((df.Value != df.Value.shift())
                                         .cumsum())
                                .transform('size'))
          .query('Consecutive > 1')
          .groupby('Consecutive')
          # Named aggregation (pandas >= 0.25); nested-dict renaming was removed in pandas 1.0.
          .agg(BeginDate=('Date', 'first'), EndDate=('Date', 'last'), No=('No', 'first'))
          .reset_index()
)
## -- End pasted text --

In [226]: rslt
Out[226]:
   Consecutive  BeginDate    EndDate      No
0            2 1999-11-26 1999-11-29  600001
1            4 1999-11-19 1999-11-24  600000
2            6 1999-11-10 1999-11-17  600000