Pandas: DataFrame groupby for year/month and return with new DatetimeIndex

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/35488908/

Date: 2020-09-14 00:43:34  Source: igfitidea

Tags: python, pandas, datetimeindex

Asked by dirk

I need some pointers on grouping a Pandas DataFrame object by year or month and getting back a new DataFrame object with a new index. Here is my code so far. groupby works as intended.

Load the data from a .csv file and parse 'Date' as dates (historical stock quotes from finance.yahoo.com):

In [23]: import pandas as pd
         file = pd.read_csv("sdf.de.csv", parse_dates=['Date'])
         file.head(2)

Out[23]:
    Date        Open    High    Low     Close   Volume  Adj Close
0   2016-02-16  18.650  18.70   17.940  18.16   1720800 17.0600
1   2016-02-15  18.295  18.64   18.065  18.50   1463500 17.3794

Sort the file by 'Date' in ascending order and set the index to Date:

In [24]: daily = file.sort_values(by='Date').set_index('Date')
         daily.head()

Out[24]:
            Open    High    Low     Close   Volume  Adj Close
Date                        
2000-01-03  14.20   14.50   14.15   14.40   277400  2.7916
2000-01-04  14.29   14.30   13.90   14.15   109200  2.7431

Grouping by month:

I would then apply() an additional function to the groups to condense the data for each group, e.g. find the highest High value for the year/month or sum() the Volume values. This step is omitted from this example.

In [39]: monthly = daily.groupby(lambda x: (x.year, x.month))
         monthly.first()

Out[39]:
            Open    High    Low     Close   Volume  Adj Close
(2000, 1)   14.200  14.500  14.150  14.400  277400  2.7916
(2000, 2)   13.900  14.390  13.900  14.250  287200  2.7625
... ... ... ... ... ... ...
(2016, 1)   23.620  23.620  23.620  23.620  0       22.1893
(2016, 2)   19.575  19.630  19.140  19.450  1783000 18.2719

This works, but it gives me a DataFrame object with tuples as the index.

The desired result, in this case for grouping by month, would be a completely new DataFrame object whose Date index is a new DatetimeIndex in the form %Y-%m, or just %Y if grouped by year.

Out[39]:
        Open    High    Low     Close   Volume  Adj Close
Date
2000-01 14.200  14.500  14.150  14.400  277400  2.7916
2000-02 13.900  14.390  13.900  14.250  287200  2.7625
... ... ... ... ... ... ...
2016-01 23.620  23.620  23.620  23.620  0       22.1893
2016-02 19.575  19.630  19.140  19.450  1783000 18.2719

I'd be grateful for any pointers.

Accepted answer by jezrael

You can use groupby with daily.index.year and daily.index.month, or convert the index with to_period and then groupby the index:

print(daily)
              Open   High    Low  Close   Volume  Adj Close
Date                                                       
2000-01-01  14.200  14.50  14.15  14.40   277400     2.7916
2000-02-01  13.900  14.39  13.90  14.25   287200     2.7625
2016-01-01  23.620  23.62  23.62  23.62        0    22.1893
2016-02-01  19.575  19.63  19.14  19.45  1783000    18.2719

print(daily.groupby([daily.index.year, daily.index.month]).first())
          Open   High    Low  Close   Volume  Adj Close
2000 1  14.200  14.50  14.15  14.40   277400     2.7916
     2  13.900  14.39  13.90  14.25   287200     2.7625
2016 1  23.620  23.62  23.62  23.62        0    22.1893
     2  19.575  19.63  19.14  19.45  1783000    18.2719
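
The question mentions condensing each group (highest High, summed Volume) instead of taking first(). A minimal sketch of that step on top of the year/month grouping follows; the sample values and the level names "year" and "month" are made up for illustration and are not from the original data.

```python
import pandas as pd

# Made-up sample in the shape of the question's data (values are illustrative).
daily = pd.DataFrame(
    {
        "High": [14.50, 14.30, 14.39, 14.10],
        "Volume": [277400, 109200, 287200, 150000],
    },
    index=pd.to_datetime(
        ["2000-01-03", "2000-01-04", "2000-02-01", "2000-02-02"]
    ),
)
daily.index.name = "Date"

# Group on (year, month) and condense each group:
# highest High and total Volume per month.
monthly = (
    daily.groupby([daily.index.year, daily.index.month])
    .agg({"High": "max", "Volume": "sum"})
    .rename_axis(["year", "month"])  # hypothetical level names
)
print(monthly)
```

The result still carries a two-level (year, month) index, so this only replaces first() and does not by itself produce a date-like index.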

daily.index = daily.index.to_period('M')
print(daily.groupby(daily.index).first())
           Open   High    Low  Close   Volume  Adj Close
Date                                                    
2000-01  14.200  14.50  14.15  14.40   277400     2.7916
2000-02  13.900  14.39  13.90  14.25   287200     2.7625
2016-01  23.620  23.62  23.62  23.62        0    22.1893
2016-02  19.575  19.63  19.14  19.45  1783000    18.2719
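
Note that to_period yields a PeriodIndex, while the question asks for a DatetimeIndex. One possible way to bridge that, sketched here with made-up sample values, is to group on the periods without overwriting daily.index and then convert the result with to_timestamp, which stamps each period at its start by default. The variable names monthly_ts and yearly are illustrative only.

```python
import pandas as pd

# Made-up sample in the shape of the question's data (values are illustrative).
daily = pd.DataFrame(
    {"Open": [14.20, 14.29, 13.90], "Volume": [277400, 109200, 287200]},
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-02-01"]),
)
daily.index.name = "Date"

# Group by monthly period; daily.index itself is left untouched.
monthly = daily.groupby(daily.index.to_period("M")).first()

# Convert the PeriodIndex to a DatetimeIndex (start of each month).
monthly_ts = monthly.copy()
monthly_ts.index = monthly.index.to_timestamp()

# Grouping by year works the same way with the "Y" period alias.
yearly = daily.groupby(daily.index.to_period("Y")).first()

print(monthly_ts)
print(yearly)
```

A DatetimeIndex always holds full timestamps, so the %Y-%m look from the question is a display matter; the PeriodIndex prints that way directly, while the converted index prints as the first day of each month.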

Answer by Alexander

You can use a list comprehension to extract the year and month attributes from your timestamps and then group on those.

>>> df.groupby([[d.year for d in df.Date], [d.month for d in df.Date]]).first()
             Date    Open   High    Low  Close   Volume  Adj_Close
2000 1 2000-01-01  14.200  14.50  14.15  14.40   277400     2.7916
     2 2000-02-01  13.900  14.39  13.90  14.25   287200     2.7625
2016 1 2016-01-01  23.620  23.62  23.62  23.62        0    22.1893
     2 2016-02-01  19.575  19.63  19.14  19.45  1783000    18.2719
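
As a variation on the list comprehensions, the same keys can be built with the .dt accessor when Date is a datetime column. This is a sketch with made-up sample values, not part of the original answer; the key names "year" and "month" are chosen here only to label the index levels.

```python
import pandas as pd

# Made-up sample with Date as a regular column (values are illustrative).
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2000-01-01", "2000-02-01", "2016-01-01"]),
        "Open": [14.20, 13.90, 23.62],
    }
)

# Vectorized equivalent of the two list comprehensions;
# rename() labels the resulting index levels (hypothetical names).
year_key = df["Date"].dt.year.rename("year")
month_key = df["Date"].dt.month.rename("month")
by_month = df.groupby([year_key, month_key]).first()
print(by_month)
```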