Pandas: DataFrame groupby for year/month and return with new DatetimeIndex
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/35488908/
Asked by dirk
I need some direction on grouping a Pandas DataFrame object by year or month and getting back a new DataFrame object with a new index.
Here is my code so far. groupby works as intended.
Load data from .csv file, parse 'Date' to date format (historical stock quotes from finance.yahoo.com)
In [23]: import pandas as pd
file = pd.read_csv("sdf.de.csv", parse_dates=['Date'])
file.head(2)
Out[23]:
Date Open High Low Close Volume Adj Close
0 2016-02-16 18.650 18.70 17.940 18.16 1720800 17.0600
1 2016-02-15 18.295 18.64 18.065 18.50 1463500 17.3794
Sort the file by 'Date' ascending and set the index to Date
In [24]: daily = file.sort_values(by='Date').set_index('Date')
daily.head()
Out[24]:
Open High Low Close Volume Adj Close
Date
2000-01-03 14.20 14.50 14.15 14.40 277400 2.7916
2000-01-04 14.29 14.30 13.90 14.15 109200 2.7431
Grouping by month
I would do an additional apply() on the groups to condense the data for each group, e.g. find the highest High value for the year/month or sum() the Volume values. This step is omitted for this example.
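For reference, the omitted condensing step could look roughly like the sketch below. It uses agg() with a per-column mapping rather than apply(), which is a substitution on my part and not something the question states. The sample rows are made up for illustration and only stand in for the real `daily` frame.

```python
import pandas as pd

# Hypothetical sample rows standing in for the question's `daily` frame.
daily = pd.DataFrame(
    {
        "High": [14.50, 14.30, 14.39, 14.10],
        "Volume": [277400, 109200, 287200, 150000],
    },
    index=pd.to_datetime(
        ["2000-01-03", "2000-01-04", "2000-02-01", "2000-02-02"]
    ),
)

# Group by (year, month) and condense each group:
# highest High and total Volume per month.
monthly_summary = daily.groupby(
    [daily.index.year, daily.index.month]
).agg({"High": "max", "Volume": "sum"})

print(monthly_summary)
```

The result is indexed by a (year, month) MultiIndex, so it still has the index problem the question is about. Only the aggregation itself is illustrated here.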
In [39]: monthly = daily.groupby(lambda x: (x.year, x.month))
monthly.first()
Out[39]:
Open High Low Close Volume Adj Close
(2000, 1) 14.200 14.500 14.150 14.400 277400 2.7916
(2000, 2) 13.900 14.390 13.900 14.250 287200 2.7625
... ... ... ... ... ... ...
(2016, 1) 23.620 23.620 23.620 23.620 0 22.1893
(2016, 2) 19.575 19.630 19.140 19.450 1783000 18.2719
This works, but it gives me a DataFrame object with a tuple as index.
The desired result, in this case when grouping by month, would be a completely new DataFrame object whose Date index is a new DatetimeIndex in the form %Y-%m, or just %Y if grouped by year.
Out[39]:
Open High Low Close Volume Adj Close
Date
2000-01 14.200 14.500 14.150 14.400 277400 2.7916
2000-02 13.900 14.390 13.900 14.250 287200 2.7625
... ... ... ... ... ... ...
2016-01 23.620 23.620 23.620 23.620 0 22.1893
2016-02 19.575 19.630 19.140 19.450 1783000 18.2719
I'm thankful for any directions.
Accepted answer by jezrael
You can use groupby with daily.index.year and daily.index.month, or convert the index with to_period and then groupby on the index:
print(daily)
Open High Low Close Volume Adj Close
Date
2000-01-01 14.200 14.50 14.15 14.40 277400 2.7916
2000-02-01 13.900 14.39 13.90 14.25 287200 2.7625
2016-01-01 23.620 23.62 23.62 23.62 0 22.1893
2016-02-01 19.575 19.63 19.14 19.45 1783000 18.2719
print(daily.groupby([daily.index.year, daily.index.month]).first())
Open High Low Close Volume Adj Close
2000 1 14.200 14.50 14.15 14.40 277400 2.7916
2 13.900 14.39 13.90 14.25 287200 2.7625
2016 1 23.620 23.62 23.62 23.62 0 22.1893
2 19.575 19.63 19.14 19.45 1783000 18.2719
daily.index = daily.index.to_period('M')
print(daily.groupby(daily.index).first())
Open High Low Close Volume Adj Close
Date
2000-01 14.200 14.50 14.15 14.40 277400 2.7916
2000-02 13.900 14.39 13.90 14.25 287200 2.7625
2016-01 23.620 23.62 23.62 23.62 0 22.1893
2016-02 19.575 19.63 19.14 19.45 1783000 18.2719
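Note that to_period produces a PeriodIndex, not a DatetimeIndex. If a true DatetimeIndex is required, one possible follow-up is to convert the grouped result back with to_timestamp. This step is not part of the original answer. The sketch below uses made-up sample rows and groups on the periods directly instead of overwriting daily.index.

```python
import pandas as pd

# Hypothetical sample rows standing in for the question's `daily` frame.
daily = pd.DataFrame(
    {"Close": [14.40, 14.15, 14.25, 23.62]},
    index=pd.to_datetime(
        ["2000-01-03", "2000-01-04", "2000-02-01", "2016-01-04"]
    ),
)
daily.index.name = "Date"

# Group on monthly periods; the original index is left untouched.
monthly = daily.groupby(daily.index.to_period("M")).first()

# Convert the PeriodIndex to a DatetimeIndex stamped at each month's start.
monthly.index = monthly.index.to_timestamp()

print(monthly)
```

A DatetimeIndex always displays full dates (the first day of each month here), whereas the PeriodIndex is what displays as %Y-%m. Which one to keep depends on what the index is used for afterwards.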
Answer by Alexander
You can use list comprehensions to extract the year and month attributes from your timestamps and then group on those.
>>> df.groupby([[d.year for d in df.Date], [d.month for d in df.Date]]).first()
Date Open High Low Close Volume Adj_Close
2000 1 2000-01-01 14.200 14.50 14.15 14.40 277400 2.7916
2 2000-02-01 13.900 14.39 13.90 14.25 287200 2.7625
2016 1 2016-01-01 23.620 23.62 23.62 23.62 0 22.1893
2 2016-02-01 19.575 19.63 19.14 19.45 1783000 18.2719
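The same grouping can probably be written without list comprehensions by using the .dt accessor on the Date column. This is a variation on the answer above, not what the answer itself shows. The sample rows and the level names "year" and "month" are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample rows; `Date` is a regular column here, as in the answer.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-02-01"]),
        "Close": [14.40, 14.15, 14.25],
    }
)

# Vectorized year/month extraction; renaming gives the index levels
# readable names instead of two levels both called "Date".
by_month = df.groupby(
    [df["Date"].dt.year.rename("year"), df["Date"].dt.month.rename("month")]
).first()

print(by_month)
```

As in the list-comprehension version, the Date column survives in the result and holds the first timestamp of each group.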