Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/17764619/

Time: 2020-09-13 21:00:37  Source: igfitidea

pandas dataframe group year index by decade

python pandas

Asked by wiswit

Suppose I have a dataframe whose index is a monthly timestep. I know I can use dataframe.groupby(lambda x: x.year) to group the monthly data by year and apply other operations. Is there a quick way to group it by, say, decade?

Thanks for any hints.

Answered by DSM

To get the decade, you can integer-divide the year by 10 and then multiply by 10. For example, if you're starting from

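>>> import numpy as np
>>> import pandas as pd
>>> # "M" means month-end; recent pandas releases spell this alias "ME"
>>> dates = pd.date_range('1/1/2001', periods=500, freq="M")
>>> df = pd.DataFrame({"A": 5*np.arange(len(dates))+2}, index=dates)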
>>> df.head()
             A
2001-01-31   2
2001-02-28   7
2001-03-31  12
2001-04-30  17
2001-05-31  22

You can group by year, as usual (here we have a DatetimeIndex, so it's really easy):

>>> df.groupby(df.index.year).sum().head()
         A
2001   354
2002  1074
2003  1794
2004  2514
2005  3234

Or you could use the (x//10)*10 trick:

>>> df.groupby((df.index.year//10)*10).sum()
           A
2000   29106
2010  100740
2020  172740
2030  244740
2040   77424

If you don't have something on which you can use .year directly, you could still use lambda x: (x.year//10)*10.
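
A minimal sketch of that lambda-style fallback, assuming a hypothetical dataframe whose index holds plain datetime.date objects (so the index as a whole has no .year attribute). The sample values and the helper name decade are made up for illustration:

```python
import datetime as dt

import pandas as pd

# Hypothetical sample data: the index is a list of datetime.date objects,
# so pandas stores it as a generic object Index rather than a DatetimeIndex.
index = [
    dt.date(1995, 6, 1),
    dt.date(1999, 12, 1),
    dt.date(2003, 3, 1),
    dt.date(2010, 1, 1),
]
df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=index)


def decade(label):
    """Map one index label to the first year of its decade."""
    return (label.year // 10) * 10


# groupby calls the function once per index label and groups on the result.
by_decade = df.groupby(decade).sum()
print(by_decade)
```

The function is applied label by label, which is slower than the vectorized df.index.year version but works for any index whose elements expose .year.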

Answered by waitingkuo

Use the year attribute of index:

df.groupby(df.index.year)
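
As a self-contained sketch, the year attribute can be used directly for yearly groups, or floored to a multiple of 10 for decade groups as the question asks. The data below is hypothetical (36 month-start timestamps, each with the value 1):

```python
import pandas as pd

# Hypothetical sample data: January 2008 through December 2010, one row per month.
dates = pd.date_range("2008-01-01", periods=36, freq="MS")
df = pd.DataFrame({"A": 1}, index=dates)

# Group on the year of each timestamp.
by_year = df.groupby(df.index.year).sum()

# Floor each year to its decade (2008 -> 2000, 2010 -> 2010) and group on that.
by_decade = df.groupby((df.index.year // 10) * 10).sum()

print(by_year)
print(by_decade)
```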

Answered by Shankar ARUL - jupyterdata.com

Let's say your date column is named Date; then you can group by decade with:

dataframe.set_index('Date').iloc[:,0].resample('10AS').count()

Note: iloc[:,0] here selects the first column of your dataframe. (The original answer used .ix and resample(..., how='count'), both of which have since been removed from pandas. Recent pandas releases also deprecate the 'AS' alias in favor of 'YS'.)

The available offset aliases are listed at: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
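
A hedged, runnable sketch of the resample approach on a current pandas, assuming a hypothetical dataframe with a datetime column named Date and one value column. It uses the '10YS' spelling of the ten-year, year-start alias, and the sample rows are invented:

```python
import pandas as pd

# Hypothetical sample data with a datetime column named "Date".
dataframe = pd.DataFrame({
    "Date": pd.to_datetime(["2001-03-01", "2005-07-15", "2012-01-10", "2019-11-30"]),
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Index by date, take the first column, and count rows per 10-year bin.
counts = dataframe.set_index("Date").iloc[:, 0].resample("10YS").count()
print(counts)
```

It is worth inspecting the printed labels: the bins appear to be anchored on the first year present in the data rather than on calendar decade boundaries, so they may not line up with the groups produced by the (year//10)*10 approach.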

Answered by Shiva_Achari

Suppose your DataFrame has column headers such as: DataFrame ['Population','Salary','vehicle count']

Make Year the index: DataFrame=DataFrame.set_index('Year')

Use the code below to resample the data into 10-year bins; it also gives you the sum of all the other columns within each decade. (resample needs a datetime-like index, so the Year values must be datetimes rather than plain integers.)

DataFrame=DataFrame.resample('10AS').sum()
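
A minimal sketch of these steps end to end, assuming the Year column starts out as plain integers and therefore has to be converted to datetimes before resampling. The column values are invented, the variable is named df here to avoid shadowing the pandas DataFrame class, and '10YS' is used as the current spelling of the alias:

```python
import pandas as pd

# Hypothetical sample data: integer years plus two numeric columns.
df = pd.DataFrame({
    "Year": [2001, 2004, 2012, 2015],
    "Population": [10, 20, 30, 40],
    "Salary": [1.0, 2.0, 3.0, 4.0],
})

# Convert integer years to timestamps (January 1 of each year) so that
# the index becomes a DatetimeIndex, which resample requires.
df["Year"] = pd.to_datetime(df["Year"].astype(str), format="%Y")

# Sum every column within each 10-year bin.
by_decade = df.set_index("Year").resample("10YS").sum()
print(by_decade)
```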