Pandas to sum and group by date index
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me) at StackOverflow.
Original: http://stackoverflow.com/questions/22594923/
Asked by eh2deni
I have a list of records for which I need to calculate sums per reference, grouped by month.
Input
Date1;Date2;Company;Reference;Description;Value;Currency;
24.01.2014;23.01.2014;Company1;Debit;Text;-100,00;EUR;
24.01.2014;24.01.2014;Company2;Debit;Text;-130,00;EUR;
21.01.2014;24.01.2014;Company1;Debit;Text;-50,00;EUR;
21.01.2014;21.01.2014;Company3;Credit;Text;600,00;EUR;
17.02.2014;16.01.2014;Company2;Debit;Text;-110,00;EUR;
16.02.2014;16.01.2014;Company1;Credit;Text;40,00;EUR
02.03.2014;02.03.2014;Company6;Debit;Text;90,00;EUR
Desired Output
         Credit  Debit
Date1
2014-01     600   -180
2014-02      40   -110
2014-03     NaN    -90
Here is my code
import pandas as pd

# Read the semicolon-separated file (raw string so the backslashes in the
# Windows path are not treated as escapes).
DATA = pd.read_csv(r'C:\Users\Desktop\File.txt', sep=';')

# Values use a decimal comma: drop the thousands dots, swap the comma for a
# dot, then cast to float (regex=False so '.' is matched literally).
CLEAN_VALUE = DATA['Value'].str.replace('.', '', regex=False).str.replace(',', '.', regex=False).astype(float)
DATA['CLEAN_VALUE'] = CLEAN_VALUE
SUM = DATA['CLEAN_VALUE'].groupby(DATA['Reference']).sum()

# Parse the day-first dates.
DATA['Date1'] = pd.to_datetime(DATA['Date1'], dayfirst=True)
summe = DATA[:10].groupby(['Date1', 'Reference']).sum().unstack()

# This is the step that fails:
Sorted = pd.date_range('2013-01-01', '2014-03-01', freq='M').resample('H', how=len)
It actually works fine until the last step, when I try to group by month. I receive this error:
TypeError: Only valid with DatetimeIndex or PeriodIndex
Any suggestions? Thanks in advance.
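For reference, pandas raises this error whenever a time-based operation such as resample is applied to an object whose index is not datetime-like. A minimal sketch with made-up data that reproduces it:
import pandas as pd

# A frame with a plain integer index -- no datetime information attached.
frame = pd.DataFrame({'Value': [1, 2, 3]})

# resample is only valid on a DatetimeIndex or PeriodIndex, so this raises
# "TypeError: Only valid with DatetimeIndex or PeriodIndex ..."
frame.resample('M')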
Answered by Jeff
This requires 0.13.1 (for infer_datetime_format). This is much faster than using a date parser directly, and it converts the date-like columns to the proper datetime64[ns] dtype.
In [74]: from io import StringIO  # `data` holds the raw ';'-separated text from the question

In [75]: df = pd.read_csv(StringIO(data),
                  sep=';',
                  decimal=',',
                  infer_datetime_format=True,
                  parse_dates=['Date1','Date2']
                 ).dropna(how='all', axis=1)
In [76]: df
Out[76]: 
       Date1      Date2   Company Reference Description  Value Currency
0 2014-01-24 2014-01-23  Company1     Debit        Text   -100      EUR
1 2014-01-24 2014-01-24  Company2     Debit        Text   -130      EUR
2 2014-01-21 2014-01-24  Company1     Debit        Text    -50      EUR
3 2014-01-21 2014-01-21  Company3    Credit        Text    600      EUR
4 2014-02-17 2014-01-16  Company2     Debit        Text   -110      EUR
5 2014-02-16 2014-01-16  Company1    Credit        Text     40      EUR
6 2014-03-02 2014-03-02  Company6     Debit        Text     90      EUR
[7 rows x 7 columns]
This requires 0.14/master, and will be the new, easy way to do a multiple grouping with a time-based grouper and another key.
In [77]: df.groupby([pd.Grouper(freq='MS',key='Date1'),'Reference']).sum().unstack()
Out[77]: 
             Value       
Reference   Credit  Debit
Date1                    
2014-01-01     600   -280
2014-02-01      40   -110
2014-03-01     NaN     90
[3 rows x 2 columns]
On versions < 0.14/master you can do this:
In [79]: df.set_index('Date1').groupby(lambda x: (x.year,x.month)).apply(lambda x: x.groupby('Reference').sum()).unstack()
Out[79]: 
            Value       
Reference  Credit  Debit
(2014, 1)     600   -280
(2014, 2)      40   -110
(2014, 3)     NaN     90
[3 rows x 2 columns]
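On modern pandas the Grouper approach above still works; an equivalent result can also be had by collapsing each date to a monthly period first. A minimal sketch (not part of the original answer), assuming df was parsed as in In [75]:
# Collapse each date to its calendar month, then pivot Reference into columns.
df.assign(month=df['Date1'].dt.to_period('M')) \
  .groupby(['month', 'Reference'])['Value'].sum().unstack()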
Answered by behzad.nouri
>>> import pandas as pd
>>> import datetime as dt
>>> from io import StringIO
>>> df = pd.read_table(StringIO('''\
... Date1;Date2;Company;Reference;Description;Value;Currency;
... 24.01.2014;23.01.2014;Company1;Debit;Text;-100,00;EUR;
... 24.01.2014;24.01.2014;Company2;Debit;Text;-130,00;EUR;
... 21.01.2014;24.01.2014;Company1;Debit;Text;-50,00;EUR;
... 21.01.2014;21.01.2014;Company3;Credit;Text;600,00;EUR;
... 17.02.2014;16.01.2014;Company2;Debit;Text;-110,00;EUR;
... 16.02.2014;16.01.2014;Company1;Credit;Text;40,00;EUR
... 02.03.2014;02.03.2014;Company6;Debit;Text;90,00;EUR'''),
... sep=';', decimal=',', parse_dates=['Date1', 'Date2'],
... date_parser=lambda t: dt.datetime.strptime(t, '%d.%m.%Y').date())
>>>
>>> df.dropna(axis=1, how='all', inplace=True)
>>> df
        Date1       Date2   Company Reference Description  Value Currency
0  2014-01-24  2014-01-23  Company1     Debit        Text   -100      EUR
1  2014-01-24  2014-01-24  Company2     Debit        Text   -130      EUR
2  2014-01-21  2014-01-24  Company1     Debit        Text    -50      EUR
3  2014-01-21  2014-01-21  Company3    Credit        Text    600      EUR
4  2014-02-17  2014-01-16  Company2     Debit        Text   -110      EUR
5  2014-02-16  2014-01-16  Company1    Credit        Text     40      EUR
6  2014-03-02  2014-03-02  Company6     Debit        Text     90      EUR
[7 rows x 7 columns]
So far we have only read the data and parsed the values; the actual groupby step is very short:
>>> df['month'] = df.Date1.apply(dt.date.strftime, args=('%Y.%m',))
>>> df.groupby(['month', 'Reference'])['Value'].aggregate(sum).unstack()
Reference  Credit  Debit
month                   
2014.01       600   -280
2014.02        40   -110
2014.03       NaN     90
[3 rows x 2 columns]
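If zeros are preferred over NaN for the missing (month, Reference) combinations, unstack accepts a fill_value argument on pandas 0.18 and later (a version that postdates this answer, so check yours):
>>> df.groupby(['month', 'Reference'])['Value'].sum().unstack(fill_value=0)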

