Python Pandas: Group by calendar week, then plot grouped bar charts for the real datetime

Question

Asked by tim

EDIT

I found a quite nice solution and posted it below as an answer. The result will look like this:

enter image description here

Some example data you can generate for this problem:

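import numpy as np
import pandas as pd

codes = list('ABCDEFGH')
dates = pd.Series(pd.date_range('2013-11-01', '2014-01-31'))
# Two rows per day; pd.concat and sort_values replace Series.append and Series.sort, which were removed from pandas
dates = pd.concat([dates, dates]).sort_values()
df = pd.DataFrame({'amount': np.random.randint(1, 10, dates.size), 'col1': np.random.choice(codes, dates.size), 'col2': np.random.choice(codes, dates.size), 'date': dates})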

resulting in:

In [55]: df
Out[55]:
    amount col1 col2       date
0        1    D    E 2013-11-01
0        5    E    B 2013-11-01
1        5    G    A 2013-11-02
1        7    D    H 2013-11-02
2        5    E    G 2013-11-03
2        4    H    G 2013-11-03
3        7    A    F 2013-11-04
3        3    A    A 2013-11-04
4        1    E    G 2013-11-05
4        7    D    C 2013-11-05
5        5    C    A 2013-11-06
5        7    H    F 2013-11-06
6        1    G    B 2013-11-07
6        8    D    A 2013-11-07
7        1    B    H 2013-11-08
7        8    F    H 2013-11-08
8        3    A    E 2013-11-09
8        1    H    D 2013-11-09
9        3    B    D 2013-11-10
9        1    H    G 2013-11-10
10       6    E    E 2013-11-11
10       6    F    E 2013-11-11
11       2    G    B 2013-11-12
11       5    H    H 2013-11-12
12       5    F    G 2013-11-13
12       5    G    B 2013-11-13
13       8    H    B 2013-11-14
13       6    G    F 2013-11-14
14       9    F    C 2013-11-15
14       4    H    A 2013-11-15
..     ...  ...  ...        ...
77       9    A    B 2014-01-17
77       7    E    B 2014-01-17
78       4    F    E 2014-01-18
78       6    B    E 2014-01-18
79       6    A    H 2014-01-19
79       3    G    D 2014-01-19
80       7    E    E 2014-01-20
80       6    G    C 2014-01-20
81       9    H    G 2014-01-21
81       9    C    B 2014-01-21
82       2    D    D 2014-01-22
82       7    D    A 2014-01-22
83       6    G    B 2014-01-23
83       1    A    G 2014-01-23
84       9    B    D 2014-01-24
84       7    G    D 2014-01-24
85       7    A    F 2014-01-25
85       9    B    H 2014-01-25
86       9    C    D 2014-01-26
86       5    E    B 2014-01-26
87       3    C    H 2014-01-27
87       7    F    D 2014-01-27
88       3    D    G 2014-01-28
88       4    A    D 2014-01-28
89       2    F    A 2014-01-29
89       8    D    A 2014-01-29
90       1    A    G 2014-01-30
90       6    C    A 2014-01-30
91       6    H    C 2014-01-31
91       2    G    F 2014-01-31

[184 rows x 4 columns]

I'd like to group by calendar-week and by value of col1. Like this:

kw = lambda x: x.isocalendar()[1]
grouped = df.groupby([df['date'].map(kw), 'col1'], sort=False).agg({'amount': 'sum'})

resulting in:

In [58]: grouped
Out[58]:
           amount
date col1
44   D          8
     E         10
     G          5
     H          4
45   D         15
     E          1
     G          1
     H          9
     A         13
     C          5
     B          4
     F          8
46   E          7
     G         13
     H         17
     B          9
     F         23
47   G         14
     H          4
     A         40
     C          7
     B         16
     F         13
48   D          7
     E         16
     G          9
     H          2
     A          7
     C          7
     B          2
...           ...
1    H         14
     A         14
     B         15
     F         19
2    D         13
     H         13
     A         13
     B         10
     F         32
3    D          8
     E         18
     G          3
     H          6
     A         30
     C          9
     B          6
     F          5
4    D          9
     E         12
     G         19
     H          9
     A          8
     C         18
     B         18
5    D         11
     G          2
     H          6
     A          5
     C          9
     F          9

[87 rows x 1 columns]

Then I want a plot to be generated like this:

enter image description here

That means: calendar week and year (datetime) on the x-axis, with one bar per col1 value in each week.

The problem I'm facing is: I only have integers describing the calendar week (KW in the plot), but I somehow have to merge back the date on it to get the ticks labeled by year as well. Furthermore I can't only plot the grouped calendar week because I need a correct order of the items (kw 47, kw 48 (year 2013) have to be on the left side of kw 1 (because this is 2014)).

EDIT

I figured out from here: http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-barplot that grouped bars need to be columns instead of rows. So I thought about how to transform the data and found the method pivot, which turns out to be a great function. reset_index is needed to transform the MultiIndex into columns. Finally, I fill the NaNs with zeros:

A = grouped.reset_index().pivot(index='date', columns='col1', values='amount').fillna(0)

transforms the data into:

col1   A   B   C   D   E   F   G   H
date
1      4  31   0   0   0  18  13   8
2      0  12  13  22   1  17   0   8
3      3  10   4  13  12   8   7   6
4     17   0  10   7   0  25   7   4
5      7   0   7   9   8   6   0   7
44     0   0   2  11   7   0   0   2
45     9   3   2  14   0  16  21   2
46     0  14   7   2  17  13  11   8
47     5  13   0  15  19   7   5  10
48    15   8  12   2  20   4   7   6
49    20   0   0  18  22  17  11   0
50     7  11   8   6   5   6  13  10
51     8  26   0   0   5   5  16   9
52     8  13   7   5   4  10   0  11

which looks like the example data in the docs to be plotted in grouped bars:

A.plot(kind='bar')

gets this:

enter image description here

However, the x-axis is now sorted by week number (from 1 to 52), which is wrong here, because calendar week 52 belongs to 2013 and should come before week 1 of 2014. Any ideas on how to merge the real datetime back onto the calendar weeks and use it for the x-axis ticks?

Answer 1

Accepted answer by tim

Okay, I'll answer the question myself, as I finally figured it out. The key is not to group by calendar week (as you would lose the information about the year) but rather to group by a string containing the calendar week and the year.

Then reshape the data with pivot, as already mentioned in the question. The date will be the index. Use reset_index() to turn the current date index into a column and get an integer range as the index instead. That range is then in the correct order for plotting: the lowest year/calendar week is index 0 and the highest year/calendar week is the highest integer.

Select the date column as a list into a new variable ticks and delete that column from the DataFrame. Now plot the bars and simply set the xtick labels to ticks. The complete solution is quite simple:

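import numpy as np
import pandas as pd

codes = list('ABCDEFGH')
dates = pd.Series(pd.date_range('2013-11-01', '2014-01-31'))
# Two rows per day; pd.concat and sort_values replace Series.append and Series.sort, which were removed from pandas
dates = pd.concat([dates, dates]).sort_values()
df = pd.DataFrame({'amount': np.random.randint(1, 10, dates.size), 'col1': np.random.choice(codes, dates.size), 'col2': np.random.choice(codes, dates.size), 'date': dates})

# Label each date as "<ISO year> - <ISO week>". Using the ISO year (not x.year) keeps 2013-12-30
# in week 1 of 2014, and zero-padding the week makes the strings sort chronologically.
kw_year = lambda x: '{} - {:02d}'.format(*x.isocalendar()[:2])
grouped = df.groupby([df['date'].map(kw_year), 'col1'])['amount'].sum().reset_index()
A = grouped.pivot(index='date', columns='col1', values='amount').fillna(0).reset_index()

ticks = A['date'].tolist()  # tick labels, already in plotting order
del A['date']
ax = A.plot(kind='bar')     # plotting requires matplotlib
ax.set_xticklabels(ticks)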

RESULT:

enter image description here

Answer 2

Answer by Veedrac

Add the week to 52 times the year, so that weeks are ordered "by year". Then set the tick labels back to what you want, which might be nontrivial.

What you want is for the weeks to increase like so

nth week → (n+1)th week → (n+2)th week → etc.

but when you have a new year it instead falls by 51 (52 → 1).

To offset this, note that the year increases by one. So add the year's increase multiplied by 52 and the total change will be -51 + 52 = 1, as wanted.
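
The arithmetic above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the helper names week_key and key_to_label are made up here, the ISO year is used instead of the calendar year (so that 2013-12-30 counts as week 1 of 2014), and every year is assumed to have 52 ISO weeks, so a 53-week year would need extra handling.

```python
from datetime import date

WEEKS_PER_YEAR = 52  # assumption taken from the answer; 53-week ISO years are not handled


def week_key(d):
    """Map a date to an integer that grows by one from each week to the next."""
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year * WEEKS_PER_YEAR + iso_week


def key_to_label(key):
    """Turn a key back into a 'year - week' string usable as a tick label."""
    iso_year, week_index = divmod(key - 1, WEEKS_PER_YEAR)
    return f'{iso_year} - {week_index + 1}'


last_2013 = week_key(date(2013, 12, 29))   # Sunday of ISO week 52 of 2013
first_2014 = week_key(date(2013, 12, 30))  # Monday of ISO week 1 of 2014

print(first_2014 - last_2013)
print(key_to_label(last_2013), '|', key_to_label(first_2014))
```

Grouping by such a key instead of the bare week number keeps the weeks of 2013 to the left of the weeks of 2014, and key_to_label recovers readable tick labels afterwards.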

Answer 3

Answer by patricksurry

I think resample('W') is a better way to do this. By default it groups by weeks ending on Sunday ('W' is the same as 'W-SUN'), but you can specify whatever you want.

In your example, try this:

grouped = (df
    .groupby('col1')
    .apply(lambda g:               # work on groups of col1
        g.set_index('date')
        [['amount']]
        .resample('W').sum()       # sum the amount field across weeks
    )
    .unstack(level=0)              # pivot the col1 index rows to columns
    .fillna(0)
)
grouped.columns = grouped.columns.droplevel()   # drop the 'amount' level of the multi-index column names
print(grouped)
grouped.plot(kind='bar')

which should print your data table and make a plot similar to yours, but with "real" date labels:

col1         A   B   C   D   E   F   G   H
date                                      
2013-11-03  18  0   9   0   8   0   0   4 
2013-11-10  4   11  0   1   16  2   15  2 
2013-11-17  10  14  19  8   13  6   9   8 
2013-11-24  10  13  13  0   0   13  15  10
2013-12-01  6   3   19  8   8   17  8   12
2013-12-08  5   15  5   7   12  0   11  8 
2013-12-15  8   6   11  11  0   16  6   14
2013-12-22  16  3   13  8   8   11  15  0 
2013-12-29  1   3   6   10  7   7   17  15
2014-01-05  12  7   10  11  6   0   1   12
2014-01-12  13  0   17  0   23  0   10  12
2014-01-19  10  9   2   3   8   1   18  3 
2014-01-26  24  9   8   1   19  10  0   3 
2014-02-02  1   6   16  0   0   10  8   13
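
If the tick labels should show year and calendar week rather than the week-ending date, the weekly bins can be relabelled after resampling. The following is a rough sketch, not part of the original answer: it uses pd.Grouper as an alternative to the groupby/apply chain above, a tiny made-up data set, and a label format with the "KW" prefix borrowed from the question's plot.

```python
import numpy as np
import pandas as pd

# Small reproducible frame shaped like the question's data (values are made up).
rng = np.random.default_rng(0)
dates = pd.date_range('2013-12-16', '2014-01-12')
df = pd.DataFrame({
    'date': dates,
    'col1': rng.choice(list('AB'), dates.size),
    'amount': rng.integers(1, 10, dates.size),
})

# Weekly bins ending on Sunday, one column per col1 value.
weekly = (df
    .groupby([pd.Grouper(key='date', freq='W'), 'col1'])['amount']
    .sum()
    .unstack('col1', fill_value=0)
)

# Label each bin with the ISO year and ISO week of its end date.
iso = weekly.index.isocalendar()
labels = [f'{int(y)} - KW{int(w):02d}' for y, w in zip(iso['year'], iso['week'])]
print(labels)
```

Because the index is still a real DatetimeIndex, the bins stay in chronological order across the year boundary. For the plot, ax = weekly.plot(kind='bar') followed by ax.set_xticklabels(labels) should apply the labels, in the same way as in the accepted answer.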
