pandas: heatmap from columns in a pandas dataframe

Question

Asked by gustavgans

I am trying to generate a heatmap from a pandas dataframe, by day and hour of the day (X -> days, Y -> hours). The result should look something like this:

The data source is a table in Postgres:

   id    |       created_at       
---------+------------------------
 2558145 | 2017-03-02 11:31:15+01
 2558146 | 2017-03-02 11:31:46+01
 2558147 | 2017-03-02 11:32:28+01
 2558148 | 2017-03-02 11:32:57+01
....

Here is my code to regroup the data by hour:

import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.dates import date2num
from sqlalchemy import create_engine

%matplotlib inline

engine = create_engine('postgresql://postgres:postgres@localhost:5432/bla')

df = pd.read_sql_query("""
SELECT created_at, 1 as print
FROM foo
WHERE created_at > '2017-02-01'
AND created_at < '2017-03-01'""", con=engine)

df['created_at'] = pd.to_datetime(df['created_at'])
df.index = df['created_at']

df = df.resample('H')['print'].sum()
df.fillna(0, inplace=True)

print(df.head())

created_at
2017-02-01 07:00:00+00:00      1.0
2017-02-01 08:00:00+00:00    152.0
2017-02-01 09:00:00+00:00    101.0
2017-02-01 10:00:00+00:00     92.0
2017-02-01 11:00:00+00:00    184.0
Freq: H, Name: print, dtype: float64

The result looks fine, but I cannot figure out how to plot this dataframe.

Answer 1

Answered by ImportanceOfBeingErnest

A heatmap is a two-dimensional plot that maps each (x, y) pair to a value. This means that the input to the heatmap must be a 2D array.

Here you would want the columns of the array to denote days and the rows to denote hours. As a first step, we need days and hours in two different columns of the dataframe. One could then reshape those columns into 2D arrays, which would require knowing how many days and hours there are. It would also require that there is actually an entry for each day/hour pair.
Without this restriction, we can alternatively use a pivot_table to aggregate the values in a table. This is shown in the following solution.

import pandas as pd
import numpy as np; np.random.seed(0)
import seaborn as sns  # seaborn.apionly was removed in later seaborn releases
import matplotlib.pyplot as plt

# create dataframe with datetime as index and aggregated (frequency) values
date = pd.date_range('2017-02-23', periods=10*12, freq='2h')
freq = np.random.poisson(lam=2, size=(len(date)))
df = pd.DataFrame({"freq":freq}, index=date)

# add a column for hours and a column for days
df["hours"] = df.index.hour
df["days"] = df.index.map(lambda x: x.strftime('%b-%d'))
# create pivot table, days will be columns, hours will be rows
piv = pd.pivot_table(df, values="freq", index=["hours"], columns=["days"], fill_value=0)
# plot pivot table as heatmap using seaborn
ax = sns.heatmap(piv, square=True)
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90)
plt.tight_layout()
plt.show()
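
For comparison, the reshape route described before the solution might look roughly like the sketch below. It is only an illustration under a strong assumption: the synthetic index covers three complete days with exactly one value per hour, so the values fill a days-by-hours grid with no gaps. The names counts and grid are made up for this example.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts covering 3 complete days (an assumption: the
# reshape only works when every day/hour pair is present exactly once).
idx = pd.date_range("2017-02-01", periods=3 * 24, freq="h")
counts = pd.Series(np.arange(len(idx)), index=idx)

n_days = idx.normalize().nunique()  # number of distinct calendar days
n_hours = 24

# Reshaping gives one row per day; transpose so that rows are hours
# and columns are days, as a heatmap with days on the x-axis expects.
grid = counts.to_numpy().reshape(n_days, n_hours).T

print(grid.shape)
```

If a single hour is missing from the data, the reshape either fails or silently shifts every later value, which is why the pivot_table approach is the more forgiving one.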

For plotting you may also use a matplotlib imshow plot as follows:

fig, ax = plt.subplots()
im = ax.imshow(piv, cmap="Greens")
fig.colorbar(im, ax=ax)

ax.set_xticks(range(len(piv.columns)))
ax.set_yticks(range(len(piv.index)))
ax.set_xticklabels(piv.columns, rotation=90)
ax.set_yticklabels(piv.index)
ax.set_xlabel("Days")
ax.set_ylabel("Hours")

plt.tight_layout()
plt.show()
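
To connect this back to the question, the same pivot can probably be applied to the hourly Series produced by resample. The sketch below uses a stand-in Series named hourly instead of the real database result, and it assumes that summing is the desired aggregation and that the data stays within one month (the '%b-%d' labels sort alphabetically, so they would not stay in date order across months).

```python
import numpy as np
import pandas as pd

# Stand-in for the resampled Series from the question: hourly counts
# with a timezone-aware index (synthetic data, not the real table).
idx = pd.date_range("2017-02-01 07:00", periods=48, freq="h", tz="UTC")
hourly = pd.Series(np.ones(len(idx)), index=idx, name="print")

# Turn the Series into a frame and derive the two pivot keys from the index.
frame = hourly.to_frame()
frame["hours"] = frame.index.hour
frame["days"] = frame.index.strftime("%b-%d")

# Hours become rows, days become columns; missing day/hour pairs are filled with 0.
piv = frame.pivot_table(values="print", index="hours", columns="days",
                        aggfunc="sum", fill_value=0)

print(piv.shape)
```

The resulting piv can then be passed to sns.heatmap or ax.imshow exactly as in the examples above.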
