Original question: http://stackoverflow.com/questions/43330205/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Heatmap from columns in pandas dataframe
Asked by gustavgans
I am trying to generate a heatmap from a pandas dataframe by day and by hour of the day (X -> days, Y -> hours). The result should be something like this:
The data source is a table in Postgres:
id | created_at
---------+------------------------
2558145 | 2017-03-02 11:31:15+01
2558146 | 2017-03-02 11:31:46+01
2558147 | 2017-03-02 11:32:28+01
2558148 | 2017-03-02 11:32:57+01
....
Here is my code to regroup the data by hour:
# Jupyter notebook cell (%matplotlib inline is an IPython magic)
%matplotlib inline
import datetime
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.dates import date2num
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:postgres@localhost:5432/bla')

# one row per event; the constant column "print" is summed to count events
df = pd.read_sql_query("""
SELECT created_at, 1 as print
FROM foo
WHERE created_at > '2017-02-01'
AND created_at < '2017-03-01'""", con=engine)

# use the timestamp as index and count events per hour
df['created_at'] = pd.to_datetime(df['created_at'])
df.index = df['created_at']
df = df.resample('H')['print'].sum()
df.fillna(0, inplace=True)
print(df.head())
created_at
2017-02-01 07:00:00+00:00 1.0
2017-02-01 08:00:00+00:00 152.0
2017-02-01 09:00:00+00:00 101.0
2017-02-01 10:00:00+00:00 92.0
2017-02-01 11:00:00+00:00 184.0
Freq: H, Name: print, dtype: float64
The result looks fine, but I cannot figure out how to plot this dataframe.
Answered by ImportanceOfBeingErnest
A heatmap is a two-dimensional plot that maps x and y pairs to a value. This means that the input to the heatmap must be a 2D array.
Here you would want the columns of the array to denote days and the rows to denote hours. As a first step, we need to have days and hours in two different columns of the dataframe. One could then reshape those columns into 2D arrays, which would require knowing how many days and hours there are. It would also require that there is actually an entry for each day/hour pair.
Without this restriction, we can alternatively use a pivot_table to aggregate the values in a table. This is shown in the following solution.
import pandas as pd
import numpy as np; np.random.seed(0)
import seaborn as sns  # seaborn.apionly was removed in later seaborn releases
import matplotlib.pyplot as plt
# create dataframe with datetime as index and aggregated (frequency) values
date = pd.date_range('2017-02-23', periods=10*12, freq='2h')
freq = np.random.poisson(lam=2, size=(len(date)))
df = pd.DataFrame({"freq":freq}, index=date)
# add a column hours and days
df["hours"] = df.index.hour
df["days"] = df.index.map(lambda x: x.strftime('%b-%d'))
# create pivot table, days will be columns, hours will be rows
piv = pd.pivot_table(df, values="freq",index=["hours"], columns=["days"], fill_value=0)
#plot pivot table as heatmap using seaborn
ax = sns.heatmap(piv, square=True)
plt.setp( ax.xaxis.get_majorticklabels(), rotation=90 )
plt.tight_layout()
plt.show()
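For comparison, the reshape alternative mentioned above could look roughly like the following. This is only a sketch under the assumption that the series is complete, with exactly one value for every day/hour slot; the names n_days, n_slots and grid_df are made up for illustration and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical, gap-free data: one value every 2 hours over 10 days,
# mirroring the example data used above.
rng = np.random.default_rng(0)
date = pd.date_range("2017-02-23", periods=10 * 12, freq="2h")
freq = rng.poisson(lam=2, size=len(date))

n_days = 10    # has to be known in advance
n_slots = 12   # entries per day (24 hours / 2-hour step)

# reshape fills one row per day; transpose so that rows are hours
# and columns are days, the same layout as the pivot table.
grid = freq.reshape(n_days, n_slots).T

# Label the axes with the hour of each slot and the day of each column.
hours = date.hour[:n_slots]
days = date.strftime("%b-%d")[::n_slots]
grid_df = pd.DataFrame(grid, index=hours, columns=days)
print(grid_df.shape)
```

If a single timestamp were missing, the reshape would either fail or silently shift every later value into the wrong cell, which is why the pivot_table route is the more forgiving one.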
For plotting, you may also use a matplotlib imshow plot as follows:
fig, ax = plt.subplots()
im = ax.imshow(piv, cmap="Greens")
fig.colorbar(im, ax=ax)
ax.set_xticks(range(len(piv.columns)))
ax.set_yticks(range(len(piv.index)))
ax.set_xticklabels(piv.columns, rotation=90)
ax.set_yticklabels(piv.index)
ax.set_xlabel("Days")
ax.set_ylabel("Hours")
plt.tight_layout()
plt.show()
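To connect this back to the question, here is a rough sketch of how the hourly series produced by resample might be turned into such a pivot table. The database query is replaced by randomly generated timestamps (an assumption made purely so the snippet runs on its own), and the names events, hourly and counts are illustrative. The day labels use the '%Y-%m-%d' format here because pivot_table sorts column labels, and labels such as 'Mar-31' and 'Apr-01' would otherwise sort out of chronological order.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the database query: 5000 random event
# timestamps in February 2017, one row per event with print == 1.
rng = np.random.default_rng(0)
start = pd.Timestamp("2017-02-01", tz="UTC")
offsets = rng.integers(0, 28 * 24 * 3600, size=5000)
events = pd.DataFrame({
    "created_at": start + pd.to_timedelta(offsets, unit="s"),
    "print": 1,
})

# Same aggregation as in the question: number of events per hour.
hourly = events.set_index("created_at").resample("h")["print"].sum()

# Split the timestamp into an hour column and a day column.
# Hours are in UTC here; hourly.tz_convert(...) could be applied first
# if local time is wanted.
counts = hourly.to_frame("count")
counts["hours"] = counts.index.hour
counts["days"] = counts.index.strftime("%Y-%m-%d")

# Rows are hours, columns are days, cells hold the event counts.
piv = counts.pivot_table(values="count", index="hours",
                         columns="days", aggfunc="sum", fill_value=0)
```

The resulting piv should be usable with either sns.heatmap or ax.imshow in the same way as the pivot table built from the example data.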