Plotting heatmap for 3 columns in python with seaborn
Original question: http://stackoverflow.com/questions/44480226/
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Asked by user308827
v1 v2 yy
15.25 44.34 100.00
83.05 59.78 100.00
96.61 65.09 100.00
100.00 75.47 100.00
100.00 50.00 100.00
100.00 68.87 100.00
100.00 79.35 100.00
100.00 100.00 100.00
100.00 63.21 100.00
100.00 100.00 100.00
100.00 68.87 100.00
0.00 56.52 92.86
10.17 52.83 92.86
23.73 46.23 92.86
In the dataframe above, I want to plot a heatmap using v1 and v2 as the x and y axes and yy as the value. How can I do that in Python? I tried seaborn:
df = df.pivot('v1', 'v2', 'yy')
ax = sns.heatmap(df)
However, this does not work. Any other solution?
Accepted answer by ImportanceOfBeingErnest
A seaborn heatmap plots categorical data. This means that each occurring value takes up the same space in the heatmap as any other value, regardless of how far apart the values are numerically. This is usually undesired for numerical data. Instead, one of the following techniques may be chosen.
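To make the equal-spacing point concrete, here is a minimal sketch that is not part of the original answer. It uses a few rows of the question's data and compares the numeric gaps between neighbouring v1 values with the positions a categorical axis would give them. The names `labels`, `positions` and `numeric_gaps` are illustrative only.

```python
import io

import pandas as pd

# A few rows taken from the question's data.
data = """v1 v2 yy
0.00 56.52 92.86
10.17 52.83 92.86
15.25 44.34 100.00
83.05 59.78 100.00
100.00 75.47 100.00"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# The distinct v1 values would become the row labels of the heatmap.
labels = [float(v) for v in sorted(df["v1"].unique())]

# A categorical axis places label i at position i, whatever its numeric value.
positions = list(range(len(labels)))

# Numeric distance between neighbouring labels; on the heatmap it is always 1 cell.
numeric_gaps = [round(b - a, 2) for a, b in zip(labels, labels[1:])]

print(labels)
print(positions)
print(numeric_gaps)
```

The gaps differ widely while the heatmap positions are evenly spaced, which is why the techniques below may represent numerical data more faithfully.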
Scatter
A colored scatter plot may be just as good as a heatmap. The colors of the points would represent the yy value.
ax.scatter(df.v1, df.v2, c=df.yy, cmap="copper")
u = u"""v1 v2 yy
15.25 44.34 100.00
83.05 59.78 100.00
96.61 65.09 100.00
100.00 75.47 100.00
100.00 50.00 100.00
100.00 68.87 100.00
100.00 79.35 100.00
100.00 100.00 100.00
100.00 63.21 100.00
100.00 100.00 100.00
100.00 68.87 100.00
0.00 56.52 92.86
10.17 52.83 92.86
23.73 46.23 92.86"""
import pandas as pd
import matplotlib.pyplot as plt
import io
df = pd.read_csv(io.StringIO(u), sep=r"\s+")
fig, ax = plt.subplots()
sc = ax.scatter(df.v1, df.v2, c=df.yy, cmap="copper")
fig.colorbar(sc, ax=ax)
ax.set_aspect("equal")
plt.show()
Hexbin
You may want to look into hexbin. The data is shown in hexagonal bins, and the values are aggregated as the mean inside each bin. The advantage here is that if you choose a large gridsize, it looks like a scatter plot, while if you make it small, it looks like a heatmap, which lets you adjust the plot easily to the desired resolution.
h1 = ax.hexbin(df.v1, df.v2, C=df.yy, gridsize=100, cmap="copper")
h2 = ax2.hexbin(df.v1, df.v2, C=df.yy, gridsize=10, cmap="copper")
u = u"""v1 v2 yy
15.25 44.34 100.00
83.05 59.78 100.00
96.61 65.09 100.00
100.00 75.47 100.00
100.00 50.00 100.00
100.00 68.87 100.00
100.00 79.35 100.00
100.00 100.00 100.00
100.00 63.21 100.00
100.00 100.00 100.00
100.00 68.87 100.00
0.00 56.52 92.86
10.17 52.83 92.86
23.73 46.23 92.86"""
import pandas as pd
import matplotlib.pyplot as plt
import io
df = pd.read_csv(io.StringIO(u), sep=r"\s+")
fig, (ax, ax2) = plt.subplots(nrows=2)
h1 = ax.hexbin(df.v1, df.v2, C=df.yy, gridsize=100, cmap="copper")
h2 = ax2.hexbin(df.v1, df.v2, C=df.yy, gridsize=10, cmap="copper")
fig.colorbar(h1, ax=ax)
fig.colorbar(h2, ax=ax2)
ax.set_aspect("equal")
ax2.set_aspect("equal")
ax.set_title("gridsize=100")
ax2.set_title("gridsize=10")
fig.subplots_adjust(hspace=0.3)
plt.show()
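The mean aggregation mentioned above is only hexbin's default; it is set through the `reduce_C_function` parameter. The following sketch, which is not part of the original answer, draws the same bins once with the default mean and once with `np.max`. The Agg backend and the names `h_mean`, `h_max`, `mean_values` and `max_values` are assumptions made for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = """v1 v2 yy
15.25 44.34 100.00
83.05 59.78 100.00
96.61 65.09 100.00
100.00 75.47 100.00
100.00 50.00 100.00
100.00 68.87 100.00
100.00 79.35 100.00
100.00 100.00 100.00
100.00 63.21 100.00
100.00 100.00 100.00
100.00 68.87 100.00
0.00 56.52 92.86
10.17 52.83 92.86
23.73 46.23 92.86"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+")

fig, (ax, ax2) = plt.subplots(nrows=2)

# Default aggregation: mean of yy over the points that fall into each hexagon.
h_mean = ax.hexbin(df.v1, df.v2, C=df.yy, gridsize=10, cmap="copper")

# Same bins, but each hexagon shows the largest yy value it contains.
h_max = ax2.hexbin(df.v1, df.v2, C=df.yy, gridsize=10, cmap="copper",
                   reduce_C_function=np.max)

# The aggregated value of every non-empty hexagon.
mean_values = np.asarray(h_mean.get_array())
max_values = np.asarray(h_max.get_array())
print(mean_values)
print(max_values)
```

With an interactive backend, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` displays both panels.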
Tripcolor
A tripcolor plot can be used to obtain colored regions in the plot according to the data points, which are interpreted as the corners of triangles that are colored according to the data at those corners. Such a plot requires more data than is available here to give a meaningful representation.
ax.tripcolor(df.v1, df.v2, df.yy, cmap="copper")
u = u"""v1 v2 yy
15.25 44.34 100.00
83.05 59.78 100.00
96.61 65.09 100.00
100.00 75.47 100.00
100.00 50.00 100.00
100.00 68.87 100.00
100.00 79.35 100.00
100.00 100.00 100.00
100.00 63.21 100.00
100.00 100.00 100.00
100.00 68.87 100.00
0.00 56.52 92.86
10.17 52.83 92.86
23.73 46.23 92.86"""
import pandas as pd
import matplotlib.pyplot as plt
import io
df = pd.read_csv(io.StringIO(u), sep=r"\s+")
fig, ax = plt.subplots()
tc = ax.tripcolor(df.v1, df.v2, df.yy, cmap="copper")
fig.colorbar(tc, ax=ax)
ax.set_aspect("equal")
ax.set_title("tripcolor")
plt.show()
Note that a tricontourf plot may be equally suited if more data points throughout the grid are available.
ax.tricontourf(df.v1, df.v2, df.yy, cmap="copper")
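For completeness, a full script for the tricontourf call might look like the sketch below, modelled on the tripcolor script above. It is not part of the original answer. Dropping the duplicated (v1, v2) points before triangulating and selecting the Agg backend are both assumptions made here as precautions.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

data = """v1 v2 yy
15.25 44.34 100.00
83.05 59.78 100.00
96.61 65.09 100.00
100.00 75.47 100.00
100.00 50.00 100.00
100.00 68.87 100.00
100.00 79.35 100.00
100.00 100.00 100.00
100.00 63.21 100.00
100.00 100.00 100.00
100.00 68.87 100.00
0.00 56.52 92.86
10.17 52.83 92.86
23.73 46.23 92.86"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# Precaution (an assumption, not from the answer): triangulate unique points only.
df = df.drop_duplicates(["v1", "v2"])

fig, ax = plt.subplots()
# Filled contours interpolated over a Delaunay triangulation of the points.
cs = ax.tricontourf(df.v1, df.v2, df.yy, cmap="copper")
fig.colorbar(cs, ax=ax)
ax.set_aspect("equal")
ax.set_title("tricontourf")
```

As with tripcolor, the result is only as informative as the point coverage allows; with the 12 unique points here the plot is coarse.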
Answer by Serenity
The problem is that your data has duplicate values like:
100.00 100.00 100.00
100.00 100.00 100.00
You have to drop the duplicate values, then pivot and plot like this:
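import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# fill data (copy the table from the question to the clipboard first)
df = pd.read_clipboard()
# keep only the first row for each (v1, v2) pair so that pivot does not fail
df.drop_duplicates(['v1','v2'], inplace=True)
pivot = df.pivot(index='v1', columns='v2', values='yy')
ax = sns.heatmap(pivot, annot=True)
plt.show()
print(pivot)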
Pivot:
v2 44.34 46.23 50.00 52.83 56.52 59.78 63.21 65.09 \
v1
0.00 NaN NaN NaN NaN 92.86 NaN NaN NaN
10.17 NaN NaN NaN 92.86 NaN NaN NaN NaN
15.25 100.0 NaN NaN NaN NaN NaN NaN NaN
23.73 NaN 92.86 NaN NaN NaN NaN NaN NaN
83.05 NaN NaN NaN NaN NaN 100.0 NaN NaN
96.61 NaN NaN NaN NaN NaN NaN NaN 100.0
100.00 NaN NaN 100.0 NaN NaN NaN 100.0 NaN
v2 68.87 75.47 79.35 100.00
v1
0.00 NaN NaN NaN NaN
10.17 NaN NaN NaN NaN
15.25 NaN NaN NaN NaN
23.73 NaN NaN NaN NaN
83.05 NaN NaN NaN NaN
96.61 NaN NaN NaN NaN
100.00 100.0 100.0 100.0 100.0
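As a possible alternative to dropping rows, `DataFrame.pivot_table` aggregates duplicated (v1, v2) pairs instead of raising an error. The sketch below is not part of the original answer. It first lists the offending rows with `duplicated` and then builds the table. Using the mean as the aggregation is an assumption, and the names `dupes` and `table` are illustrative.

```python
import io

import pandas as pd

data = """v1 v2 yy
15.25 44.34 100.00
83.05 59.78 100.00
96.61 65.09 100.00
100.00 75.47 100.00
100.00 50.00 100.00
100.00 68.87 100.00
100.00 79.35 100.00
100.00 100.00 100.00
100.00 63.21 100.00
100.00 100.00 100.00
100.00 68.87 100.00
0.00 56.52 92.86
10.17 52.83 92.86
23.73 46.23 92.86"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# Rows whose (v1, v2) pair occurs more than once; these make DataFrame.pivot fail.
dupes = df[df.duplicated(["v1", "v2"], keep=False)]
print(dupes)

# pivot_table combines duplicates with an aggregation function (mean is assumed here).
table = df.pivot_table(index="v1", columns="v2", values="yy", aggfunc="mean")
print(table.shape)
```

The resulting table could then be passed to sns.heatmap in the same way as pivot above. Since the duplicated rows in this data carry identical yy values, the mean gives the same numbers as dropping them.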