Pandas Plotting from Pivot Table
Original URL: http://stackoverflow.com/questions/36132749/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by cir
I am basically trying to reproduce climate diagrams showing mean temperature and precipitation over the year for various locations.
I've generated a pivot table from my csv the following way:
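import pandas as pd
data = pd.read_csv("05_temp_rain_v2.csv")
# mean rain and temperature per (location, month); pivot_table's default aggfunc is "mean"
pivot = data.pivot_table(values=["rain(mm)", "temp(dC)"], index=["loc", "month"])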
Sample data in text form:
loc,lat,long,year,month,rain(mm),temp(dC)
Adria_-_Bellombra,45.011129,12.034126,1994,1,45.6,4.6
Adria_-_Bellombra,45.011129,12.034126,1994,2,31.4,4
Adria_-_Bellombra,45.011129,12.034126,1994,3,1.6,10.7
Adria_-_Bellombra,45.011129,12.034126,1994,4,74.4,11.5
Adria_-_Bellombra,45.011129,12.034126,1994,5,26,17.2
Adria_-_Bellombra,45.011129,12.034126,1994,6,108.6,20.6
Pivot Table: [image not reproduced]
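The pivot table itself appears only as an image in the original post. As a minimal sketch, assuming the six sample rows above are the whole file (the real 05_temp_rain_v2.csv presumably holds more months and locations), it can be rebuilt with io.StringIO standing in for the file:

```python
import io

import pandas as pd

# The six sample rows from the question, used in place of 05_temp_rain_v2.csv
CSV_TEXT = """loc,lat,long,year,month,rain(mm),temp(dC)
Adria_-_Bellombra,45.011129,12.034126,1994,1,45.6,4.6
Adria_-_Bellombra,45.011129,12.034126,1994,2,31.4,4
Adria_-_Bellombra,45.011129,12.034126,1994,3,1.6,10.7
Adria_-_Bellombra,45.011129,12.034126,1994,4,74.4,11.5
Adria_-_Bellombra,45.011129,12.034126,1994,5,26,17.2
Adria_-_Bellombra,45.011129,12.034126,1994,6,108.6,20.6
"""

data = pd.read_csv(io.StringIO(CSV_TEXT))

# Mean rain and temperature per (location, month); aggfunc defaults to "mean"
pivot = data.pivot_table(values=["rain(mm)", "temp(dC)"], index=["loc", "month"])

print(pivot)
```

The result has a two-level row index (loc, month), which is why the question's loop uses pivot.xs(location) to pull out one location's months.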
Since I am handling various locations, I am iterating over them:
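import matplotlib.pyplot as plt

# one figure per location (first level of the pivot table's MultiIndex)
locations = pivot.index.get_level_values(0).unique()
for location in locations:
    split = pivot.xs(location)  # this location's rows, indexed by month
    rain = split["rain(mm)"]
    temp = split["temp(dC)"]
    plt.subplots()
    temp.plot(kind="line", color="r").legend()
    rain.plot(kind="bar").legend()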
An example plot output: [image not reproduced]
Why are my temperature values being plotted starting from February (2)?
I assume it is because the temperature values are listed in the second column.
What would be the proper way to handle and plot different data (two columns) from a pivot table?
Answer by jrjc
It's because line and bar plots do not set the xlim the same way. In a bar plot the x-axis is treated as categorical data, whereas in a line plot it is treated as continuous data. As a result, xlim and xticks are not set identically in the two cases.
Consider this:
In [4]: temp.plot(kind="line",color="r",)
Out[4]: <matplotlib.axes._subplots.AxesSubplot at 0x117f555d0>
In [5]: plt.xticks()
Out[5]: (array([ 1., 2., 3., 4., 5., 6.]), <a list of 6 Text xticklabel objects>)
where the tick positions are an array of floats ranging from 1 to 6.
and
In [6]: rain.plot(kind="bar").legend()
Out[6]: <matplotlib.legend.Legend at 0x11c15e950>
In [7]: plt.xticks()
Out[7]: (array([0, 1, 2, 3, 4, 5]), <a list of 6 Text xticklabel objects>)
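where the tick positions are an array of ints ranging from 0 to 5. Because the line plot uses the month numbers 1 to 6 as x coordinates while the bars sit at 0 to 5, the temperature for month 1 is drawn over the second bar (labelled 2), so the whole curve appears shifted one month to the right.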
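The same difference can be checked outside IPython. A minimal sketch, assuming the six sample months from the question and the non-interactive Agg backend (ax_line and ax_bar are illustrative names), draws each series on its own axes and reads back the x coordinates pandas actually used:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Monthly values for months 1-6, taken from the question's sample rows
months = pd.Index([1, 2, 3, 4, 5, 6], name="month")
temp = pd.Series([4.6, 4.0, 10.7, 11.5, 17.2, 20.6], index=months, name="temp(dC)")
rain = pd.Series([45.6, 31.4, 1.6, 74.4, 26.0, 108.6], index=months, name="rain(mm)")

fig, (ax_line, ax_bar) = plt.subplots(1, 2)
temp.plot(kind="line", ax=ax_line)
rain.plot(kind="bar", ax=ax_bar)
fig.canvas.draw()  # render once so the tick labels are populated

# Line plot: the numeric index values are used directly as x coordinates
line_x = np.asarray(ax_line.get_lines()[0].get_xdata(), dtype=float)
# Bar plot: bars sit at 0..n-1 and are only labelled with the index values
bar_x = np.asarray(ax_bar.get_xticks(), dtype=float)
bar_labels = [t.get_text() for t in ax_bar.get_xticklabels()]

print("line x:", line_x.tolist())
print("bar x: ", bar_x.tolist())
print("bar labels:", bar_labels)
```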
So the easiest fix is to replace this part:
temp.plot(kind="line", color="r",).legend()
rain.plot(kind="bar").legend()
with:
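# bars first: pandas places them at x = 0..n-1
rain.plot(kind="bar").legend()
# draw the line at the same positions 0..n-1, so month i lands on its own bar
plt.plot(range(len(temp)), temp, "r", label=temp.name)
plt.legend()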
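Another option, sketched here as an alternative to the plt.plot call above rather than taken from this answer: Series.plot accepts use_index=False, which plots a series against 0..n-1 instead of its index. Drawing the bars first and then the line with use_index=False puts both on the same positions. The data are the question's six sample months; the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Monthly values for months 1-6, taken from the question's sample rows
months = pd.Index([1, 2, 3, 4, 5, 6], name="month")
temp = pd.Series([4.6, 4.0, 10.7, 11.5, 17.2, 20.6], index=months, name="temp(dC)")
rain = pd.Series([45.6, 31.4, 1.6, 74.4, 26.0, 108.6], index=months, name="rain(mm)")

fig, ax = plt.subplots()
# Bars first: pandas puts them at x = 0..n-1 and labels them with the months
rain.plot(kind="bar", ax=ax)
# use_index=False plots the line against 0..n-1 instead of the month numbers,
# so each temperature point sits on its own month's bar
temp.plot(kind="line", color="r", use_index=False, ax=ax)
ax.legend()

line_x = np.asarray(ax.get_lines()[-1].get_xdata(), dtype=float)
bar_x = np.asarray(ax.get_xticks(), dtype=float)
print("line x:", line_x.tolist())
print("bar x: ", bar_x.tolist())
```

The order matters in this sketch: the bar plot sets the tick positions and month labels, and the line drawn afterwards only has to match those positions.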
Answer by cir
Thanks to jeanrjc's answer and this thread, I think I'm finally quite satisfied!
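import matplotlib.pyplot as plt

# pivot and locations are defined as in the question
labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

for location in locations:
    #print(pivot.xs(location, level=0))
    split = pivot.xs(location)
    rain = split["rain(mm)"]
    temp = split["temp(dC)"]
    fig = plt.figure()
    # precipitation as bars on the left y-axis (bars sit at x = 0..n-1)
    ax1 = rain.plot(kind="bar")
    # twin axes: same x-axis, independent right y-axis for temperature
    ax2 = ax1.twinx()
    # reuse the bar positions so each temperature point sits on its month's bar
    ax2.plot(ax1.get_xticks(), temp, linestyle='-', color="r")
    ax2.set_ylim((-5, 50.))
    #ax1.set_ylim((0, 300.))
    ax1.set_ylabel('Precipitation (mm)', color='blue')
    ax2.set_ylabel('Temperature (°C)', color='red')
    ax1.set_xlabel('Months')
    plt.title(location)
    #plt.xticks(range(12),labels,rotation=45)
    # one label per month present in the index (months are numbered 1-12)
    ax1.set_xticklabels([labels[m - 1] for m in rain.index], rotation=45)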
I am receiving the following output, which is very close to what I intend: [image not reproduced]
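A self-contained variant of the same twin-axis layout, as a sketch: it passes every axes object explicitly (ax=ax1, ax1.set_title) instead of relying on pyplot's current axes, and derives the month labels from the index so a location with fewer than 12 months still gets matching labels. plot_climate and MONTH_LABELS are hypothetical names, and the data are the six sample months from the question.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd

MONTH_LABELS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def plot_climate(split, title):
    """Rain as bars on the left axis, temperature as a red line on a twin right axis.

    `split` is one location's slice of the pivot table (index = month numbers 1-12).
    """
    fig, ax1 = plt.subplots()
    rain = split["rain(mm)"]
    temp = split["temp(dC)"]

    rain.plot(kind="bar", ax=ax1)
    ax2 = ax1.twinx()  # shares the x-axis, independent y-axis
    # Reuse the bar positions (0..n-1) as the line's x coordinates
    ax2.plot(ax1.get_xticks(), temp.to_numpy(), color="r")

    ax1.set_xticklabels([MONTH_LABELS[m - 1] for m in split.index], rotation=45)
    ax1.set_xlabel("Months")
    ax1.set_ylabel("Precipitation (mm)", color="blue")
    ax2.set_ylabel("Temperature (°C)", color="red")
    ax1.set_title(title)
    return fig, ax1, ax2


months = pd.Index([1, 2, 3, 4, 5, 6], name="month")
split = pd.DataFrame(
    {"rain(mm)": [45.6, 31.4, 1.6, 74.4, 26.0, 108.6],
     "temp(dC)": [4.6, 4.0, 10.7, 11.5, 17.2, 20.6]},
    index=months,
)
fig, ax1, ax2 = plot_climate(split, "Adria_-_Bellombra")
fig.canvas.draw()  # render once so the tick labels are populated
```

Inside the question's loop this would be called as plot_climate(pivot.xs(location), location).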
Answer by IanS
You could loop over the results of a groupby operation:
# data as read from the CSV in the question
for name, group in data[['loc', 'month', 'rain(mm)', 'temp(dC)']].groupby('loc'):
    group.set_index('month').plot()
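One caveat, assuming the CSV holds several years per location (it has a year column): group.set_index('month') then contains repeated month values, so the plot shows raw rows rather than monthly means. A sketch that averages within each group first, giving the same numbers as the question's pivot table for each location, using the sample rows and a hypothetical monthly_means dict to collect the results:

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd

# The six sample rows from the question, used in place of 05_temp_rain_v2.csv
CSV_TEXT = """loc,lat,long,year,month,rain(mm),temp(dC)
Adria_-_Bellombra,45.011129,12.034126,1994,1,45.6,4.6
Adria_-_Bellombra,45.011129,12.034126,1994,2,31.4,4
Adria_-_Bellombra,45.011129,12.034126,1994,3,1.6,10.7
Adria_-_Bellombra,45.011129,12.034126,1994,4,74.4,11.5
Adria_-_Bellombra,45.011129,12.034126,1994,5,26,17.2
Adria_-_Bellombra,45.011129,12.034126,1994,6,108.6,20.6
"""
data = pd.read_csv(io.StringIO(CSV_TEXT))

monthly_means = {}
for name, group in data.groupby("loc"):
    # Average over years so each month appears exactly once per location
    means = group.groupby("month")[["rain(mm)", "temp(dC)"]].mean()
    monthly_means[name] = means
    means.plot(title=name)  # one figure per location, as in the answer above

print(monthly_means["Adria_-_Bellombra"])
```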