pandas: plotting from a pivot table

Question

Asked by cir

I am basically trying to reproduce climate diagrams showing mean temperature and precipitation over the year for various locations.

I've generated a pivot table from my csv the following way:

import pandas as pd

data = pd.read_csv("05_temp_rain_v2.csv")
pivot = data.pivot_table(["rain(mm)", "temp(dC)"], ["loc", "month"])  # mean per (loc, month)
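
A minimal sketch of what this call produces, assuming the question's six sample rows stand in for 05_temp_rain_v2.csv (they are read from an in-memory string via io.StringIO rather than from the real file). The call is written with keyword arguments but is otherwise the same. With one year per month, the default aggfunc="mean" simply returns the original values, now indexed by a (loc, month) MultiIndex.

```python
import io

import pandas as pd

# The question's sample rows, standing in for "05_temp_rain_v2.csv" (assumption)
CSV = """loc,lat,long,year,month,rain(mm),temp(dC)
Adria_-_Bellombra,45.011129,12.034126,1994,1,45.6,4.6
Adria_-_Bellombra,45.011129,12.034126,1994,2,31.4,4
Adria_-_Bellombra,45.011129,12.034126,1994,3,1.6,10.7
Adria_-_Bellombra,45.011129,12.034126,1994,4,74.4,11.5
Adria_-_Bellombra,45.011129,12.034126,1994,5,26,17.2
Adria_-_Bellombra,45.011129,12.034126,1994,6,108.6,20.6
"""

data = pd.read_csv(io.StringIO(CSV))

# Average rain and temperature for every (loc, month) pair; aggfunc defaults to "mean"
pivot = data.pivot_table(values=["rain(mm)", "temp(dC)"], index=["loc", "month"])
print(pivot)
```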

Sample data in text form:

loc,lat,long,year,month,rain(mm),temp(dC)
Adria_-_Bellombra,45.011129,12.034126,1994,1,45.6,4.6
Adria_-_Bellombra,45.011129,12.034126,1994,2,31.4,4
Adria_-_Bellombra,45.011129,12.034126,1994,3,1.6,10.7
Adria_-_Bellombra,45.011129,12.034126,1994,4,74.4,11.5
Adria_-_Bellombra,45.011129,12.034126,1994,5,26,17.2
Adria_-_Bellombra,45.011129,12.034126,1994,6,108.6,20.6

Pivot Table:

Since I am handling various locations, I am iterating over them:

import matplotlib.pyplot as plt

locations = pivot.index.get_level_values(0).unique()

for location in locations:
    split = pivot.xs(location)  # month-indexed rows for one location

    rain = split["rain(mm)"]
    temp = split["temp(dC)"]

    plt.subplots()
    temp.plot(kind="line", color="r").legend()
    rain.plot(kind="bar").legend()
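
A small sketch of what each pass through this loop receives, again assuming the question's six sample rows as input (read from an in-memory string). pivot.xs(location) drops the loc level, so split is indexed by month alone and temp is a Series whose index holds the month numbers 1 to 6. Those index values are what a line plot uses as its x coordinates.

```python
import io

import pandas as pd

# The question's sample rows, standing in for "05_temp_rain_v2.csv" (assumption)
CSV = """loc,lat,long,year,month,rain(mm),temp(dC)
Adria_-_Bellombra,45.011129,12.034126,1994,1,45.6,4.6
Adria_-_Bellombra,45.011129,12.034126,1994,2,31.4,4
Adria_-_Bellombra,45.011129,12.034126,1994,3,1.6,10.7
Adria_-_Bellombra,45.011129,12.034126,1994,4,74.4,11.5
Adria_-_Bellombra,45.011129,12.034126,1994,5,26,17.2
Adria_-_Bellombra,45.011129,12.034126,1994,6,108.6,20.6
"""

data = pd.read_csv(io.StringIO(CSV))
pivot = data.pivot_table(values=["rain(mm)", "temp(dC)"], index=["loc", "month"])

locations = pivot.index.get_level_values(0).unique()
for location in locations:
    split = pivot.xs(location)  # "loc" level removed, index is now "month"
    rain = split["rain(mm)"]
    temp = split["temp(dC)"]
    print(location, list(temp.index))
```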

An example plot output is shown below:

Why are my temperature values being plotted starting from February (2)?
I assume it is because the temperature values are listed in the second column.

What would be the proper way to handle and plot different data (two columns) from a pivot table?

Answer 1

Answered by jrjc

It's because line and bar plots do not set xlim the same way. For a bar plot the x-axis is treated as categorical data, whereas for a line plot it is treated as continuous data. As a result, xlim and xticks are not set identically in the two cases.

Consider this:

In [4]: temp.plot(kind="line",color="r",)
Out[4]: <matplotlib.axes._subplots.AxesSubplot at 0x117f555d0>
In [5]: plt.xticks()
Out[5]: (array([ 1.,  2.,  3.,  4.,  5.,  6.]), <a list of 6 Text xticklabel objects>)

where the tick positions are an array of floats ranging from 1 to 6.

and

In [6]: rain.plot(kind="bar").legend()
Out[6]: <matplotlib.legend.Legend at 0x11c15e950>
In [7]: plt.xticks()
Out[7]: (array([0, 1, 2, 3, 4, 5]), <a list of 6 Text xticklabel objects>)

where the tick positions are an array of ints ranging from 0 to 5.
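
The mismatch can also be checked without looking at a figure. This is a sketch under stated assumptions: the sample values are typed directly into month-indexed Series (named like the pivot columns), explicit axes are passed via ax=, and the non-interactive Agg backend is used so it runs headless. It plots in the same order as the question and records the x coordinates of the red line and the tick positions left by the bar plot.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Sample values from the question, indexed by month number (assumption)
months = pd.Index([1, 2, 3, 4, 5, 6], name="month")
rain = pd.Series([45.6, 31.4, 1.6, 74.4, 26.0, 108.6], index=months, name="rain(mm)")
temp = pd.Series([4.6, 4.0, 10.7, 11.5, 17.2, 20.6], index=months, name="temp(dC)")

fig, ax = plt.subplots()
temp.plot(kind="line", color="r", ax=ax)
# The line uses the month numbers themselves as x coordinates
line_x = list(ax.get_lines()[-1].get_xdata())

rain.plot(kind="bar", ax=ax)
# The bars sit at positions 0..5 and are merely labeled 1..6
bar_ticks = list(ax.get_xticks())

print(line_x)
print(bar_ticks)
```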

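Because the line's first point (month 1) is drawn at x = 1, which is where the bar labeled 2 sits, the temperature curve appears shifted by one month. The easiest fix is to replace this part: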

temp.plot(kind="line", color="r",).legend()
rain.plot(kind="bar").legend()

with:

rain.plot(kind="bar").legend()
plt.plot(range(len(temp)), temp, "r", label=temp.name)
plt.legend()
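
A runnable sketch of this fix, assuming the question's sample values typed into month-indexed Series, the Agg backend, and explicit axes from plt.subplots() in place of the implicit current axes that plt.plot/plt.legend use. Because the temperature is drawn at range(len(temp)), its x coordinates coincide with the bar positions 0 to 5.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Sample values from the question, indexed by month number (assumption)
months = pd.Index([1, 2, 3, 4, 5, 6], name="month")
rain = pd.Series([45.6, 31.4, 1.6, 74.4, 26.0, 108.6], index=months, name="rain(mm)")
temp = pd.Series([4.6, 4.0, 10.7, 11.5, 17.2, 20.6], index=months, name="temp(dC)")

fig, ax = plt.subplots()
rain.plot(kind="bar", ax=ax)  # bars at x = 0..n-1
# Draw the temperature at the bar positions instead of at the month numbers
ax.plot(range(len(temp)), temp.to_numpy(), "r", label=temp.name)
ax.legend()

line_x = list(ax.get_lines()[-1].get_xdata())
print(line_x)
```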

Answer 2

Answered by cir

Thanks to jeanrjc's answer and this thread, I think I'm finally quite satisfied!

import matplotlib.pyplot as plt

for location in locations:
    #print(pivot.xs(location, level=0))

    split = pivot.xs(location)
    rain = split["rain(mm)"]
    temp = split["temp(dC)"]

    fig = plt.figure()
    ax1 = rain.plot(kind="bar")  # bars at x = 0..n-1
    ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
    # draw the temperature line at the bar positions
    ax2.plot(ax1.get_xticks(), temp, linestyle='-', color="r")
    ax2.set_ylim((-5, 50.))
    #ax1.set_ylim((0, 300.))
    ax1.set_ylabel('Precipitation (mm)', color='blue')
    ax2.set_ylabel('Temperature (°C)', color='red')
    ax1.set_xlabel('Months')
    plt.title(location)
    labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    #plt.xticks(range(12),labels,rotation=45)
    ax1.set_xticklabels(labels, rotation=45)
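
A self-contained sketch of this twin-axis approach, assuming the question's sample values typed into month-indexed Series and the Agg backend. It passes ax= explicitly instead of relying on the current figure, and builds the tick labels from the month numbers with the standard library's calendar.month_abbr (English abbreviations in the default C locale) rather than a fixed 12-item list. A fixed list only lines up when all 12 months are present; with fewer bars, recent Matplotlib versions may reject a label list whose length differs from the number of ticks.

```python
import calendar

import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Sample values from the question, indexed by month number (assumption)
location = "Adria_-_Bellombra"
months = pd.Index([1, 2, 3, 4, 5, 6], name="month")
rain = pd.Series([45.6, 31.4, 1.6, 74.4, 26.0, 108.6], index=months, name="rain(mm)")
temp = pd.Series([4.6, 4.0, 10.7, 11.5, 17.2, 20.6], index=months, name="temp(dC)")

fig, ax1 = plt.subplots()
rain.plot(kind="bar", ax=ax1)  # precipitation bars on the left y-axis
ax2 = ax1.twinx()  # temperature on a right y-axis sharing the x-axis
ax2.plot(ax1.get_xticks(), temp.to_numpy(), linestyle="-", color="r")
ax2.set_ylim((-5, 50))
ax1.set_ylabel("Precipitation (mm)", color="blue")
ax2.set_ylabel("Temperature (°C)", color="red")
ax1.set_xlabel("Months")
ax1.set_title(location)

# One label per bar, derived from the month numbers actually present
labels = [calendar.month_abbr[m] for m in rain.index]
ax1.set_xticklabels(labels, rotation=45)
```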

I am receiving the following output, which is very close to what I intend:

Answer 3

Answered by IanS

You could loop over the results of a groupby operation:

for name, group in data[['loc', 'month', 'rain(mm)', 'temp(dC)']].groupby('loc'):
    group.set_index('month').plot()
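
A runnable sketch of this loop on the question's sample rows (read from an in-memory string and plotted with the Agg backend, both assumptions for illustration). It drops the grouping column before plotting, so each location gets its own figure with rain and temperature drawn as two lines over the month numbers. Because both lines use the same continuous x-axis, no offset appears; the trade-off is that millimetres and degrees share a single y-scale.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# The question's sample rows, standing in for "05_temp_rain_v2.csv" (assumption)
CSV = """loc,lat,long,year,month,rain(mm),temp(dC)
Adria_-_Bellombra,45.011129,12.034126,1994,1,45.6,4.6
Adria_-_Bellombra,45.011129,12.034126,1994,2,31.4,4
Adria_-_Bellombra,45.011129,12.034126,1994,3,1.6,10.7
Adria_-_Bellombra,45.011129,12.034126,1994,4,74.4,11.5
Adria_-_Bellombra,45.011129,12.034126,1994,5,26,17.2
Adria_-_Bellombra,45.011129,12.034126,1994,6,108.6,20.6
"""
data = pd.read_csv(io.StringIO(CSV))

axes = {}
for name, group in data[["loc", "month", "rain(mm)", "temp(dC)"]].groupby("loc"):
    # Drop the grouping column so only rain and temperature are plotted
    ax = group.drop(columns="loc").set_index("month").plot(title=name)
    axes[name] = ax
```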
