将 Pandas 数据帧处理为 violinplot

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/43345599/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 03:22:21  来源:igfitidea点击:

Process pandas dataframe into violinplot

pythonpandasmatplotlibseabornviolin-plot

提问by Emily Beth

I have data I'm reading from an Excel spreadsheet. The data has a number of observations for each of six scenarios, S1 to S6. When I read in the data to my dataframe df, it looks as follows:

我有从 Excel 电子表格中读取的数据。对于六个场景(S1 到 S6)中的每一个,数据都有许多观察结果。当我将数据读入我的数据框 df 时,它看起来如下:

      Scenario        LMP
0           S1 -21.454544
1           S1 -20.778094
2           S1 -20.027689
3           S1 -19.747170
4           S1 -20.814405
5           S1 -21.955406
6           S1 -23.018960
...
12258       S6 -34.089906
12259       S6 -34.222814
12260       S6 -26.712010
12261       S6 -24.555973
12262       S6 -23.062616
12263       S6 -20.488411

I want to create a violinplot that has a different violin for each of the six scenarios. I'm new to Pandas and dataframes, and despite much research/testing over the last day, I can't quite figure out an elegant way to pass some reference(s) to my dataframe (to split it into different series for each scenario) that will work in the axes.violinplot() statement. For instance, I've tried the following, which doesn't work. I get a "ValueError: cannot copy sequence with size 1752 to array axis with dimension 2" on my axes.violinplot statement.

我想为六个场景中的每一个创建一个具有不同小提琴的 violinplot。我是 Pandas 和数据框的新手,尽管在最后一天进行了大量研究/测试,但我无法找到一种优雅的方式将一些引用传递给我的数据框(将其拆分为每个场景的不同系列) ) 将在axes.violinplot() 语句中起作用。例如,我尝试了以下方法,但不起作用。我在我的axis.violinplot 语句中收到“ValueError:无法将大小为 1752 的序列复制到维度为 2 的数组轴”。

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# load data into a dataframe
df = pd.read_excel('Modeling analysis charts.xlsx',
                   sheetname='lmps',
                   parse_cols=[7,12],
                   skiprows=0,
                   header=1)

fontsize = 10

fig, axes = plt.subplots()

axes.violinplot(dataset = [[df.loc[df.Scenario == 'S1']],
                           [df.loc[df.Scenario == 'S2']],
                           [df.loc[df.Scenario == 'S3']],
                           [df.loc[df.Scenario == 'S4']],
                           [df.loc[df.Scenario == 'S5']],
                           [df.loc[df.Scenario == 'S6']]
                          ]
                )
axes.set_title('Day Ahead Market')

axes.yaxis.grid(True)
axes.set_xlabel('Scenario')
axes.set_ylabel('LMP ($/MWh)')

plt.show()

回答by Serenity

You may use seaborn. In this case, import seaborn and then use violin plotto visualize the scenarios.

你可以使用seaborn。在这种情况下,导入 seaborn,然后使用小提琴图来可视化场景。

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# load data into a dataframe
df = pd.read_excel('Modeling analysis charts.xlsx',
                   sheetname='lmps',
                   parse_cols=[7,12],
                   skiprows=0,
                   header=1)
fontsize = 10

fig, axes = plt.subplots()
# plot violin. 'Scenario' is according to x axis, 
# 'LMP' is y axis, data is your dataframe. ax - is axes instance
sns.violinplot('Scenario','LMP', data=df, ax = axes)
axes.set_title('Day Ahead Market')

axes.yaxis.grid(True)
axes.set_xlabel('Scenario')
axes.set_ylabel('LMP ($/MWh)')

plt.show()

enter image description here

在此处输入图片说明

回答by ImportanceOfBeingErnest

You need to be careful how to create the dataset to plot. In the code from the question you have a list of lists of one dataframe. However you need simply a list of one-column dataframes.

您需要小心如何创建要绘制的数据集。在问题的代码中,您有一个数据框列表的列表。但是,您只需要一个单列数据框列表。

You would therefore also need to take only the "LMP" column from the filtered dataframes, otherwise the violinplot wouldn't know which column to plot.

因此,您还需要仅从过滤后的数据框中获取“LMP”列,否则 violinplot 将不知道要绘制哪一列。

Here is a working example which stays close to the original code:

这是一个接近原始代码的工作示例:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


x = np.random.poisson(lam =3, size=100)
y = np.random.choice(["S{}".format(i+1) for i in range(6)], size=len(x))
df = pd.DataFrame({"Scenario":y, "LMP":x})

fig, axes = plt.subplots()

axes.violinplot(dataset = [df[df.Scenario == 'S1']["LMP"].values,
                           df[df.Scenario == 'S2']["LMP"].values,
                           df[df.Scenario == 'S3']["LMP"].values,
                           df[df.Scenario == 'S4']["LMP"].values,
                           df[df.Scenario == 'S5']["LMP"].values,
                           df[df.Scenario == 'S6']["LMP"].values ] )

axes.set_title('Day Ahead Market')
axes.yaxis.grid(True)
axes.set_xlabel('Scenario')
axes.set_ylabel('LMP ($/MWh)')

plt.show()

enter image description here

在此处输入图片说明