Python boxplot out of columns of different lengths

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/23144071/

Date: 2020-09-13 21:56:32  Source: igfitidea


python pandas boxplot prettyplotlib

Asked by user308827

I have the following DataFrame in Python (the actual DataFrame is much bigger; this is just a small sample):


      A     B     C     D     E     F
0  0.43  0.52  0.96  1.17  1.17  2.85
1  0.43  0.52  1.17  2.72  2.75  2.94
2  0.43  0.53  1.48  2.85  2.83  
3  0.47  0.59  1.58        3.14  
4  0.49  0.80        

I convert the dataframe to numpy using df.values and then pass that to boxplot.


When I try to make a boxplot out of this pandas dataframe, the number of values picked from each column is restricted to the least number of values in a column (in this case, column F). Is there any way I can boxplot all values from each column?


NOTE: I use df.dropna() to get rid of the missing values. However, dropna() removes every row that has a missing value in any column, so the whole DataFrame shrinks to the length of the shortest column, and that messes up the plot.


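import prettyplotlib as ppl
import pandas
from matplotlib import pyplot

# csv_data: path or file-like object holding the CSV data shown above
# (DataFrame.from_csv was removed in pandas 1.0; read_csv replaces it)
df = pandas.read_csv(csv_data, index_col=False)
# dropna() removes every row containing any NaN, so all columns shrink to the shortest one
df = df.dropna()
labels = ['A', 'B', 'C', 'D', 'E', 'F']
fig, ax = pyplot.subplots()
ppl.boxplot(ax, df.values, xticklabels=labels)
pyplot.show()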
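The truncation described in the note can be reproduced in a few lines. This is a minimal sketch, assuming the blank cells in the sample table are NaN; it compares df.dropna(), which drops whole rows, with dropping NaNs separately in each column (the names `rowwise` and `per_column` are only for illustration).

```python
import numpy as np
import pandas as pd

nan = np.nan
# The sample table from the question, with the blank cells as NaN
df = pd.DataFrame({
    "A": [0.43, 0.43, 0.43, 0.47, 0.49],
    "B": [0.52, 0.52, 0.53, 0.59, 0.80],
    "C": [0.96, 1.17, 1.48, 1.58, nan],
    "D": [1.17, 2.72, 2.85, nan, nan],
    "E": [1.17, 2.75, 2.83, 3.14, nan],
    "F": [2.85, 2.94, nan, nan, nan],
})

# Row-wise: any row with at least one NaN is dropped, so only rows 0 and 1 survive
rowwise = df.dropna()
lengths_rowwise = {col: len(rowwise[col]) for col in df.columns}

# Column-wise: drop NaNs independently in each column, keeping every real value
per_column = {col: df[col].dropna() for col in df.columns}
lengths_per_column = {col: len(s) for col, s in per_column.items()}

print(lengths_rowwise)     # {'A': 2, 'B': 2, 'C': 2, 'D': 2, 'E': 2, 'F': 2}
print(lengths_per_column)  # {'A': 5, 'B': 5, 'C': 4, 'D': 3, 'E': 4, 'F': 2}
```

Keeping the per-column series separate is what lets every column contribute all of its values to its own box.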

Accepted answer by CT Zhu

The right way to do it, without reinventing the wheel, is to use pandas' own .boxplot(), which handles NaN correctly: missing values are ignored column by column, so each column keeps all of its values.


In [31]:

import matplotlib.pyplot as plt
labels = ['A', 'B', 'C', 'D', 'E', 'F']
print(df)
      A     B     C     D     E     F
0  0.43  0.52  0.96  1.17  1.17  2.85
1  0.43  0.52  1.17  2.72  2.75  2.94
2  0.43  0.53  1.48  2.85  2.83   NaN
3  0.47  0.59  1.58   NaN  3.14   NaN
4  0.49  0.80   NaN   NaN   NaN   NaN

[5 rows x 6 columns]
In [32]:

_ = plt.boxplot(df.values)
_ = plt.xticks(range(1, 7), labels)
plt.savefig('1.png')  # NaNs kept, plotted with plt

[Figure: 1.png]

In [33]:

plt.figure()  # start a fresh figure so this plot does not draw over the previous one
_ = df.boxplot()
plt.savefig('2.png')  # NaNs kept, plotted by pandas

[Figure: 2.png]
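One way to check that df.boxplot() really uses every non-missing value is to ask pandas for the underlying matplotlib artists with `return_type="dict"` and compare each drawn median line with `df.median()`, which skips NaN. A sketch, assuming the question's sample data with NaN in the blank cells and the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nan = np.nan
# Same sample data as in the question; NaN marks the missing cells
df = pd.DataFrame({
    "A": [0.43, 0.43, 0.43, 0.47, 0.49],
    "B": [0.52, 0.52, 0.53, 0.59, 0.80],
    "C": [0.96, 1.17, 1.48, 1.58, nan],
    "D": [1.17, 2.72, 2.85, nan, nan],
    "E": [1.17, 2.75, 2.83, 3.14, nan],
    "F": [2.85, 2.94, nan, nan, nan],
})

fig, ax = plt.subplots()
# return_type="dict" returns matplotlib's boxplot artists so the boxes can be inspected
bp = df.boxplot(ax=ax, return_type="dict")

# Each median line is drawn at y = median of that column's non-NaN values
drawn_medians = [float(line.get_ydata()[0]) for line in bp["medians"]]
print(dict(zip(df.columns, drawn_medians)))
```

If pandas had dropped whole rows instead, columns A to E would all be summarised from the same two rows, and their medians would differ from `df.median()`.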

In [34]:

plt.figure()  # start a fresh figure so this plot does not draw over the previous one
_ = plt.boxplot(df.dropna().values)
_ = plt.xticks(range(1, 7), labels)
plt.savefig('3.png')  # NaNs dropped row-wise, plotted with plt

[Figure: 3.png]
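If you prefer to keep calling plt.boxplot yourself, a variation not used in the answer is to drop NaNs per column and pass a list of 1-D arrays: matplotlib's `boxplot` accepts a sequence of arrays, and they do not need to have the same length. A minimal sketch on the same sample data, assuming the Agg backend (the variable `data` is illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "A": [0.43, 0.43, 0.43, 0.47, 0.49],
    "B": [0.52, 0.52, 0.53, 0.59, 0.80],
    "C": [0.96, 1.17, 1.48, 1.58, nan],
    "D": [1.17, 2.72, 2.85, nan, nan],
    "E": [1.17, 2.75, 2.83, 3.14, nan],
    "F": [2.85, 2.94, nan, nan, nan],
})

# One NaN-free array per column; the arrays have different lengths
data = [df[col].dropna().to_numpy() for col in df.columns]

fig, ax = plt.subplots()
bp = ax.boxplot(data)  # a list of 1-D arrays gives one box per array
# Boxes sit at x = 1..n, so label those positions with the column names
ax.set_xticks(range(1, len(df.columns) + 1))
ax.set_xticklabels(df.columns)

drawn_medians = [float(line.get_ydata()[0]) for line in bp["medians"]]
```

Unlike `df.dropna().values`, this keeps all five values of A and B, four of C and E, and three of D, so each box reflects its whole column.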