pandas: Python boxplot out of columns of different lengths
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/23144071/
Python boxplot out of columns of different lengths
Asked by user308827
I have the following dataframe in Python (the actual dataframe is much bigger, just presenting a small sample):
      A     B     C     D     E     F
0  0.43  0.52  0.96  1.17  1.17  2.85
1  0.43  0.52  1.17  2.72  2.75  2.94
2  0.43  0.53  1.48  2.85  2.83
3  0.47  0.59  1.58        3.14
4  0.49  0.80
I convert the dataframe to numpy using df.values and then pass that to boxplot.
When I try to make a boxplot out of this pandas dataframe, the number of values picked from each column is restricted to the least number of values in a column (in this case, column F). Is there any way I can boxplot all values from each column?
NOTE: I use df.dropna to drop the rows in each column with missing values. However, this is resizing the dataframe to the lowest common denominator of column length, and messing up the plotting.
import prettyplotlib as ppl
import numpy as np
import pandas
import matplotlib as mpl
from matplotlib import pyplot
# csv_data is the CSV source (not shown in the question)
df = pandas.read_csv(csv_data, index_col=False)
df = df.dropna()  # drops every row that contains a NaN
labels = ['A', 'B', 'C', 'D', 'E', 'F']
fig, ax = pyplot.subplots()
ppl.boxplot(ax, df.values, xticklabels=labels)
pyplot.show()
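To see why the plot ends up with so few values, here is a minimal sketch, assuming the sample above is rebuilt by hand (the CSV file itself is not shown in the question). With its default arguments, dropna() removes every row that contains at least one NaN, so only the rows that are complete in all six columns survive.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Sample from the question, rebuilt by hand (assumption: blanks are NaN).
df = pd.DataFrame({
    "A": [0.43, 0.43, 0.43, 0.47, 0.49],
    "B": [0.52, 0.52, 0.53, 0.59, 0.80],
    "C": [0.96, 1.17, 1.48, 1.58, nan],
    "D": [1.17, 2.72, 2.85, nan, nan],
    "E": [1.17, 2.75, 2.83, 3.14, nan],
    "F": [2.85, 2.94, nan, nan, nan],
})

# dropna() with default arguments drops any row that has a NaN in any column.
rows_after_dropna = len(df.dropna())

# count() reports the number of non-NaN values in each column.
values_per_column = df.count().tolist()

print(rows_after_dropna)
print(values_per_column)
```

Only the rows with no missing value in any column remain after dropna(), which matches the length of the shortest column, F, in this sample.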
Accepted answer by CT Zhu
The right way to do it, without reinventing the wheel, is to use the .boxplot() method in pandas, where NaN values are handled correctly:
In [31]:
print(df)
      A     B     C     D     E     F
0  0.43  0.52  0.96  1.17  1.17  2.85
1  0.43  0.52  1.17  2.72  2.75  2.94
2  0.43  0.53  1.48  2.85  2.83   NaN
3  0.47  0.59  1.58   NaN  3.14   NaN
4  0.49  0.80   NaN   NaN   NaN   NaN
[5 rows x 6 columns]
In [32]:
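import matplotlib.pyplot as plt
labels = ['A', 'B', 'C', 'D', 'E', 'F']  # same labels as in the question
_ = plt.boxplot(df.values)
_ = plt.xticks(range(1, 7), labels)
plt.savefig('1.png')  # keep the NaNs and plot with matplotlib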


In [33]:
_ = df.boxplot()
plt.savefig('2.png')  # keep the NaNs and plot with pandas


In [34]:
_ = plt.boxplot(df.dropna().values)
_ = plt.xticks(range(1, 7), labels)
plt.savefig('3.png')  # drop the NaNs and plot with matplotlib
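If you prefer to call matplotlib directly, one possible alternative (not part of the original answer) is to pass one array per column, each with its own NaNs removed, so every box uses all the values available for that column. The sketch below is self-contained; the hand-built sample frame, the Agg backend and the output file name per_column.png are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nan = np.nan

# Sample from the question, rebuilt by hand (assumption: blanks are NaN).
df = pd.DataFrame({
    "A": [0.43, 0.43, 0.43, 0.47, 0.49],
    "B": [0.52, 0.52, 0.53, 0.59, 0.80],
    "C": [0.96, 1.17, 1.48, 1.58, nan],
    "D": [1.17, 2.72, 2.85, nan, nan],
    "E": [1.17, 2.75, 2.83, 3.14, nan],
    "F": [2.85, 2.94, nan, nan, nan],
})

# One array per column; NaNs are dropped column by column, not row by row,
# so the arrays may have different lengths.
columns_data = [df[col].dropna().to_numpy() for col in df.columns]

fig, ax = plt.subplots()
result = ax.boxplot(columns_data)  # a list of arrays gives one box per array

# Boxes are drawn at positions 1..n, so label those ticks with the column names.
ax.set_xticks(range(1, len(df.columns) + 1))
ax.set_xticklabels(list(df.columns))

fig.savefig("per_column.png")  # hypothetical output file name
```

The pandas df.boxplot() call from the answer remains the shorter option; this variant is mainly useful when the plot has to go through a matplotlib Axes that you already manage yourself.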



