Python boxplot from pandas columns of different lengths

Question

Asked by user308827

I have the following dataframe in Python (the actual dataframe is much bigger, just presenting a small sample):

      A     B     C     D     E     F
0  0.43  0.52  0.96  1.17  1.17  2.85
1  0.43  0.52  1.17  2.72  2.75  2.94
2  0.43  0.53  1.48  2.85  2.83  
3  0.47  0.59  1.58        3.14  
4  0.49  0.80

I convert the dataframe to a NumPy array using df.values and then pass that to boxplot.

When I try to make a boxplot out of this pandas dataframe, the number of values picked from each column is restricted to the least number of values in a column (in this case, column F). Is there any way I can boxplot all values from each column?

NOTE: I use df.dropna to drop the rows in each column with missing values. However, this is resizing the dataframe to the lowest common denominator of column length, and messing up the plotting.

import prettyplotlib as ppl
import numpy as np
import pandas
import matplotlib as mpl
from matplotlib import pyplot

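# DataFrame.from_csv was removed in pandas 1.0; read_csv is its replacement
df = pandas.read_csv(csv_data, index_col=False)
df = df.dropna()  # drops every row that has a NaN in any column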
labels = ['A', 'B', 'C', 'D', 'E', 'F']
fig, ax = pyplot.subplots()
ppl.boxplot(ax, df.values, xticklabels=labels)
pyplot.show()
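The truncation described above can be reproduced without any plotting. The following is a minimal sketch, assuming the sample table from the question with the blank cells read as NaN (the dictionary literal is typed in by hand here, not loaded from the asker's CSV). It compares the number of usable values per column with what survives a default df.dropna().

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question; NaN pads the shorter columns.
df = pd.DataFrame({
    "A": [0.43, 0.43, 0.43, 0.47, 0.49],
    "B": [0.52, 0.52, 0.53, 0.59, 0.80],
    "C": [0.96, 1.17, 1.48, 1.58, nan],
    "D": [1.17, 2.72, 2.85, nan, nan],
    "E": [1.17, 2.75, 2.83, 3.14, nan],
    "F": [2.85, 2.94, nan, nan, nan],
})

# Non-missing values per column: what each box should be built from.
counts = df.count()

# dropna() with default arguments removes any row containing a NaN,
# so only rows that are complete in every column remain.
trimmed = df.dropna()

print(counts.to_dict())
print(trimmed.shape)
```

Only the rows that are complete across all six columns remain after dropna(), so every column is cut down to the length of the shortest one, which is column F.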

Answer 1

Accepted answer by CT Zhu

The right way to do it, without reinventing the wheel, is to use the .boxplot() method in pandas, which handles NaN correctly:

In [31]:

print(df)
      A     B     C     D     E     F
0  0.43  0.52  0.96  1.17  1.17  2.85
1  0.43  0.52  1.17  2.72  2.75  2.94
2  0.43  0.53  1.48  2.85  2.83   NaN
3  0.47  0.59  1.58   NaN  3.14   NaN
4  0.49  0.80   NaN   NaN   NaN   NaN

[5 rows x 6 columns]

In [32]:

import matplotlib.pyplot as plt  # assumed import; labels is the list from the question
_ = plt.boxplot(df.values)
_ = plt.xticks(range(1, 7), labels)
plt.savefig('1.png')  # keep the NaNs and plot with matplotlib

In [33]:

_ = df.boxplot()
plt.savefig('2.png')  # keep the NaNs and plot with pandas
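For readers who want to run the pandas variant outside an IPython session, here is a self-contained sketch. It assumes the sample data from the question, typed in by hand, and a non-interactive Agg backend so that it also runs without a display. Passing return_type="dict" is optional and is used here only to inspect what was drawn.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question; NaN pads the shorter columns.
df = pd.DataFrame({
    "A": [0.43, 0.43, 0.43, 0.47, 0.49],
    "B": [0.52, 0.52, 0.53, 0.59, 0.80],
    "C": [0.96, 1.17, 1.48, 1.58, nan],
    "D": [1.17, 2.72, 2.85, nan, nan],
    "E": [1.17, 2.75, 2.83, 3.14, nan],
    "F": [2.85, 2.94, nan, nan, nan],
})

fig, ax = plt.subplots()
# pandas drops the NaNs of each column separately before drawing its box.
result = df.boxplot(ax=ax, return_type="dict")

# The median line of the last box (column F) is built from its two values only.
median_f = result["medians"][5].get_ydata()[0]
print(len(result["boxes"]), median_f)
```

Calling fig.savefig with a file name of your choice afterwards writes the figure to disk, as the answer does with plt.savefig.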

In [34]:

_ = plt.boxplot(df.dropna().values)
_ = plt.xticks(range(1, 7), labels)
plt.savefig('3.png')  # drop the NaNs and plot with matplotlib
