pandas: Using seaborn.pairplot() to plot a DataFrame in multiple colors?

Question

Asked by Philipp

I want to create a plot similar to this image in order to compare multiple dimensions of my dataset. The dataset is not a preset one. I managed to display the data correctly in one color, but I want one color for y=0 and one for y=1 to compare the points, just like in the image of the iris dataset. As soon as I include hue='y' in the sns.pairplot call, the code no longer runs to completion.

Also, I don't understand the console output. What's the issue?

import seaborn as sns; sns.set(style="ticks", color_codes=True)
import pandas as pd

dataframe = pd.DataFrame(dict(F1=X[:, 0], F2=X[:, 1], F3=X[:, 2], F4=X[:, 3], y=y))

print(dataframe)

g = sns.pairplot(dataframe, hue='y')

This is the output for the dataframe. It looks alright to me:

            F1        F2        F3        F4    y
0     3.173182  2.849991  2.497907  2.851715  0.0
1     2.468625 -0.216985  0.275206  1.232518  1.0
2     2.398419  2.258931  2.255533  4.895872  0.0
3     1.379937  1.041677  1.165911  1.992650  1.0
4     2.489665  2.269068  4.129961  2.218203  0.0
5     4.140160  2.809088  2.973027  3.553128  0.0
6     2.997969  1.701299  2.978875  1.946793  0.0
7     3.864436  3.554276  3.568455  2.839489  0.0
8    -0.000605  1.376971  1.128350  1.293777  1.0
9     2.398057  1.180861  2.400801  2.264726  1.0
10    0.997385 -0.560205  0.954628  2.788858  1.0

...        ...       ...       ...       ...  ...

3990  3.334553  4.576306  2.470476  3.032781  0.0
3991  1.465784  2.304793  1.267303 -0.030802  1.0
3992  0.505905 -0.280769 -1.223464  1.077305  1.0
3993  2.581596  3.924394  3.878303  2.579366  0.0
3994  4.362067  2.247818  2.948595  1.906314  0.0
3995  2.310546  0.006672  2.382227  1.940343  1.0
3996 -0.944635  1.387136  0.604135  2.421478  1.0
3997  1.290999  1.485965  0.262792  0.899340  1.0
3998  0.864532  1.759607  1.118346  1.038935  1.0
3999  1.819110  2.218838  3.927945  2.593009  0.0

[4000 rows x 5 columns]

But eventually I receive this error:

Traceback (most recent call last):
  File "/Users//PycharmProjects//V3_multiTops/vergleich.py", line 131, in <module>
    g = sns.pairplot(dataframe, hue='y')
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/axisgrid.py", line 2111, in pairplot
    grid.map_diag(kdeplot, **diag_kws)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/axisgrid.py", line 1399, in map_diag
    func(data_k, label=label_k, color=color, **kwargs)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 691, in kdeplot
    cumulative=cumulative, **kwargs)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 294, in _univariate_kdeplot
    x, y = _scipy_univariate_kde(data, bw, gridsize, cut, clip)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 366, in _scipy_univariate_kde
    kde = stats.gaussian_kde(data, bw_method=bw)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 172, in __init__
    self.set_bandwidth(bw_method=bw_method)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 499, in set_bandwidth
    self._compute_covariance()
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 510, in _compute_covariance
    self._data_inv_cov = linalg.inv(self._data_covariance)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/linalg/basic.py", line 975, in inv
    raise LinAlgError("singular matrix")
numpy.linalg.linalg.LinAlgError: singular matrix

I think I am doing something wrong with sns.pairplot(), which I don't fully understand yet. Can you explain it to me, please?

Answer 1

Answered by ImportanceOfBeingErnest

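The problem seems to be that the "y" column itself is numeric. It would hence be included in the pairgrid as a column/row. On the diagonal, the KDE for "y" is then computed separately for each hue group, and within a group "y" is constant, which would explain the singular matrix error in the traceback. Including "y" in the grid seems undesired anyway. To select the variables that shall take part in the grid, use pairplot's vars keyword.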

 sns.pairplot(df, vars=df.columns[:-1], hue="y")
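A small sketch of why a numeric "y" column causes trouble on the diagonal, assuming (as in the question) that "y" only takes the values 0.0 and 1.0. Once the rows are split by hue, the "y" values inside each group are all identical, so their variance is zero and a Gaussian KDE has nothing to estimate a bandwidth from. The frame below is made-up data, not the asker's dataset.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's frame: four features plus a 0/1 label.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["F1", "F2", "F3", "F4"])
df["y"] = np.repeat([0.0, 1.0], 100)

# Variance of every column within each hue group.
# The feature columns vary, but "y" is constant inside its own group.
for value, group in df.groupby("y"):
    print(value, group.var().round(3).to_dict())

# Collect just the per-group variance of the label column.
y_var_per_group = {value: group["y"].var() for value, group in df.groupby("y")}
```

The feature columns show a non-zero variance in both groups, while "y" does not, which is why leaving it out of the grid via vars avoids the error.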

The reason the iris dataset works without specifying vars is that the hue column is not numeric. Non-numeric columns are not included in the grid.
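To illustrate the numeric versus non-numeric distinction, here is a sketch using pandas' own select_dtypes. seaborn's internal check may be implemented differently, so treat this as an approximation only. The tiny iris-like frame and its values are made up for the example.

```python
import pandas as pd

# Tiny iris-like frame: two numeric features and a string label column.
iris_like = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "petal_length": [1.4, 4.7, 6.0],
    "species": ["setosa", "versicolor", "virginica"],
})

# Same features, but with a float label as in the question.
numeric_label = iris_like.drop(columns="species").assign(y=[0.0, 1.0, 1.0])

# Columns pandas regards as numeric in each frame.
numeric_iris = iris_like.select_dtypes(include="number").columns.tolist()
numeric_q = numeric_label.select_dtypes(include="number").columns.tolist()

print(numeric_iris)
print(numeric_q)
```

The string "species" column drops out of the numeric list on its own, whereas the float "y" column stays in and has to be excluded explicitly.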

Complete example:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(300, 4), columns=[f"F{i+1}" for i in range(4)])
df["y"] = np.random.choice([1., 0.], size=len(df))

sns.pairplot(df, vars=df.columns[:-1], hue="y")
plt.show()
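
Note that df.columns[:-1] relies on "y" being the last column. A variant that picks the grid variables by name instead, as a sketch on the same kind of random data (hue_col and feature_cols are names introduced here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(300, 4), columns=[f"F{i+1}" for i in range(4)])
df["y"] = np.random.choice([1., 0.], size=len(df))

# Every column except the hue column, regardless of column order.
hue_col = "y"
feature_cols = [c for c in df.columns if c != hue_col]
print(feature_cols)

# These can then be passed on exactly as in the example above:
# sns.pairplot(df, vars=feature_cols, hue=hue_col)
```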
