Plotting a dataframe with seaborn.pairplot() in multiple colors?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license, link to the original, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/54317168/
Asked by Philipp
I want to create a plot similar to this image in order to compare multiple dimensions of my dataset. The dataset is not a preset one. I managed to display the data correctly in one color, but I want one color for y=0 and another for y=1 to compare the points, just like in the image of the iris dataset. As soon as I include hue='y' in the sns.pairplot call, the code no longer runs to completion.
I also don't understand the console output. What's the issue?
import seaborn as sns; sns.set(style="ticks", color_codes=True)
import pandas as pd
dataframe = pd.DataFrame(dict(F1=X[:, 0], F2=X[:, 1], F3=X[:, 2], F4=X[:, 3], y=y))
print(dataframe)
g = sns.pairplot(dataframe, hue='y')
This is the output for the dataframe. It looks alright to me:
F1 F2 F3 F4 y
0 3.173182 2.849991 2.497907 2.851715 0.0
1 2.468625 -0.216985 0.275206 1.232518 1.0
2 2.398419 2.258931 2.255533 4.895872 0.0
3 1.379937 1.041677 1.165911 1.992650 1.0
4 2.489665 2.269068 4.129961 2.218203 0.0
5 4.140160 2.809088 2.973027 3.553128 0.0
6 2.997969 1.701299 2.978875 1.946793 0.0
7 3.864436 3.554276 3.568455 2.839489 0.0
8 -0.000605 1.376971 1.128350 1.293777 1.0
9 2.398057 1.180861 2.400801 2.264726 1.0
10 0.997385 -0.560205 0.954628 2.788858 1.0
... ... ... ... ... ...
3990 3.334553 4.576306 2.470476 3.032781 0.0
3991 1.465784 2.304793 1.267303 -0.030802 1.0
3992 0.505905 -0.280769 -1.223464 1.077305 1.0
3993 2.581596 3.924394 3.878303 2.579366 0.0
3994 4.362067 2.247818 2.948595 1.906314 0.0
3995 2.310546 0.006672 2.382227 1.940343 1.0
3996 -0.944635 1.387136 0.604135 2.421478 1.0
3997 1.290999 1.485965 0.262792 0.899340 1.0
3998 0.864532 1.759607 1.118346 1.038935 1.0
3999 1.819110 2.218838 3.927945 2.593009 0.0
[4000 rows x 5 columns]
But eventually I receive this error:
Traceback (most recent call last):
File "/Users//PycharmProjects//V3_multiTops/vergleich.py", line 131, in <module>
g = sns.pairplot(dataframe, hue='y')
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/axisgrid.py", line 2111, in pairplot
grid.map_diag(kdeplot, **diag_kws)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/axisgrid.py", line 1399, in map_diag
func(data_k, label=label_k, color=color, **kwargs)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 691, in kdeplot
cumulative=cumulative, **kwargs)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 294, in _univariate_kdeplot
x, y = _scipy_univariate_kde(data, bw, gridsize, cut, clip)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 366, in _scipy_univariate_kde
kde = stats.gaussian_kde(data, bw_method=bw)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 172, in __init__
self.set_bandwidth(bw_method=bw_method)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 499, in set_bandwidth
self._compute_covariance()
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 510, in _compute_covariance
self._data_inv_cov = linalg.inv(self._data_covariance)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/linalg/basic.py", line 975, in inv
raise LinAlgError("singular matrix")
numpy.linalg.linalg.LinAlgError: singular matrix
I think I am doing something wrong with sns.pairplot() that I don't understand yet. Can you explain it to me, please?
Answered by ImportanceOfBeingErnest
The problem seems to be that the "y" column itself is numeric. It is therefore included in the pair grid as a row and column of its own, which is presumably not wanted anyway. Within each hue group that column is constant, so the density estimate on its diagonal panel has zero variance, which would explain the "singular matrix" error in the traceback. To select the variables that should take part in the grid, use pairplot's vars keyword.
sns.pairplot(df, vars=df.columns[:-1], hue="y")
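The last frames of the traceback show scipy inverting a covariance matrix. The following is a simplified stand-in for that step, not scipy's actual implementation: it uses only NumPy, and the helper name invert_covariance is made up for illustration. It sketches why a column that is constant within a hue group (as "y" is when it is also the hue) cannot be handled, while an ordinary feature column can.

```python
import numpy as np

rng = np.random.default_rng(0)

# An ordinary feature column: values vary, so the variance is positive.
feature = rng.normal(size=50)

# The "y" column restricted to one hue group (e.g. all rows with y == 1):
# every value is identical, so the variance is exactly zero.
constant_y = np.ones(50)


def invert_covariance(values):
    """Hypothetical helper: invert the 1x1 covariance matrix of a 1-D sample."""
    cov = np.atleast_2d(np.cov(values))
    return np.linalg.inv(cov)


# Works: the 1x1 covariance matrix is non-zero and therefore invertible.
inv_feature = invert_covariance(feature)

# Fails: the covariance matrix is [[0.]], which has no inverse.
try:
    invert_covariance(constant_y)
    error_message = None
except np.linalg.LinAlgError as exc:
    error_message = str(exc)

print(error_message)
```

Restricting the grid with vars keeps "y" off the diagonal, so this inversion is never attempted for it.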
The reason the iris dataset works without specifying vars is that its hue column is not numeric. Non-numeric columns are not included in the grid.
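The difference between the two cases can be checked with pandas alone. The sketch below uses select_dtypes as an approximation of how numeric columns are picked; seaborn's internal selection may differ in detail. Both miniature frames and the helper numeric_columns are hypothetical and only mimic the shape of the iris data and of the dataframe in the question.

```python
import pandas as pd

# Hypothetical iris-like frame: the hue column ("species") holds strings.
iris_like = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "petal_length": [1.4, 4.7, 6.0],
    "species": ["setosa", "versicolor", "virginica"],
})

# Hypothetical frame shaped like the question's data: the hue column ("y")
# holds the floats 0.0 and 1.0.
question_like = pd.DataFrame({
    "F1": [3.17, 2.47, 2.40],
    "F2": [2.85, -0.22, 2.26],
    "y": [0.0, 1.0, 0.0],
})


def numeric_columns(frame):
    """Return the names of all columns with a numeric dtype."""
    return frame.select_dtypes(include="number").columns.tolist()


iris_numeric = numeric_columns(iris_like)
question_numeric = numeric_columns(question_like)

print(iris_numeric)      # "species" is not listed
print(question_numeric)  # "y" is listed alongside the features
```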
Complete example:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(300, 4), columns=[f"F{i+1}" for i in range(4)])
df["y"] = np.random.choice([1., 0.], size=len(df))
sns.pairplot(df, vars=df.columns[:-1], hue="y")
plt.show()
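Since non-numeric columns are left out of the grid, another option that may work is to turn "y" into string labels before plotting. This is only a sketch of the data preparation under that assumption; the label names "class 0" and "class 1" are made up for illustration, and the plotting call itself is left out.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Same shape as the complete example: four features plus a 0.0/1.0 target.
df = pd.DataFrame(rng.normal(size=(300, 4)),
                  columns=[f"F{i+1}" for i in range(4)])
df["y"] = rng.choice([1.0, 0.0], size=len(df))

# Replace the numeric target with string labels (hypothetical names),
# leaving the original frame untouched.
labelled = df.assign(y=df["y"].map({0.0: "class 0", 1.0: "class 1"}))

# Only the feature columns remain numeric after the conversion.
remaining_numeric = labelled.select_dtypes(include="number").columns.tolist()
print(remaining_numeric)
```

The labelled frame could then be passed to sns.pairplot(labelled, hue="y") without the vars keyword, in the same way the iris dataset is.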