Scatter plots in Python Pandas/Pyplot: how to plot by category

Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/21654635/

Time: 2020-08-18 23:20:42  Source: igfitidea

Scatter plots in Pandas/Pyplot: How to plot by category

Tags: python, matplotlib, pandas

Asked by user2989613

I am trying to make a simple scatter plot in pyplot using a Pandas DataFrame object, and I want an efficient way of plotting two variables with the symbols dictated by a third column (key). I have tried various approaches using df.groupby, but without success. A sample df script is below. It colours the markers according to 'key1', but I'd like to see a legend with the 'key1' categories. Am I close? Thanks.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
fig1 = plt.figure(1)
ax1 = fig1.add_subplot(111)
ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
plt.show()

Accepted answer by Joe Kington

You can use scatter for this, but that requires having numerical values for your key1, and you won't have a legend, as you noticed.

It's better to just use plot for discrete categories like this. For example:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()

plt.show()
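
Applied to a DataFrame shaped like the one in the question, the same groupby loop might look like the sketch below. The helper name plot_by_category is hypothetical, and the date index from the question is left out for brevity.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_by_category(df, x, y, key, ax=None):
    """Draw one marker-only line per category so each gets a legend entry."""
    if ax is None:
        _, ax = plt.subplots()
    for name, group in df.groupby(key):
        # linestyle='' suppresses the connecting line, leaving only markers
        ax.plot(group[x], group[y], marker='o', linestyle='', label=str(name))
    ax.legend(title=key)
    return ax


# Data shaped like the question's example (values are random)
df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=['one', 'two', 'three'])
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

ax = plot_by_category(df, 'one', 'two', 'key1')
```

Call plt.show() afterwards to display the figure.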


If you'd like things to look like the default pandas style, then just update the rcParams with the pandas stylesheet and use its color generator. (I'm also tweaking the legend slightly):

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Plot
plt.rcParams.update(pd.tools.plotting.mpl_stylesheet)
colors = pd.tools.plotting._get_standard_colors(len(groups), color_type='random')

fig, ax = plt.subplots()
ax.set_color_cycle(colors)
ax.margins(0.05)
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend(numpoints=1, loc='upper left')

plt.show()
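
The pd.tools.plotting helpers and ax.set_color_cycle used above come from older pandas and matplotlib releases and, as far as I know, are not available in current versions. A rough modern equivalent, assuming matplotlib's built-in 'ggplot' style is an acceptable stand-in for the old pandas stylesheet and that colours from the 'tab10' colormap are fine, could look like this:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Assumption: 'ggplot' approximates the look of the old pandas stylesheet
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    # One colour per group, taken from the 'tab10' colormap (arbitrary choice)
    colors = [plt.cm.tab10(i) for i in range(len(groups))]
    # set_prop_cycle is the replacement for the removed set_color_cycle
    ax.set_prop_cycle(color=colors)
    ax.margins(0.05)
    for name, group in groups:
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
    ax.legend(numpoints=1, loc='upper left')
```

Call plt.show() afterwards to display the figure.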


Answer by CT Zhu

With plt.scatter, I can only think of one way: using a proxy artist:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
fig1 = plt.figure(1)
ax1 = fig1.add_subplot(111)
x=ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)

# One proxy marker per key; (key - 4) / 4 rescales the keys 4..8 to the colormap's 0..1 range
ccm=x.get_cmap()
circles=[Line2D(range(1), range(1), color='w', marker='o', markersize=10, markerfacecolor=item) for item in ccm((np.array([4,6,8])-4.0)/4)]
leg = plt.legend(circles, ['4','6','8'], loc = "center left", bbox_to_anchor = (1, 0.5), numpoints = 1)
plt.show()
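
The proxy-artist idea can be sketched without hard-coding the key values or the rescaling, by asking the scatter's own norm and colormap for each key's colour. This is only a sketch under the assumption that the keys are numeric; the date index from the question is omitted.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=['one', 'two', 'three'])
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker='o', c=df['key1'], alpha=0.8)

keys = np.unique(df['key1'])
# sc.norm maps a key value to [0, 1]; sc.cmap turns that into an RGBA colour
handles = [Line2D([], [], linestyle='', marker='o', markersize=10,
                  markerfacecolor=sc.cmap(sc.norm(k)), markeredgecolor='none')
           for k in keys]
legend = ax.legend(handles, [str(k) for k in keys],
                   loc='center left', bbox_to_anchor=(1, 0.5), numpoints=1)
```

Call plt.show() afterwards to display the figure.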


Answer by Bob Baxley

This is simple to do with Seaborn (pip install seaborn) as a one-liner (note that in seaborn 0.9 and later the size argument is named height):

sns.pairplot(x_vars=["one"], y_vars=["two"], data=df, hue="key1", height=5)

import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(1974)

df = pd.DataFrame(
    np.random.normal(10, 1, 30).reshape(10, 3),
    index=pd.date_range('2010-01-01', freq='M', periods=10),
    columns=('one', 'two', 'three'))
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

# `height` replaces the older `size` argument (renamed in seaborn 0.9)
sns.pairplot(x_vars=["one"], y_vars=["two"], data=df, hue="key1", height=5)


Since you have three variable columns in your data, you may want to plot all pairwise dimensions with (height is the seaborn 0.9+ name of the older size argument):

sns.pairplot(vars=["one","two","three"], data=df, hue="key1", height=5)

https://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/ is another option.

Answer by Nipun Batra

You can also try Altair or ggplot, which are focused on declarative visualisations.

import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

Altair code


from altair import Chart
c = Chart(df)
c.mark_circle().encode(x='x', y='y', color='label')


ggplot code


from ggplot import *
ggplot(aes(x='x', y='y', color='label'), data=df) +\
geom_point(size=50) +\
theme_bw()


Answer by Arjaan Buijk

You can use df.plot.scatter and pass an array to the c= argument that defines the color of each point:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
colors = np.where(df["key1"]==4,'r','-')
colors[df["key1"]==6] = 'g'
colors[df["key1"]==8] = 'b'
print(colors)
df.plot.scatter(x="one",y="two",c=colors)
plt.show()
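
The per-point colour array can also be built with a lookup table instead of np.where and masked assignments. The following is a minimal sketch; the dictionary name color_map and its colour choices are assumptions for illustration, and the date index from the question is omitted.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=['one', 'two', 'three'])
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

# Hypothetical mapping from each key value to a matplotlib colour code
color_map = {4: 'r', 6: 'g', 8: 'b'}
colors = df['key1'].map(color_map).tolist()

ax = df.plot.scatter(x='one', y='two', c=colors)
```

Like the original, this colours the points but does not add a legend. Call plt.show() afterwards to display the figure.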


Answer by fuglede

It's rather hacky, but you could use one as a Float64Index to do everything in one go:

df.set_index('one').sort_index().groupby('key1')['two'].plot(style='--o', legend=True)

Note that as of pandas 0.20.3, sorting the index is necessary, and the legend is a bit wonky.

Answer by ImportanceOfBeingErnest

From matplotlib 3.1 onwards you can use .legend_elements(). An example is shown in Automated legend creation. The advantage is that a single scatter call can be used.

In this case:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), 
                  index = pd.date_range('2010-01-01', freq = 'M', periods = 10), 
                  columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)


fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
ax.legend(*sc.legend_elements())
plt.show()

If the keys are not given directly as numbers, it would look like this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), 
                  index = pd.date_range('2010-01-01', freq = 'M', periods = 10), 
                  columns = ('one', 'two', 'three'))
df['key1'] = list("AAABBBCCCC")

labels, index = np.unique(df["key1"], return_inverse=True)

fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker = 'o', c = index, alpha = 0.8)
ax.legend(sc.legend_elements()[0], labels)
plt.show()
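
As a variation on the non-numeric case, the integer codes and labels can also come from a pandas categorical column rather than np.unique. This is a sketch under the assumption that converting key1 to the category dtype is acceptable; the date index is omitted.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=['one', 'two', 'three'])
df['key1'] = list("AAABBBCCCC")

# Categorical codes are integers 0..n-1; categories are the matching labels
cat = df['key1'].astype('category')
codes = cat.cat.codes
labels = list(cat.cat.categories)

fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker='o', c=codes, alpha=0.8)
# legend_elements() (matplotlib 3.1+) returns one handle per distinct code
handles, _ = sc.legend_elements()
legend = ax.legend(handles, labels, title='key1')
```

Call plt.show() afterwards to display the figure.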
