Scatter plots in Python Pandas/Pyplot: how to plot by category

Question

Asked by user2989613

I am trying to make a simple scatter plot in pyplot from a Pandas DataFrame, and I want an efficient way to plot two variables with the marker symbols dictated by a third column (the key). I have tried various approaches using df.groupby, but without success. A sample script is below. It colours the markers according to 'key1', but I'd like to see a legend with the 'key1' categories. Am I close? Thanks.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
fig1 = plt.figure(1)
ax1 = fig1.add_subplot(111)
ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
plt.show()

Answer 1

Accepted answer by Joe Kington

You can use scatter for this, but that requires numerical values for your key1, and you won't get a legend, as you noticed.

It's better to just use plot for discrete categories like this. For example:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()

plt.show()
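
The question asks for marker symbols, not just colours, to follow the category. The same groupby loop can do that with one scatter call per group. The following is a minimal sketch, not part of the original answer: the `markers` mapping and the generated data are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data in the same shape as the answer's example
rng = np.random.default_rng(1974)
num = 20
df = pd.DataFrame({
    "x": rng.random(num),
    "y": rng.random(num),
    "label": rng.choice(["a", "b", "c"], num),
})

# Hypothetical mapping from category to marker symbol (an assumption for this sketch)
markers = {"a": "o", "b": "s", "c": "^"}

fig, ax = plt.subplots()
for name, group in df.groupby("label"):
    # One scatter call per category, so each gets its own marker and legend entry
    ax.scatter(group["x"], group["y"], marker=markers[name], s=80, label=name)
ax.legend(title="label")
```

With an interactive backend, finish with plt.show() to display the figure.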


If you'd like things to look like the default pandas style, then just update the rcParams with the pandas stylesheet and use its color generator. (I'm also tweaking the legend slightly):

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

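# Plot
# Note: pd.tools.plotting and Axes.set_color_cycle come from the old pandas and
# matplotlib versions this answer was written for; later releases removed them
# (matplotlib offers ax.set_prop_cycle(color=colors) instead).
plt.rcParams.update(pd.tools.plotting.mpl_stylesheet)
colors = pd.tools.plotting._get_standard_colors(len(groups), color_type='random')

fig, ax = plt.subplots()
ax.set_color_cycle(colors)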
ax.margins(0.05)
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend(numpoints=1, loc='upper left')

plt.show()


Answer 2

Answer by CT Zhu

With plt.scatter, I can only think of one way: use a proxy artist:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
fig1 = plt.figure(1)
ax1 = fig1.add_subplot(111)
x=ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)

# Build one proxy marker per key, coloured with the scatter's own colormap
# (the keys 4, 6, 8 are rescaled to 0, 0.5, 1)
ccm=x.get_cmap()
circles=[Line2D(range(1), range(1), color='w', marker='o', markersize=10, markerfacecolor=item) for item in ccm((np.array([4,6,8])-4.0)/4)]
leg = plt.legend(circles, ['4','6','8'], loc = "center left", bbox_to_anchor = (1, 0.5), numpoints = 1)
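
The hard-coded rescaling above works only for the keys 4, 6 and 8. A variant that may generalise better is to ask the scatter's own norm and colormap for each key's colour. This is a sketch under assumptions, not the answerer's code: the random data and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.lines import Line2D

# Illustrative data with the same columns as the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(10, 1, (10, 3)), columns=["one", "two", "three"])
df["key1"] = [4, 4, 4, 6, 6, 6, 8, 8, 8, 8]

fig, ax = plt.subplots()
sc = ax.scatter(df["one"], df["two"], c=df["key1"], marker="o", alpha=0.8)

# One proxy artist per distinct key; its colour is looked up through the
# same norm and colormap that the scatter call used for the points
keys = sorted(df["key1"].unique())
handles = [
    Line2D([], [], linestyle="", marker="o", markersize=10,
           markerfacecolor=sc.cmap(sc.norm(k)), markeredgecolor="none")
    for k in keys
]
legend = ax.legend(handles, [str(k) for k in keys], title="key1",
                   loc="center left", bbox_to_anchor=(1, 0.5))
```

Because the colours come from sc.norm and sc.cmap, the legend should stay in sync if the key values or the colormap change.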


Answer 3

Answer by Bob Baxley

This is simple to do with Seaborn (pip install seaborn) as a one-liner:

sns.pairplot(x_vars=["one"], y_vars=["two"], data=df, hue="key1", height=5)

import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(1974)

df = pd.DataFrame(
    np.random.normal(10, 1, 30).reshape(10, 3),
    index=pd.date_range('2010-01-01', freq='M', periods=10),
    columns=('one', 'two', 'three'))
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

sns.pairplot(x_vars=["one"], y_vars=["two"], data=df, hue="key1", height=5)  # height was named size before seaborn 0.9


Since you have three variable columns in your data, you may want to plot all pairwise dimensions with:

sns.pairplot(vars=["one","two","three"], data=df, hue="key1", height=5)

https://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/ is another option.

Answer 4

Answer by Nipun Batra

You can also try Altair or ggplot, which are focused on declarative visualisations.

import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

Altair code


from altair import Chart
c = Chart(df)
c.mark_circle().encode(x='x', y='y', color='label')

ggplot code


from ggplot import *
ggplot(aes(x='x', y='y', color='label'), data=df) +\
geom_point(size=50) +\
theme_bw()

Answer 5

Answer by Arjaan Buijk

You can use df.plot.scatter and pass an array to the c= argument that defines the color of each point:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
colors = np.where(df["key1"]==4,'r','-')
colors[df["key1"]==6] = 'g'
colors[df["key1"]==8] = 'b'
print(colors)
df.plot.scatter(x="one",y="two",c=colors)
plt.show()
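
The per-point colour array can also be built in one step with a dictionary lookup rather than repeated boolean assignments. A minimal sketch, assuming a hypothetical `palette` dictionary that covers every value in key1 (neither the name nor the data is from the original answer):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data with the same columns as the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(10, 1, (10, 3)), columns=["one", "two", "three"])
df["key1"] = [4, 4, 4, 6, 6, 6, 8, 8, 8, 8]

# Hypothetical key-to-colour mapping; Series.map looks up each row's key
palette = {4: "r", 6: "g", 8: "b"}
colors = df["key1"].map(palette).tolist()

# One colour string per point, passed through the c= argument
ax = df.plot.scatter(x="one", y="two", c=colors)
```

A key missing from the mapping would come back as NaN from Series.map, so the dictionary needs an entry for every category.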

Answer 6

Answer by fuglede

It's rather hacky, but you could use one as a Float64Index to do everything in one go:

df.set_index('one').sort_index().groupby('key1')['two'].plot(style='--o', legend=True)

Note that as of pandas 0.20.3, sorting the index is necessary, and the legend is a bit wonky.

Answer 7

Answer by ImportanceOfBeingErnest

From matplotlib 3.1 onwards you can use .legend_elements(). An example is shown in "Automated legend creation" in the matplotlib documentation. The advantage is that a single scatter call can be used.

In this case:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), 
                  index = pd.date_range('2010-01-01', freq = 'M', periods = 10), 
                  columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)


fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
ax.legend(*sc.legend_elements())
plt.show()

If the keys are not given directly as numbers, it would look like this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), 
                  index = pd.date_range('2010-01-01', freq = 'M', periods = 10), 
                  columns = ('one', 'two', 'three'))
df['key1'] = list("AAABBBCCCC")

labels, index = np.unique(df["key1"], return_inverse=True)

fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker = 'o', c = index, alpha = 0.8)
ax.legend(sc.legend_elements()[0], labels)
plt.show()
