pandas: plot different colors for different categorical levels using matplotlib
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/26139423/
Asked by avocado
I have a data frame diamonds, which is composed of variables like (carat, price, color), and I want to draw a scatter plot of price against carat for each color, so that each diamond color is drawn in a different color in the plot.
This is easy in R with ggplot:
ggplot(aes(x=carat, y=price, color=color), #by setting color=color, ggplot automatically draw in different colors
data=diamonds) + geom_point(stat='summary', fun.y=median)
I wonder how this could be done in Python using matplotlib?
PS: I know about auxiliary plotting packages such as seaborn and ggplot for Python, and I'd rather not use them; I just want to find out whether it is possible to do the job using matplotlib alone. ;P
Answered by Ffisegydd
You can pass plt.scatter a c argument, which lets you select the colors. The code below defines a colors dictionary to map your diamond colors to the plotting colors.
import matplotlib.pyplot as plt
import pandas as pd
carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color =['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G',]
df = pd.DataFrame(dict(carat=carat, price=price, color=color))
fig, ax = plt.subplots()
colors = {'D':'red', 'E':'blue', 'F':'green', 'G':'black'}
ax.scatter(df['carat'], df['price'], c=df['color'].apply(lambda x: colors[x]))
plt.show()
df['color'].apply(lambda x: colors[x]) effectively maps the colours from "diamond" to "plotting".
(Forgive me for not putting another example image up, I think 2 is enough :P)
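As an alternative to a hand-written dictionary, the categories can be turned into integer codes and passed through a colormap. This is only a sketch under assumptions: the 'viridis' colormap and the names cats, codes and sc are illustrative choices, not part of the original answer, and legend_elements requires a reasonably recent matplotlib (3.1 or later).

```python
import pandas as pd
import matplotlib.pyplot as plt

carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color = ['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G']
df = pd.DataFrame(dict(carat=carat, price=price, color=color))

# Encode each category as an integer code: D -> 0, E -> 1, F -> 2, G -> 3
cats = df['color'].astype('category')
codes = cats.cat.codes

fig, ax = plt.subplots()
# The colormap (an arbitrary choice here) turns each code into a color
sc = ax.scatter(df['carat'], df['price'], c=codes, cmap='viridis')

# One legend handle per distinct code; relabel them with the category names
handles, _ = sc.legend_elements()
ax.legend(handles, list(cats.cat.categories), title='color')
```

Call plt.show() or fig.savefig(...) afterwards to see the figure. The trade-off is that you no longer pick each color yourself; the colormap decides.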
With seaborn
You can use seaborn, which is a wrapper around matplotlib that makes it look prettier by default (rather opinion-based, I know :P) but also adds some plotting functions.
For this you could use seaborn.lmplot with fit_reg=False (which prevents it from automatically fitting a regression).
The code below uses an example dataset. By selecting hue='color' you tell seaborn to split your dataframe up based on your colours and then plot each one.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color =['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G',]
df = pd.DataFrame(dict(carat=carat, price=price, color=color))
# x and y are passed by keyword; recent seaborn releases no longer accept them positionally
sns.lmplot(x='carat', y='price', data=df, hue='color', fit_reg=False)
plt.show()
Without seaborn, using pandas.groupby
If you don't want to use seaborn, you can use pandas.groupby to get each color's rows on their own and then plot them using just matplotlib, but you'll have to assign colors manually as you go. I've added an example below:
fig, ax = plt.subplots()
colors = {'D':'red', 'E':'blue', 'F':'green', 'G':'black'}
grouped = df.groupby('color')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='carat', y='price', label=key, color=colors[key])
plt.show()
This code assumes the same DataFrame as above and groups it by color. It then iterates over these groups, plotting each one. To select a color I've created a colors dictionary, which maps the diamond color (for instance D) to a real color (for instance red).
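If the exact colors do not matter, the dictionary can be dropped. The following is a minimal sketch, not the answer's own code: it relies on the fact that each ax.scatter call made without an explicit color takes the next color from matplotlib's default property cycle, so every group gets its own color automatically.

```python
import pandas as pd
import matplotlib.pyplot as plt

carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color = ['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G']
df = pd.DataFrame(dict(carat=carat, price=price, color=color))

fig, ax = plt.subplots()
for key, group in df.groupby('color'):
    # No color argument: matplotlib picks the next color in its default cycle
    ax.scatter(group['carat'], group['price'], label=key)

ax.set_xlabel('carat')
ax.set_ylabel('price')
ax.legend(title='color')
```

The default cycle has a limited number of colors, so with many categories the colors start to repeat.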
Answered by Rems
Here's a succinct and generic solution that uses a seaborn color palette.
First find a color palette you like and optionally visualize it:
sns.palplot(sns.color_palette("Set2", 8))
Then you can use it with matplotlib like this:
# Unique category labels: 'D', 'F', 'G', ...
color_labels = df['color'].unique()
# List of RGB triplets
rgb_values = sns.color_palette("Set2", 8)
# Map label to RGB
color_map = dict(zip(color_labels, rgb_values))
# Finally use the mapped values
plt.scatter(df['carat'], df['price'], c=df['color'].map(color_map))
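The same label-to-RGB mapping can be built without seaborn, since matplotlib ships a qualitative colormap with the same name. The sketch below is an assumption-laden variant, not the answer's code: it takes the colors from plt.get_cmap('Set2') and adds a legend through proxy Line2D handles, because a single scatter call does not produce one legend entry per category on its own.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color = ['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G']
df = pd.DataFrame(dict(carat=carat, price=price, color=color))

# Unique category labels, in order of first appearance
color_labels = df['color'].unique()
# RGB triplets of matplotlib's qualitative 'Set2' colormap
rgb_values = plt.get_cmap('Set2').colors
# Map label to RGB; zip stops at the shorter of the two sequences
color_map = dict(zip(color_labels, rgb_values))

fig, ax = plt.subplots()
ax.scatter(df['carat'], df['price'], c=list(df['color'].map(color_map)))

# Proxy artists: one marker-only handle per category for the legend
handles = [Line2D([], [], marker='o', linestyle='', color=rgb, label=label)
           for label, rgb in color_map.items()]
ax.legend(handles=handles, title='color')
```

Note that zip silently drops categories beyond the number of colors in the palette, so check that the palette is at least as long as the list of labels.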
Answered by Nipun Batra
Answered by deprekate
I had the same question, and have spent all day trying out different packages.
I had originally used matplotlib and was not happy with either mapping categories to predefined colors, or grouping/aggregating and then iterating through the groups (and still having to map colors). I just felt it was a poor package implementation.
Seaborn wouldn't work for my case, and Altair ONLY works inside of a Jupyter Notebook.
The best solution for me was plotnine, which "is an implementation of a grammar of graphics in Python, and based on ggplot2".
Below is the plotnine code to replicate your R example in Python:
from plotnine import *
from plotnine.data import diamonds
g = ggplot(diamonds, aes(x='carat', y='price', color='color')) + geom_point(stat='summary')
print(g)
So clean and simple :)
Answered by Pablo Reyes
Here is a combination of markers and colors from a qualitative colormap in matplotlib:
import itertools
import numpy as np
from matplotlib import markers
import matplotlib.pyplot as plt
m_styles = markers.MarkerStyle.markers
N = 60
colormap = plt.cm.Dark2.colors # Qualitative colormap
for i, (marker, color) in zip(range(N), itertools.product(m_styles, colormap)):
    plt.scatter(*np.random.random(2), color=color, marker=marker, label=i)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., ncol=4);
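The same idea can be applied to the diamonds example by giving each category its own marker and color pair. This is a hedged sketch rather than part of the answer: the marker list and the styles dictionary are arbitrary illustrative choices.

```python
import itertools
import pandas as pd
import matplotlib.pyplot as plt

carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color = ['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G']
df = pd.DataFrame(dict(carat=carat, price=price, color=color))

# Endless cycles, so running out of markers or colors just wraps around
marker_cycle = itertools.cycle(['o', 's', '^', 'D'])
color_cycle = itertools.cycle(plt.cm.Dark2.colors)  # qualitative colormap

fig, ax = plt.subplots()
styles = {}
for key, group in df.groupby('color'):
    # Assign the next (marker, color) pair to this category
    styles[key] = (next(marker_cycle), next(color_cycle))
    marker, rgb = styles[key]
    ax.scatter(group['carat'], group['price'], marker=marker, color=rgb, label=key)

ax.legend(title='color')
```

Varying the marker as well as the color keeps the groups distinguishable in grayscale prints.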
Answered by Simon
With df.plot()
Normally when quickly plotting a DataFrame, I use pd.DataFrame.plot(). This takes the index as the x value and the values as the y value, and plots each column separately in a different color.
A DataFrame in this form can be obtained by using set_index and unstack.
import matplotlib.pyplot as plt
import pandas as pd
carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color =['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G',]
df = pd.DataFrame(dict(carat=carat, price=price, color=color))
df.set_index(['color', 'carat']).unstack('color')['price'].plot(style='o')
plt.ylabel('price')
With this method you do not have to manually specify the colors.
This procedure may make more sense for other data series. In my case I have timeseries data, so the MultiIndex consists of datetime and categories. It is also possible to use this approach to color by more than one column, but the legend gets messy.
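To make the reshaping step concrete, here is a small sketch of the wide table that set_index and unstack produce for the sample data. The comparison with DataFrame.pivot is my own addition and assumes each (color, carat) pair occurs only once, as it does here.

```python
import pandas as pd

carat = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
price = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
color = ['D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G']
df = pd.DataFrame(dict(carat=carat, price=price, color=color))

# Rows: carat, columns: one per diamond color, values: price
wide = df.set_index(['color', 'carat']).unstack('color')['price']

# pivot expresses the same reshaping in a single call
pivoted = df.pivot(index='carat', columns='color', values='price')

print(wide)
```

Combinations that do not occur in the data (for example carat 30 with color D) show up as NaN and are simply not drawn by plot(style='o').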
Answered by VICTOR RODEÑO SANCHEZ
I usually do it using Seaborn, which is built on top of matplotlib:
import seaborn as sns
iris = sns.load_dataset('iris')
sns.scatterplot(x='sepal_length', y='sepal_width',
                hue='species', data=iris);