Annotate data points while plotting from Pandas DataFrame

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/15910019/

Time: 2020-09-09 00:08:42  Source: igfitidea

matplotlib pandas

Asked by LondonRob

I would like to annotate the data points with their values next to the points on the plot. The examples I found only deal with x and y as vectors. However, I would like to do this for a pandas DataFrame that contains multiple columns.

import matplotlib.pyplot as plt

ax = plt.figure().add_subplot(1, 1, 1)
df.plot(ax=ax)  # df is the multi-column DataFrame to plot
plt.show()

What is the best way to annotate all the points for a multi-column DataFrame?

Accepted answer by Dan Allan

Do you want to use one of the other columns as the text of the annotation? This is something I did recently.

Starting with some example data:

In [1]: df
Out[1]: 
           x         y val
 0 -1.015235  0.840049   a
 1 -0.427016  0.880745   b
 2  0.744470 -0.401485   c
 3  1.334952 -0.708141   d
 4  0.127634 -1.335107   e

Plot the points. I plot y against x, in this example.

ax = df.set_index('x')['y'].plot(style='o')

Write a function that loops over x, y, and the value to annotate beside the point.

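import matplotlib.pyplot as plt
import pandas as pd

def label_point(x, y, val, ax):
    # Combine the three Series into one frame so each row describes one point.
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))

label_point(df.x, df.y, df.val, ax)

plt.draw()  # the bare draw() only exists in a pylab-style session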

[Image: Annotated Points]
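For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the example table by hand and uses `zip` instead of `pd.concat` plus `iterrows`; that swap is my own simplification, not part of the original answer. The `Agg` backend line is an assumption so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Example data copied from the table in the answer.
df = pd.DataFrame({
    "x": [-1.015235, -0.427016, 0.744470, 1.334952, 0.127634],
    "y": [0.840049, 0.880745, -0.401485, -0.708141, -1.335107],
    "val": list("abcde"),
})

# Plot y against x as unconnected markers.
ax = df.set_index("x")["y"].plot(style="o")

def label_point(x, y, val, ax):
    """Write each entry of `val` as text at the matching (x, y) position."""
    for xi, yi, vi in zip(x, y, val):
        ax.text(xi, yi, str(vi))

label_point(df["x"], df["y"], df["val"], ax)

# Collect the text artists that were added, to check the labels landed on the axes.
labels = [t.get_text() for t in ax.texts]
print(labels)
```

Passing the three columns separately keeps `label_point` usable with any Series of equal length, not just columns of one DataFrame.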

Answer by LondonRob

Here's a (very) slightly slicker version of Dan Allan's answer:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string

df = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)}, 
                  index=list(string.ascii_lowercase[:10]))

Which gives:

          x         y
a  0.541974  0.042185
b  0.036188  0.775425
c  0.950099  0.888305
d  0.739367  0.638368
e  0.739910  0.596037
f  0.974529  0.111819
g  0.640637  0.161805
h  0.554600  0.172221
i  0.718941  0.192932
j  0.447242  0.172469

And then:

fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)

for k, v in df.iterrows():
    ax.annotate(k, v)

Finally, if you're in interactive mode you might need to refresh the plot:

fig.canvas.draw()

Which produces: [Image: Boring scatter plot]

Or, since that looks incredibly ugly, you can quite easily beautify things a bit:

# matplotlib.cm.get_cmap is deprecated and removed in newer Matplotlib releases;
# plt.get_cmap does the same lookup.
cmap = plt.get_cmap('Spectral')
df.plot('x', 'y', kind='scatter', ax=ax, s=120, linewidth=0, 
        c=range(len(df)), colormap=cmap)

for k, v in df.iterrows():
    ax.annotate(k, v,
                xytext=(10,-5), textcoords='offset points',
                family='sans-serif', fontsize=18, color='darkslategrey')

Which looks a lot nicer: [Image: Nice scatter plot]
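The original question was about a multi-column line plot, where each column is its own series and the label should be the point's value. A similar loop seems to cover that case too. This is only a sketch: the column names `a` and `b`, the sample numbers, the one-decimal format and the 5-point offset are all made up for illustration, and the `Agg` backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical two-column frame; the index supplies the x positions.
df = pd.DataFrame({"a": [1.0, 2.5, 2.0], "b": [3.0, 1.5, 4.0]}, index=[0, 1, 2])

fig, ax = plt.subplots()
df.plot(ax=ax, style="o-")

# One annotation per cell: x comes from the index, y and the label from the value.
for col in df.columns:
    for xi, yi in df[col].items():
        ax.annotate(f"{yi:.1f}", (xi, yi),
                    xytext=(5, 5), textcoords="offset points")

n_labels = len(ax.texts)
print(n_labels)
```

With many columns or dense data the labels will overlap, so the offset in `xytext` usually needs tuning per plot.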

Answer by tozCSS

Let's assume your df has multiple columns, three of which are x, y, and lbl. To annotate your (x, y) scatter plot with lbl, simply:

ax = df.plot(kind='scatter',x='x',y='y')
df[['x','y','lbl']].apply(lambda row: ax.text(*row),axis=1);
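To try this without an existing DataFrame, here is a small self-contained sketch. The data and the extra `other` column are hypothetical, and I have written the labelling step as an explicit `itertuples` loop rather than `apply`, which should be equivalent but does not rely on `apply` for its side effects. The `Agg` backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frame with more columns than the three we need.
df = pd.DataFrame({
    "x": [0.1, 0.4, 0.7],
    "y": [0.3, 0.9, 0.5],
    "other": [10, 20, 30],  # ignored by the labelling step
    "lbl": ["p", "q", "r"],
})

ax = df.plot(kind="scatter", x="x", y="y")

# Select the columns in (x, y, text) order, matching the arguments of ax.text.
for x, y, lbl in df[["x", "y", "lbl"]].itertuples(index=False):
    ax.text(x, y, lbl)

texts = [t.get_text() for t in ax.texts]
print(texts)
```

The column order in the selection matters in both versions, because the values are passed to `ax.text` positionally.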

Answer by Alnilam

I found the previous answers quite helpful, especially LondonRob's example, which improved the layout a bit.

The only thing that bothered me is that I don't like pulling data out of DataFrames to then loop over them. Seems a waste of the DataFrame.

Here is an alternative that avoids the loop by using .apply(), and includes the nicer-looking annotations (I thought the color scale was a bit overkill, and I couldn't get the colorbar to go away):

ax = df.plot('x', 'y', kind='scatter', s=50)

def annotate_df(row):  
    ax.annotate(row.name, row.values,
                xytext=(10,-5), 
                textcoords='offset points',
                size=18, 
                color='darkslategrey')

_ = df.apply(annotate_df, axis=1)

Edit Notes

I edited my code example recently. Originally it used the same:

fig, ax = plt.subplots()

as the other posts to expose the axes; however, this is unnecessary and makes the:

import matplotlib.pyplot as plt

line also unnecessary.

Also note:

  • If you are trying to reproduce this example and your plots don't have the points in the same place as any of ours, it may be because the DataFrame was using random values. It probably would have been less confusing if we'd used a fixed data table or a random seed.
  • Depending on the points, you may have to play with the xytext values to get better placements.
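Following up on the first note, here is a sketch of the same example with a fixed seed, so that everyone gets the same points. The seed value 0 and the use of `np.random.default_rng` are my own choices, not something from the earlier answers, and I have gone back to a plain loop here so the (x, y) pairs are explicit. The `Agg` backend line is an assumption so it runs without a display.

```python
import string

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# A seeded generator makes the "random" points identical on every run.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.random(10), "y": rng.random(10)},
                  index=list(string.ascii_lowercase[:10]))

ax = df.plot("x", "y", kind="scatter", s=50)

# Label each point with its index value, nudged right and slightly down.
for label, (x, y) in zip(df.index, df[["x", "y"]].to_numpy()):
    ax.annotate(label, (x, y),
                xytext=(10, -5), textcoords="offset points",
                size=18, color="darkslategrey")

point_labels = [t.get_text() for t in ax.texts]
print(point_labels)
```

If labels collide for a given seed, changing the `xytext` offset is usually enough to separate them.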