Original question: http://stackoverflow.com/questions/15910019/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Annotate data points while plotting from Pandas DataFrame
Asked by LondonRob
I would like to annotate the data points with their values next to the points on the plot. The examples I found only deal with x and y as vectors. However, I would like to do this for a pandas DataFrame that contains multiple columns.
ax = plt.figure().add_subplot(1, 1, 1)
df.plot(ax = ax)
plt.show()
What is the best way to annotate all the points for a multi-column DataFrame?
Accepted answer by Dan Allan
Do you want to use one of the other columns as the text of the annotation? This is something I did recently.
Starting with some example data
In [1]: df
Out[1]:
          x         y val
0 -1.015235  0.840049   a
1 -0.427016  0.880745   b
2  0.744470 -0.401485   c
3  1.334952 -0.708141   d
4  0.127634 -1.335107   e
Plot the points. I plot y against x, in this example.
ax = df.set_index('x')['y'].plot(style='o')
Write a function that loops over x, y, and the value to annotate beside the point.
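def label_point(x, y, val, ax):
    # Combine the three Series into one frame so each row holds a point and its label
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))

label_point(df.x, df.y, df.val, ax)
plt.draw()  # the original called pylab's bare draw(); assumes matplotlib.pyplot is imported as plt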
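For reference, here is a minimal self-contained sketch of the same idea, assuming the example values shown above. It swaps iterrows() for a plain zip over the columns, which is my substitution rather than part of the original answer, and it selects the Agg backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Same values as the example DataFrame in this answer
df = pd.DataFrame({
    "x": [-1.015235, -0.427016, 0.744470, 1.334952, 0.127634],
    "y": [0.840049, 0.880745, -0.401485, -0.708141, -1.335107],
    "val": list("abcde"),
})

fig, ax = plt.subplots()
ax.plot(df["x"], df["y"], "o")  # points only, no connecting line

# One ax.text call per point, placed at the point's own coordinates
for x, y, val in zip(df["x"], df["y"], df["val"]):
    ax.text(x, y, str(val))
```

Every label ends up in ax.texts, so it can be restyled or removed afterwards if needed.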
Answer by LondonRob
Here's a (very) slightly slicker version of Dan Allan's answer:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string
df = pd.DataFrame({'x': np.random.rand(10), 'y': np.random.rand(10)},
                  index=list(string.ascii_lowercase[:10]))
Which gives:
          x         y
a  0.541974  0.042185
b  0.036188  0.775425
c  0.950099  0.888305
d  0.739367  0.638368
e  0.739910  0.596037
f  0.974529  0.111819
g  0.640637  0.161805
h  0.554600  0.172221
i  0.718941  0.192932
j  0.447242  0.172469
And then:
fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)

# k is the index label, v is the row holding the point's (x, y)
for k, v in df.iterrows():
    ax.annotate(k, v)
Finally, if you're in interactive mode you might need to refresh the plot:
fig.canvas.draw()
Which produces:
Or, since that looks incredibly ugly, you can quite easily beautify things a bit:
# cm.get_cmap() was removed in Matplotlib 3.9; plt.get_cmap() works in old and new versions
cmap = plt.get_cmap('Spectral')

df.plot('x', 'y', kind='scatter', ax=ax, s=120, linewidth=0,
        c=range(len(df)), colormap=cmap)

for k, v in df.iterrows():
    ax.annotate(k, v,
                xytext=(10, -5), textcoords='offset points',
                family='sans-serif', fontsize=18, color='darkslategrey')
Which looks a lot nicer:
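If the colorbar that pandas adds for a numeric c is not wanted, passing colorbar=False to the plot call appears to suppress it. The sketch below uses a small hypothetical DataFrame (three fixed points rather than the random data above) and the Agg backend so that it runs without a display; treat it as an illustration, not as part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical fixed data so the result is reproducible
df = pd.DataFrame({"x": [0.1, 0.5, 0.9], "y": [0.3, 0.7, 0.2]},
                  index=list("abc"))

fig, ax = plt.subplots()
# colorbar=False asks pandas not to attach a colorbar for the numeric c values
df.plot("x", "y", kind="scatter", ax=ax, s=120, linewidth=0,
        c=list(range(len(df))), colormap="Spectral", colorbar=False)

# Label each point with its index value, offset slightly from the marker
for label, row in df.iterrows():
    ax.annotate(label, (row["x"], row["y"]),
                xytext=(10, -5), textcoords="offset points",
                family="sans-serif", fontsize=18, color="darkslategrey")
```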
Answer by tozCSS
Let's assume your df has multiple columns, three of which are x, y, and lbl. To annotate your (x,y) scatter plot with lbl, simply:
ax = df.plot(kind='scatter',x='x',y='y')
df[['x','y','lbl']].apply(lambda row: ax.text(*row),axis=1);
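The one-liner works because ax.text takes its positional arguments in the order x, y, text, so the columns must be selected in exactly that order. Here is a hedged sketch of the same idea written as an explicit loop over itertuples(), which is my substitution for apply(). The data, including the extra size column, is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import pandas as pd

# Hypothetical data: 'size' stands in for the other columns that are not plotted,
# and the columns are deliberately stored in a different order than x, y, lbl
df = pd.DataFrame({
    "lbl": ["p", "q", "r"],
    "size": [10, 20, 30],
    "y": [0.2, 0.8, 0.5],
    "x": [0.1, 0.4, 0.9],
})

ax = df.plot(kind="scatter", x="x", y="y")

# Selecting ['x', 'y', 'lbl'] fixes the order, so each tuple unpacks as (x, y, lbl)
for x, y, lbl in df[["x", "y", "lbl"]].itertuples(index=False):
    ax.text(x, y, lbl)
```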
Answer by Alnilam
I found the previous answers quite helpful, especially LondonRob's example that improved the layout a bit.
The only thing that bothered me is that I don't like pulling data out of DataFrames to then loop over them. Seems a waste of the DataFrame.
Here is an alternative that avoids the loop by using .apply() and includes the nicer-looking annotations (I thought the color scale was a bit overkill, and I couldn't get the colorbar to go away):
ax = df.plot('x', 'y', kind='scatter', s=50)

def annotate_df(row):
    # row.name is the index label; row.values holds the point's (x, y)
    ax.annotate(row.name, row.values,
                xytext=(10, -5),
                textcoords='offset points',
                size=18,
                color='darkslategrey')

_ = df.apply(annotate_df, axis=1)
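The function above picks up ax from the enclosing scope. A possible variation, sketched here with hypothetical data and a hypothetical function name annotate_row, passes the axes in explicitly: DataFrame.apply forwards extra keyword arguments to the function it calls. Selecting row["x"] and row["y"] by name also keeps the call working if the DataFrame has more than two columns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import pandas as pd

# Hypothetical fixed data so the result is reproducible
df = pd.DataFrame({"x": [0.1, 0.5, 0.9], "y": [0.3, 0.7, 0.2]},
                  index=list("abc"))

def annotate_row(row, ax):
    """Label one point with its index value on the axes passed in."""
    ax.annotate(row.name, (row["x"], row["y"]),
                xytext=(10, -5), textcoords="offset points",
                size=18, color="darkslategrey")

ax = df.plot("x", "y", kind="scatter", s=50)

# apply() forwards ax=ax to annotate_row for every row
_ = df.apply(annotate_row, axis=1, ax=ax)
```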
Edit Notes
I edited my code example recently. Originally it used the same
fig, ax = plt.subplots()
as the other posts to expose the axes. However, this is unnecessary, and it also makes the
import matplotlib.pyplot as plt
line unnecessary.
Also note:
- If you are trying to reproduce this example and your plots don't have the points in the same place as any of ours, it may be because the DataFrame was using random values. It probably would have been less confusing if we'd used a fixed data table or a random seed.
- Depending on the points, you may have to play with the xytext values to get better placements.
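As the first note suggests, seeding the random generator makes the example reproducible. This is a minimal sketch of that, assuming NumPy's Generator API; the seed value 0 is arbitrary and not from the original posts.

```python
import string

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (arbitrary) so every run gives the same points

df = pd.DataFrame({"x": rng.random(10), "y": rng.random(10)},
                  index=list(string.ascii_lowercase[:10]))
```

With a fixed seed, the xytext offsets only need to be tuned once for the resulting layout.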