How do I get a simple scatter plot of a dataframe (preferably with seaborn)
Original question: http://stackoverflow.com/questions/29279293/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by theQman
I'm trying to scatter plot the following dataframe:
import numpy as np
import pandas as pd
mydf = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                     'y': [9, 8, 7, 6, 5, 4, 3, 2, 1],
                     'z': np.random.randint(0, 9, 9)},
                    index=["12:00", "1:00", "2:00", "3:00", "4:00",
                           "5:00", "6:00", "7:00", "8:00"])
       x  y  z
12:00  1  9  1
1:00   2  8  1
2:00   3  7  7
3:00   4  6  7
4:00   5  5  4
5:00   6  4  2
6:00   7  3  2
7:00   8  2  8
8:00   9  1  8
I would like to see the times "12:00, 1:00, ..." on the x-axis and the x, y, z columns on the y-axis.
When I try to plot with pandas via mydf.plot(kind="scatter"), I get the error ValueError: scatter requires and x and y column. Do I have to break down my dataframe into appropriate parameters? What I would really like to do is get this scatter plotted with seaborn.
Answered by Carsten
Just running
mydf.plot(style=".")
works fine for me:
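As a rough sketch of what that call does: it is a regular pandas line plot drawn with a marker-only style, so each column becomes its own series and the index runs along the horizontal axis. The fixed z values (copied from the printed frame in the question) and the headless "Agg" backend are assumptions made here so the snippet is reproducible without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is needed
import pandas as pd

# z values copied from the question's printed output (the original draws them at random)
mydf = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5, 6, 7, 8, 9],
     "y": [9, 8, 7, 6, 5, 4, 3, 2, 1],
     "z": [1, 1, 7, 7, 4, 2, 2, 8, 8]},
    index=["12:00", "1:00", "2:00", "3:00", "4:00",
           "5:00", "6:00", "7:00", "8:00"],
)

# style="." asks for point markers with no connecting line,
# which makes the line plot look like a scatter plot.
ax = mydf.plot(style=".")

# One plotted series per column, each drawn with the "." marker.
plotted = ax.get_lines()
markers = [line.get_marker() for line in plotted]
```

Because this goes through the line-plot path, no x and y column names are needed, which is why it avoids the ValueError raised by kind="scatter".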
Answered by T.C. Proctor
Seaborn is actually built around pandas.DataFrames. However, your data frame needs to be "tidy":
- Each variable forms a column.
- Each observation forms a row.
- Each type of observational unit forms a table.
Since you want to plot x, y, and z on the same plot, it seems like they are actually different observations. Thus, you really have three variables: time, value, and the letter used.
The "tidy" standard comes from Hadley Wickham, who implemented it in the tidyr package.
First, I convert the index to a Datetime:
mydf.index = pd.DatetimeIndex(mydf.index)
Then we do the conversion to tidy data:
pivoted = mydf.unstack().reset_index()
and rename the columns:
pivoted = pivoted.rename(columns={"level_0": "letter", "level_1": "time", 0: "value"})
Now, this is what our data looks like:
  letter                time  value
0      x 2019-03-13 12:00:00      1
1      x 2019-03-13 01:00:00      2
2      x 2019-03-13 02:00:00      3
3      x 2019-03-13 03:00:00      4
4      x 2019-03-13 04:00:00      5
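For comparison, the same long ("tidy") layout can probably be reached with melt, which names the output columns directly instead of renaming level_0 and level_1 afterwards. This is only a sketch, not the answer's method: it keeps the time labels as plain strings (skipping the datetime conversion) and fixes the z values to those printed in the question, both assumptions made for reproducibility.

```python
import pandas as pd

# z values copied from the question's printed output (the original draws them at random)
mydf = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5, 6, 7, 8, 9],
     "y": [9, 8, 7, 6, 5, 4, 3, 2, 1],
     "z": [1, 1, 7, 7, 4, 2, 2, 8, 8]},
    index=["12:00", "1:00", "2:00", "3:00", "4:00",
           "5:00", "6:00", "7:00", "8:00"],
)

# Route 1: the unstack-based conversion used in the answer.
pivoted = mydf.unstack().reset_index()
pivoted = pivoted.rename(columns={"level_0": "letter", "level_1": "time", 0: "value"})

# Route 2: name the index, turn it into a column, then melt x/y/z into rows.
melted = (
    mydf.rename_axis("time")
        .reset_index()
        .melt(id_vars="time", var_name="letter", value_name="value")
)

# Both routes should yield the same rows once the columns are put in the same order.
same_rows = (
    melted[["letter", "time", "value"]].values.tolist() == pivoted.values.tolist()
)
```

Either way the result has one row per (letter, time) pair, which is the shape seaborn expects.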
Unfortunately, seaborn doesn't play with DateTimes that well, so you can just extract the hour as an integer:
pivoted["hour"] = pivoted["time"].dt.hour
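If the time labels are still plain strings such as "1:00" (an assumption; the answer above parses them into datetimes first), the hour can also be read off the text before the colon. A minimal sketch with a hypothetical four-label sample:

```python
import pandas as pd

# Hypothetical sample of the question's time labels, kept as strings.
times = pd.Series(["12:00", "1:00", "2:00", "8:00"], name="time")

# Take the part before the colon and convert it to an integer hour.
hours = times.str.split(":").str[0].astype(int)
print(hours.tolist())  # [12, 1, 2, 8]
```

Note that "12:00" maps to 12 with either approach, so on a numeric axis it is placed after 8 rather than before 1.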
With a data frame in this form, seaborn takes in the data easily:
import seaborn as sns
sns.set()
sns.scatterplot(data=pivoted, x="hour", y="value", hue="letter")
Outputs: