pandas: How to get a simple scatter plot of a data frame (preferably with seaborn)

Question

Asked by theQman

I'm trying to scatter plot the following dataframe:

import numpy as np
import pandas as pd

mydf = pd.DataFrame({'x':[1,2,3,4,5,6,7,8,9],
                 'y':[9,8,7,6,5,4,3,2,1],
                 'z':np.random.randint(0,9, 9)},
                index=["12:00", "1:00", "2:00", "3:00", "4:00",
                       "5:00", "6:00", "7:00", "8:00"])



        x   y   z
 12:00  1   9   1
  1:00  2   8   1
  2:00  3   7   7
  3:00  4   6   7
  4:00  5   5   4
  5:00  6   4   2
  6:00  7   3   2
  7:00  8   2   8
  8:00  9   1   8

I would like to see the times "12:00, 1:00, ..." on the x-axis and the x, y, z columns on the y-axis.

When I try to plot with pandas via mydf.plot(kind="scatter"), I get the error ValueError: scatter requires and x and y column. Do I have to break down my dataframe into appropriate parameters? What I would really like to do is get this scatter plotted with seaborn.

Answer 1

Answered by Carsten

Just running

mydf.plot(style=".")

works fine for me:

[Figure: example scatter plot produced by the code above]
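For readers who want to reproduce this, here is a minimal self-contained sketch of the same call. The `Agg` backend and the variable name `ax` are assumptions added so the snippet runs without a display; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed

import numpy as np
import pandas as pd

mydf = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5, 6, 7, 8, 9],
     "y": [9, 8, 7, 6, 5, 4, 3, 2, 1],
     "z": np.random.randint(0, 9, 9)},
    index=["12:00", "1:00", "2:00", "3:00", "4:00",
           "5:00", "6:00", "7:00", "8:00"],
)

# The default line plot uses the index for the x-axis and draws every column;
# style="." swaps the connecting lines for point markers.
ax = mydf.plot(style=".")
```

This works without naming x and y columns because it is still a line plot, just drawn with markers only, whereas kind="scatter" needs an explicit pair of columns.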

Answer 2

Answered by T.C. Proctor

Seaborn is actually built around pandas.DataFrames. However, your data frame needs to be "tidy":

Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.

Since you want to plot x, y, and z on the same plot, it seems like they are actually different observations. Thus, you really have three variables: time, value, and the letter used.

The "tidy" standard comes from Hadley Wickham, who implemented it in the tidyr package.

First, I convert the index to a DatetimeIndex:

mydf.index = pd.DatetimeIndex(mydf.index)

Then we do the conversion to tidy data:

pivoted = mydf.unstack().reset_index()

and rename the columns:

pivoted = pivoted.rename(columns={"level_0": "letter", "level_1": "time", 0: "value"})
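As an aside, the same long ("tidy") shape can also be reached with `melt`, which lets you name the columns in one step. This is a sketch of an alternative, not the approach used in this answer; the three-row frame and the name `tidy` are made up for illustration.

```python
import pandas as pd

# Small stand-in for the frame in the question (hypothetical values)
mydf = pd.DataFrame(
    {"x": [1, 2, 3], "y": [3, 2, 1]},
    index=["12:00", "1:00", "2:00"],
)

tidy = (
    mydf.rename_axis("time")   # give the index a name
        .reset_index()         # turn it into an ordinary "time" column
        .melt(id_vars="time", var_name="letter", value_name="value")
)
print(tidy)
```

Each row of `tidy` is now a single observation: one time, one letter, one value.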

Now, this is what our data looks like:

  letter                time  value
0      x 2019-03-13 12:00:00      1
1      x 2019-03-13 01:00:00      2
2      x 2019-03-13 02:00:00      3
3      x 2019-03-13 03:00:00      4
4      x 2019-03-13 04:00:00      5

Unfortunately, seaborn doesn't handle datetimes that well, so you can just extract the hour as an integer:

pivoted["hour"] = pivoted["time"].dt.hour
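One thing to watch with this step, shown in the small sketch below: the label "12:00" is parsed as hour 12, so on an integer hour axis it lands after 1 through 8 rather than before them. The explicit `format="%H:%M"` and the names `times` and `hours` are assumptions made for the example.

```python
import pandas as pd

# Parse the time labels with an explicit format (assumption; the answer
# above lets pandas infer it via pd.DatetimeIndex)
times = pd.Series(pd.to_datetime(["12:00", "1:00", "2:00"], format="%H:%M"))

# .dt.hour gives the hour of each timestamp as an integer
hours = times.dt.hour
print(hours.tolist())  # [12, 1, 2]
```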

With a data frame in this form, seaborn takes in the data easily:

import seaborn as sns
sns.set()

sns.scatterplot(data=pivoted, x="hour", y="value", hue="letter")
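Putting the steps together, a minimal end-to-end sketch might look like the following. It assumes seaborn 0.9 or later (for `scatterplot`), uses the `Agg` backend so it runs without a display, and parses the index with an explicit `format="%H:%M"` instead of `pd.DatetimeIndex(mydf.index)`; those choices are assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend

import numpy as np
import pandas as pd
import seaborn as sns

mydf = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5, 6, 7, 8, 9],
     "y": [9, 8, 7, 6, 5, 4, 3, 2, 1],
     "z": np.random.randint(0, 9, 9)},
    index=["12:00", "1:00", "2:00", "3:00", "4:00",
           "5:00", "6:00", "7:00", "8:00"],
)

# Parse the time labels (explicit format is an assumption)
mydf.index = pd.to_datetime(mydf.index, format="%H:%M")

# Wide -> long: one row per (letter, time) pair
pivoted = mydf.unstack().reset_index()
pivoted = pivoted.rename(columns={"level_0": "letter", "level_1": "time", 0: "value"})

# Integer hour for the x-axis
pivoted["hour"] = pivoted["time"].dt.hour

sns.set()
ax = sns.scatterplot(data=pivoted, x="hour", y="value", hue="letter")
```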

Outputs:

[Figure: scatter plot of value against hour, with one color per letter]
