pandas: scatter plot of a large amount of data

Question

Asked by dodo4545

Let's say I have a large dataset (8500000 x 50), and I would like to make a scatter plot of X (the date) against Y (the measurement taken on that day).

I could get only this:

data_X = data['date_local']
data_Y = data['arithmetic_mean']
data_Y = data_Y.round(1)
data_Y = data_Y.astype(int)
data_X = data_X.astype(int)
sns.regplot(data_X, data_Y, data=data)
plt.show()

According to similar questions I found on Stack Overflow, I can shuffle my data or take, for example, 1000 random values and plot them. But how do I implement this so that every X (the date a measurement was taken) still corresponds to its actual Y (the measurement)?

Answer 1

Answered by Vinícius Aguiar

First, answering your question:

You should use pandas.DataFrame.sample to get a sample from your dataframe, and then use regplot. Below is a small example using random data:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
import numpy as np
import pandas as pd
import seaborn as sns

dates = pd.date_range('20080101', periods=10000, freq="D")
df = pd.DataFrame({"dates": dates, "data": np.random.randn(10000)})

dfSample = df.sample(1000)  # This is the important line: it samples whole rows
xdataSample, ydataSample = dfSample["dates"], dfSample["data"]

sns.regplot(x=mdates.date2num(xdataSample.astype(datetime)), y=ydataSample) 
plt.show()

In the regplot call I convert my X data because of its datetime type. Note that this may not be necessary, depending on your data.
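To make the pairing explicit, here is a minimal sketch of the sampling step on its own. It assumes a stand-in dataframe with the column names from the question (date_local, arithmetic_mean) and made-up random values; the fixed random_state is only there to make the sample reproducible.

```python
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; column names follow the question.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "date_local": pd.date_range("2008-01-01", periods=10_000, freq="D"),
    "arithmetic_mean": rng.normal(size=10_000),
})

# sample() draws whole rows, so each date stays with its own measurement.
sample = data.sample(n=1000, random_state=42)

# The sampled rows keep their original index labels, which lets us check
# that every (date, measurement) pair is unchanged.
assert sample.equals(data.loc[sample.index])

# Convert the dates to matplotlib's float day numbers for plotting.
x_num = mdates.date2num(sample["date_local"].to_numpy())
y = sample["arithmetic_mean"].to_numpy()
```

The arrays x_num and y can then be passed to sns.regplot(x=x_num, y=y). Sampling the two columns separately would break the pairing, which is why the sample is taken from the dataframe and the columns are extracted afterwards.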

So, instead of something like this:

You'll get something like this:

Now, a suggestion:

Use sns.jointplot, which has a kind parameter. From the docs:

kind : { “scatter” | “reg” | “resid” | “kde” | “hex” }, optional
Kind of plot to draw.

What we create here is similar to what matplotlib's hist2d does: it builds something like a heatmap from your entire dataset. An example using random data:

dates = pd.date_range('20080101', periods=10000, freq="D")
df = pd.DataFrame({"dates": dates, "data": np.random.randn(10000)})

xdata, ydata = df["dates"], df["data"]
sns.jointplot(x=mdates.date2num(xdata.astype(datetime)), y=ydata, kind="kde")

plt.show()

This produces a density plot that is also good for seeing the distributions along each axis.
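Since hist2d is mentioned as the comparable matplotlib tool, here is a minimal sketch of that route. It bins every point instead of sampling. The data is random, and the column names, bin counts, and colormap are illustrative assumptions rather than anything from the question.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: many measurements spread over roughly ten years.
rng = np.random.default_rng(0)
n = 100_000
day_offsets = pd.to_timedelta(rng.integers(0, 3650, size=n), unit="D")
df = pd.DataFrame({
    "dates": pd.Timestamp("2008-01-01") + day_offsets,
    "data": rng.normal(size=n),
})

# hist2d needs numeric input, so convert dates to matplotlib day numbers.
x_num = mdates.date2num(df["dates"].to_numpy())

fig, ax = plt.subplots()
# Count the points falling in each (date, value) bin; every row contributes.
counts, xedges, yedges, image = ax.hist2d(
    x_num, df["data"].to_numpy(), bins=(120, 60), cmap="viridis"
)
fig.colorbar(image, ax=ax, label="points per bin")

# Show the x axis as dates again instead of raw day numbers.
ax.xaxis_date()
fig.autofmt_xdate()
ax.set_xlabel("date")
ax.set_ylabel("measurement")
```

Call plt.show() afterwards to display the figure. Because the points are reduced to bin counts before drawing, the cost of rendering depends on the number of bins, not the number of rows.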
