Scatter plot on a large amount of data (pandas)
Original question: http://stackoverflow.com/questions/45092124/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by dodo4545
Let's say I've got a large dataset (8500000 x 50), and I would like to make a scatter plot of X (the date) against Y (the measurement that was taken on that day).
data_X = data['date_local']
data_Y = data['arithmetic_mean']
data_Y = data_Y.round(1)
data_Y = data_Y.astype(int)
data_X = data_X.astype(int)
sns.regplot(data_X, data_Y, data=data)
plt.show()
According to similar questions I've found on Stack Overflow, I can shuffle my data or take, for example, 1000 random values and plot them. But how do I implement this so that every X (the date when a certain measurement was taken) still corresponds to its actual Y (the measurement)?
Answered by Vinícius Aguiar
First, answering your question:
You should use pandas.DataFrame.sample to get a sample from your dataframe, and then use regplot. Below is a small example using random data:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
import numpy as np
import pandas as pd
import seaborn as sns
dates = pd.date_range('20080101', periods=10000, freq="D")
df = pd.DataFrame({"dates": dates, "data": np.random.randn(10000)})
dfSample = df.sample(1000) # This is the important line
xdataSample, ydataSample = dfSample["dates"], dfSample["data"]
sns.regplot(x=mdates.date2num(xdataSample.astype(datetime)), y=ydataSample)
plt.show()
In the regplot call I convert my X data because of its datetime type. Note that this conversion may not be necessary, depending on your data.
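To make the conversion step concrete, here is a minimal sketch of what matplotlib.dates.date2num does to a datetime column. The five-day series is a made-up stand-in for a real date column, not data from the question.

```python
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Hypothetical stand-in for a real date column.
dates = pd.Series(pd.date_range("2008-01-01", periods=5, freq="D"))

# date2num maps each timestamp to a float: days since matplotlib's epoch.
x_num = mdates.date2num(dates.to_numpy())

# Consecutive calendar days end up one unit apart on the numeric axis.
day_steps = np.diff(x_num)

# num2date reverses the mapping, e.g. to label ticks with real dates again.
first_date_back = mdates.num2date(x_num[0]).date()

print(x_num)
print(day_steps)
print(first_date_back)
```

Because the result is plain floats, a regression fit can treat the dates as an ordinary numeric variable measured in days.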
So, instead of something like this:
[Figure: plot of all the data points]
You'll get something like this:
[Figure: plot of the sampled points]
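The question asked how to keep every X matched with its own Y. A small sketch, using random data and a seed chosen only for illustration, suggests why sampling the dataframe itself takes care of that: sample draws whole rows, so both columns of a row travel together.

```python
import numpy as np
import pandas as pd

# Random stand-in data; the seed and column names are illustrative only.
rng = np.random.default_rng(0)
dates = pd.date_range("2008-01-01", periods=10_000, freq="D")
df = pd.DataFrame({"dates": dates, "data": rng.standard_normal(10_000)})

# sample() draws whole rows (without replacement by default), so each
# date stays attached to the measurement from the same row.
df_sample = df.sample(n=1000, random_state=42)

# The sampled rows keep their original index labels, so they can be
# compared against the same rows looked up in the full dataframe.
pairs_intact = df_sample.equals(df.loc[df_sample.index])

print(len(df_sample), pairs_intact)
```

With the question's column names, the same idea would presumably read data[['date_local', 'arithmetic_mean']].sample(1000), after which both columns can be taken from the sampled frame. Passing random_state is optional and only makes the sample reproducible.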
Now, a suggestion:
Use sns.jointplot, which has a kind parameter. From the docs:
kind : { “scatter” | “reg” | “resid” | “kde” | “hex” }, optional
Kind of plot to draw.
What we create here is similar to what matplotlib's hist2d does: it creates something like a heatmap, using your entire dataset. An example using random data:
dates = pd.date_range('20080101', periods=10000, freq="D")
df = pd.DataFrame({"dates": dates, "data": np.random.randn(10000)})
xdata, ydata = df["dates"], df["data"]
sns.jointplot(x=mdates.date2num(xdata.astype(datetime)), y=ydata, kind="kde")
plt.show()
This produces the image below, which is also good for seeing the distributions along each axis:
[Figure: KDE joint plot with the marginal distributions on each axis]
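Since the answer compares this approach to matplotlib's hist2d, here is a rough sketch of that alternative on the same kind of random data. It is not part of the original answer; the bin count of 50 and the Agg backend are arbitrary choices made for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data, as in the examples above.
rng = np.random.default_rng(0)
dates = pd.date_range("2008-01-01", periods=10_000, freq="D")
df = pd.DataFrame({"dates": dates, "data": rng.standard_normal(10_000)})

# hist2d needs numeric x values, so convert the dates to day numbers.
x_num = mdates.date2num(df["dates"].to_numpy())

fig, ax = plt.subplots()
# Every point in the dataset is counted; bins=50 is an arbitrary choice.
counts, xedges, yedges, image = ax.hist2d(x_num, df["data"].to_numpy(), bins=50)
fig.colorbar(image, ax=ax, label="points per bin")

# Show the x axis as dates again instead of raw day numbers.
ax.xaxis_date()
fig.autofmt_xdate()
# Remove the Agg line above and call plt.show() to view the plot interactively.
```

Binning keeps the drawing cost tied to the number of bins rather than the number of rows, which is the usual reason to prefer it over a raw scatter plot for millions of points.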