Python: Seaborn time series plot with multiple series

Question

Asked by Zhao Li

I'm trying to make a time series plot with seaborn from a dataframe that has multiple series.

From this post: seaborn time series from pandas dataframe

I gather that tsplot isn't going to work as it is meant to plot uncertainty.

So is there another Seaborn method that is meant for line charts with multiple series?

My dataframe looks like this:

print(df.info())
print(df.describe())
print(df.values)
print(df.index)

output:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 2013-01-03 to 2014-01-03
Data columns (total 5 columns):
Equity(24 [AAPL])      253 non-null float64
Equity(3766 [IBM])     253 non-null float64
Equity(5061 [MSFT])    253 non-null float64
Equity(6683 [SBUX])    253 non-null float64
Equity(8554 [SPY])     253 non-null float64
dtypes: float64(5)
memory usage: 11.9 KB
None
       Equity(24 [AAPL])  Equity(3766 [IBM])  Equity(5061 [MSFT])  \
count         253.000000          253.000000           253.000000   
mean           67.560593          194.075383            32.547436   
std             6.435356           11.175226             3.457613   
min            55.811000          172.820000            26.480000   
25%            62.538000          184.690000            28.680000   
50%            65.877000          193.880000            33.030000   
75%            72.299000          203.490000            34.990000   
max            81.463000          215.780000            38.970000   

       Equity(6683 [SBUX])  Equity(8554 [SPY])  
count           253.000000          253.000000  
mean             33.773277          164.690180  
std               4.597291           10.038221  
min              26.610000          145.540000  
25%              29.085000          156.130000  
50%              33.650000          165.310000  
75%              38.280000          170.310000  
max              40.995000          184.560000  
[[  77.484  195.24    27.28    27.685  145.77 ]
 [  75.289  193.989   26.76    27.85   146.38 ]
 [  74.854  193.2     26.71    27.875  145.965]
 ..., 
 [  80.167  187.51    37.43    39.195  184.56 ]
 [  79.034  185.52    37.145   38.595  182.95 ]
 [  77.284  186.66    36.92    38.475  182.8  ]]
DatetimeIndex(['2013-01-03', '2013-01-04', '2013-01-07', '2013-01-08',
               '2013-01-09', '2013-01-10', '2013-01-11', '2013-01-14',
               '2013-01-15', '2013-01-16', 
               ...
               '2013-12-19', '2013-12-20', '2013-12-23', '2013-12-24',
               '2013-12-26', '2013-12-27', '2013-12-30', '2013-12-31',
               '2014-01-02', '2014-01-03'],
              dtype='datetime64[ns]', length=253, freq=None, tz='UTC')

This works (but I want to get my hands dirty with Seaborn):

df.plot()

Output: [plot image not included]

Thank you for your time!

Update1:

df.to_dict() returned: https://gist.github.com/anonymous/2bdc1ce0f9d0b6ccd6675ab4f7313a5f

Update2:

Using @knagaev sample code, I've narrowed it down to this difference:

current dataframe (output of print(current_df)):

                           Equity(24 [AAPL])  Equity(3766 [IBM])  \
2013-01-03 00:00:00+00:00             77.484            195.2400   
2013-01-04 00:00:00+00:00             75.289            193.9890   
2013-01-07 00:00:00+00:00             74.854            193.2000   
2013-01-08 00:00:00+00:00             75.029            192.8200   
2013-01-09 00:00:00+00:00             73.873            192.3800

desired dataframe (output of print(desired_df)):

           Date Company       Kind            Price
0    2014-01-02     IBM       Open       187.210007
1    2014-01-02     IBM       High       187.399994
2    2014-01-02     IBM        Low       185.199997
3    2014-01-02     IBM      Close       185.529999
4    2014-01-02     IBM     Volume   4546500.000000
5    2014-01-02     IBM  Adj Close       171.971090
6    2014-01-02    MSFT       Open        37.349998
7    2014-01-02    MSFT       High        37.400002
8    2014-01-02    MSFT        Low        37.099998
9    2014-01-02    MSFT      Close        37.160000
10   2014-01-02    MSFT     Volume  30632200.000000
11   2014-01-02    MSFT  Adj Close        34.960000
12   2014-01-02    ORCL       Open        37.779999
13   2014-01-02    ORCL       High        38.029999
14   2014-01-02    ORCL        Low        37.549999
15   2014-01-02    ORCL      Close        37.840000
16   2014-01-02    ORCL     Volume  18162100.000000

What's the best way to reorganize current_df into desired_df?

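One possible way to do this kind of reshaping is pandas.melt. The following is only a minimal sketch on toy data: the three-row frame stands in for current_df, and the column names Date, Company and Price are chosen here to mirror desired_df (there is no Kind column, because current_df holds a single price per security).

```python
import pandas as pd

# Toy stand-in for current_df: tz-aware DatetimeIndex, one column per security.
idx = pd.date_range("2013-01-03", periods=3, freq="B", tz="UTC")
current_df = pd.DataFrame(
    {
        "Equity(24 [AAPL])": [77.484, 75.289, 74.854],
        "Equity(3766 [IBM])": [195.24, 193.989, 193.2],
    },
    index=idx,
)

# Move the index into a regular column, then unpivot the security columns
# so that each row holds one (date, company, price) observation.
long_df = (
    current_df.rename_axis("Date")
    .reset_index()
    .melt(id_vars="Date", var_name="Company", value_name="Price")
)
print(long_df)
```

Each of the two security columns contributes three rows, so the long frame has six rows and three columns.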

Update 3: I finally got it working from the help of @knagaev:

I had to add a dummy column as well as finesse the index:

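import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

fig, ax = plt.subplots()  # axes to draw on (not shown in the original snippet)

df['Datetime'] = df.index  # expose the DatetimeIndex as a regular column
melted_df = pd.melt(df, id_vars='Datetime', var_name='Security', value_name='Price')
melted_df['Dummy'] = 0  # one constant sampling unit, so there is nothing to average over
sns.tsplot(melted_df, time='Datetime', unit='Dummy', condition='Security', value='Price', ax=ax)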

to produce: [plot image not included]

Answer 1

Accepted answer by knagaev

You can try getting your hands dirty with tsplot.

It will draw your line charts with standard errors ("statistical additions").

I tried to simulate your dataset. Here are the results:

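# Note: pandas.io.data was later moved to the separate pandas-datareader
# package, and sns.tsplot exists only in older seaborn releases.
import pandas as pd
import pandas.io.data as web
from datetime import datetime
import seaborn as sns

stocks = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
start = datetime(2014, 1, 1)
end = datetime(2014, 3, 28)
f = web.DataReader(stocks, 'yahoo', start, end)

# Flatten to long form: one row per (date, company, price kind)
df = pd.DataFrame(f.to_frame().stack()).reset_index()
df.columns = ['Date', 'Company', 'Kind', 'Price']

# 'Kind' acts as the sampling unit, so the error band spans the price kinds
sns.tsplot(df, time='Date', unit='Kind', condition='Company', value='Price')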

By the way, this sample is rather contrived. The parameter "unit" is the "Field in the data DataFrame identifying the sampling unit (e.g. subject, neuron, etc.). The error representation will collapse over units at each time/condition observation." (from the documentation). So I used the 'Kind' field purely for illustrative purposes.

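To make the quoted description of "unit" more concrete, here is a minimal sketch, in plain pandas rather than tsplot itself, of what collapsing over units at each time/condition observation amounts to. The toy prices are invented, and the standard error is only a stand-in for the error representation the plot draws.

```python
import pandas as pd

# Invented long-form data: 2 dates x 1 company x 3 units (here, price kinds).
records = [
    ("2014-01-02", "IBM", "Open", 187.0),
    ("2014-01-02", "IBM", "High", 188.0),
    ("2014-01-02", "IBM", "Low", 186.0),
    ("2014-01-03", "IBM", "Open", 185.0),
    ("2014-01-03", "IBM", "High", 187.0),
    ("2014-01-03", "IBM", "Low", 183.0),
]
df = pd.DataFrame(records, columns=["Date", "Company", "Kind", "Price"])

# Collapse over units: one central value and one spread per (time, condition).
summary = (
    df.groupby(["Date", "Company"])["Price"]
    .agg(["mean", "sem", "count"])
    .reset_index()
)
print(summary)

# With a single unit per observation there is nothing to average over,
# so the spread is undefined (NaN) and only the central line is left.
single = df[df["Kind"] == "Open"]
single_summary = single.groupby(["Date", "Company"])["Price"].agg(["mean", "sem"])
print(single_summary)
```

This is presumably why a constant dummy unit suppresses the band: with one unit per (time, condition) pair there is no spread to draw.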

OK, I made an example for your dataframe. It has a dummy field for "noise cleaning" :)

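# Note: pandas.io.data was later moved to the separate pandas-datareader
# package, and sns.tsplot exists only in older seaborn releases.
import pandas as pd
import pandas.io.data as web
from datetime import datetime
import seaborn as sns

stocks = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
start = datetime(2010, 1, 1)
end = datetime(2015, 12, 31)
f = web.DataReader(stocks, 'yahoo', start, end)

# Flatten to long form: one row per (date, company, price kind)
df = pd.DataFrame(f.to_frame().stack()).reset_index()
df.columns = ['Date', 'Company', 'Kind', 'Price']

# Keep one price kind and add a constant unit so no error band is drawn
df_open = df[df['Kind'] == 'Open'].copy()
df_open['Dummy'] = 0

sns.tsplot(df_open, time='Date', unit='Dummy', condition='Company', value='Price')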

P.S. Thanks to @VanPeer: you can now use seaborn.lineplot for this problem.

