Disclaimer: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/23541497/

Time: 2020-09-13 22:01:15  Source: igfitidea

Is there a way to plot a pandas series in ggplot?

Tags: python, pandas, python-ggplot

Asked by zerovector

I'm experimenting with pandas and non-matplotlib plotting. Good suggestions are here. This question concerns yhat's ggplot, and I am running into two issues. First, plotting a series in pandas is easy:

frequ.plot()

I don't see how to do this in the ggplot docs. Instead I end up creating a dataframe:

cheese = DataFrame({'time': frequ.index, 'count' : frequ.values})
ggplot(cheese, aes(x='time', y='count')) + geom_line()

I would expect ggplot -- a project that has "tight integration with pandas" -- to have a way to plot a simple series.

The second issue is that I can't get stat_smooth() to display when the x axis is time of day. It seems like it could be related to this post, but I don't have enough reputation to post there. My code is:

frequ = values.sampler.resample("1Min", how="count")
cheese = DataFrame({'time': frequ.index, 'count' : frequ.values})
ggplot(cheese, aes(x='time', y='count')) + geom_line() + stat_smooth()

Any help regarding non-matplotlib plotting would be appreciated. Thanks! (I'm using ggplot 0.5.8)

Answered by BCR

I run into this problem frequently in Python's ggplot when working with multiple stock prices and economic time series. The key thing to remember with ggplot is that data is best organized in long format. I use a quick two-step process as a workaround. First, let's grab some stock data:

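import time

import pandas as pd
import pandas.io.data as web  # removed from later pandas releases; the pandas-datareader package replaces it
from ggplot import *

stocks = ['GOOG', 'MSFT', 'LNKD', 'YHOO', 'FB', 'GOOGL', 'HPQ', 'AMZN']  # stock list

# date_today is assumed to be today's date (its definition is not shown in the answer)
date_today = time.strftime('%m/%d/%Y')

# Return the adjusted closing prices for one ticker as a Series
def get_px(stock, start, end):
    return web.get_data_yahoo(stock, start, end)['Adj Close']

# DataFrame of equity prices, one column per ticker
px = pd.DataFrame({n: get_px(n, '1/1/2014', date_today) for n in stocks})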

px.head()
              AMZN     FB  GOOG   GOOGL    HPQ    LNKD   MSFT   YHOO
Date                                                                
2014-01-02  397.97  54.71   NaN  557.12  27.40  207.64  36.63  39.59
2014-01-03  396.44  54.56   NaN  553.05  28.07  207.42  36.38  40.12
2014-01-06  393.63  57.20   NaN  559.22  28.02  203.92  35.61  39.93
2014-01-07  398.03  57.92   NaN  570.00  27.91  209.64  35.89  40.92
2014-01-08  401.92  58.23   NaN  571.19  27.19  209.06  35.25  41.02

First, understand that ggplot needs the datetime index to be a column in the pandas DataFrame in order to plot correctly when switching from wide to long format. I wrote a function to address this particular point. It simply creates a 'Date' column of datetime type from the DataFrame's index.

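def dateConvert(df):
  # Copy the DatetimeIndex into an ordinary 'Date' column (modifies df in place).
  df['Date'] = df.index
  # The DatetimeIndex itself is left in place; pd.melt does not use it.
  return df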
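
A non-mutating variant of the same idea, as a minimal sketch: DataFrame.assign returns a new frame, so the caller's data is left untouched. The function name with_date_column, the ticker AAA, and the toy values are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

def with_date_column(df):
    """Return a new frame with the index copied into a 'Date' column."""
    # assign() builds a new DataFrame, so the caller's frame is not modified.
    return df.assign(Date=df.index)

# Made-up cumulative returns for a hypothetical ticker.
wide = pd.DataFrame({"AAA": [0.10, 0.21]},
                    index=pd.to_datetime(["2014-01-03", "2014-01-06"]))
converted = with_date_column(wide)

print(list(wide.columns))       # the input keeps only its original column
print(list(converted.columns))  # the result gains the extra 'Date' column
```

Whether to mutate in place or return a copy is a matter of taste; the copy is safer when the wide frame is reused later.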

From there, run the function on the DataFrame and pass the result to pd.melt, using 'Date' as the id_vars. The returned DataFrame is in long format and ready to be plotted with the standard ggplot() call.

px_returns = px.pct_change()  # daily returns, a common stock transformation
cumRet = (1 + px_returns).cumprod() - 1  # transform daily returns to cumulative returns
cumRet_dateConverted = dateConvert(cumRet)  # add the 'Date' column; see the result below

cumRet_dateConverted.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 118 entries, 2014-01-02 00:00:00 to 2014-06-20 00:00:00
Data columns (total 9 columns):
AMZN     117 non-null float64
FB       117 non-null float64
GOOG     59 non-null float64
GOOGL    117 non-null float64
HPQ      117 non-null float64
LNKD     117 non-null float64
MSFT     117 non-null float64
YHOO     117 non-null float64
Date     118 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(8)


# Reshape the data to long format. Note the use of 'Date' as the id_vars.
data = pd.melt(cumRet_dateConverted, id_vars='Date').dropna()

# It is common to rename the default melt columns
data = data.rename(columns={'variable': 'Stocks', 'value': 'Returns'})
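
The wide-to-long steps above can be tried without downloading any data. The following is a minimal sketch on made-up prices; the tickers AAA and BBB, the numbers, and the variable names are illustrative assumptions. It uses reset_index() and melt's var_name and value_name arguments in place of dateConvert and rename.

```python
import pandas as pd

# Toy closing prices (made-up numbers) indexed by date, one column per ticker.
dates = pd.date_range("2014-01-02", periods=3, freq="D", name="Date")
prices = pd.DataFrame({"AAA": [100.0, 110.0, 121.0],
                       "BBB": [50.0, 55.0, 44.0]}, index=dates)

# Daily returns, then cumulative returns; the first row is NaN by construction.
cum_ret = (1 + prices.pct_change()).cumprod() - 1

# Step 1: move the DatetimeIndex into an ordinary 'Date' column.
wide = cum_ret.reset_index()

# Step 2: melt to long format, one row per (date, ticker) pair,
# and drop the rows that have no return value.
long_df = pd.melt(wide, id_vars="Date",
                  var_name="Stocks", value_name="Returns").dropna()

print(long_df)
```

The resulting frame has the three columns Date, Stocks, and Returns, which is the shape the ggplot call in this answer expects.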

From here you can now plot your data however you want. A common plot I use is the following:

# key_Stock is not defined in this snippet; it is assumed to hold the ticker shown in the title.
retPlot_YTD = ggplot(data, aes('Date', 'Returns', color='Stocks')) \
    + geom_line(size=2.) \
    + geom_hline(yintercept=0, color='black', size=1.7, linetype='-.') \
    + scale_y_continuous(labels='percent') \
    + scale_x_date(labels='%b %d %y', breaks=date_breaks('week')) \
    + theme_seaborn(style='whitegrid') \
    + ggtitle('%s Cumulative Daily Return vs Peers_YTD' % key_Stock)

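# Render the plot to a matplotlib figure
fig = retPlot_YTD.draw()
ax = fig.axes[0]
# Take the first artist on the axes (the legend box here) and anchor it at the right edge, vertically centered
offbox = ax.artists[0]
offbox.set_bbox_to_anchor((1, 0.5), ax.transAxes)
fig.show()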

FB cumRet plot using ggplot

Answered by Greg

This is more of a workaround, but you can use qplot for quick, shorthand plots of a series.

from ggplot import *
qplot(meat.beef)
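
When a full ggplot() call is wanted rather than qplot, the Series-to-DataFrame step from the question can be written with pandas alone. This is a minimal sketch under assumptions: the toy counts and timestamps are made up, and the names frequ, cheese, time, and count simply mirror the question.

```python
import pandas as pd

# Toy per-minute counts standing in for the question's `frequ` Series.
frequ = pd.Series([3, 5, 2],
                  index=pd.date_range("2014-05-08 09:00", periods=3, freq="1min"))

# Name the index 'time' and the values 'count', then turn the Series
# into a two-column DataFrame.
cheese = frequ.rename_axis("time").reset_index(name="count")

print(cheese)
```

The resulting frame can then be passed to ggplot(cheese, aes(x='time', y='count')) + geom_line() as in the question.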