Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/44045158/

Time: 2020-08-19 23:40:27  Source: igfitidea

Python pandas datareader no longer works after Yahoo Finance changed its URL

Tags: python, pandas, yahoo-finance, pandas-datareader

Question by Scilear

Since Yahoo discontinued their API support, pandas-datareader now fails:

import pandas_datareader.data as web
import datetime
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2017, 5, 17)
web.DataReader('GOOGL', 'yahoo', start, end)

HTTPError: HTTP Error 401: Unauthorized

Is there any unofficial library that would let us temporarily work around the problem? Anything on Quandl, maybe?

Accepted answer by Scilear

So they've changed their URL and now use cookie protection (and possibly JavaScript), so I fixed my own problem using dryscrape, which emulates a browser. This is just an FYI, as this surely now breaks their terms and conditions... so use it at your own risk. I'm looking at Quandl as an alternative EOD price source.

I could not get anywhere browsing with cookies via a CookieJar, so I ended up using dryscrape to "fake" a user download:

import dryscrape
from bs4 import BeautifulSoup
import time
import datetime
import re

# Visit the main page first to initialise the session and cookies
session = dryscrape.Session()
session.set_attribute('auto_load_images', False)
session.set_header('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')

# Call this once, as it is slow(er); you can then do multiple downloads, though there seems to be a limit after which you have to reinitialise...
session.visit("https://finance.yahoo.com/quote/AAPL/history?p=AAPL")
response = session.body()
response = session.body()


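# Get the download link: the <a> tag that carries a 'download' attribute
soup = BeautifulSoup(response, 'lxml')
url_download = None
for taga in soup.find_all('a'):
    if taga.has_attr('download'):
        url_download = taga['href']
if url_download is None:
    raise RuntimeError('No download link found on the history page')
print(url_download)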

# Now replace the default start and end dates that Yahoo provides
s = "2017-02-18"
period1 = '%.0f' % time.mktime(datetime.datetime.strptime(s, "%Y-%m-%d").timetuple())
e = "2017-05-18"
period2 = '%.0f' % time.mktime(datetime.datetime.strptime(e, "%Y-%m-%d").timetuple())

# Swap the period1/period2 timestamps in the URL for our dates
url_download = re.sub(r'period1=\d+', 'period1=' + period1, url_download)
url_download = re.sub(r'period2=\d+', 'period2=' + period2, url_download)

# Now visit the download URL; the page body is the CSV
session.visit(url_download)
csv_data = session.body()

# And finally, if you want a DataFrame from it
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd
df = pd.read_csv(StringIO(csv_data), index_col=[0], parse_dates=True)
print(df.head())
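The regex step above can also be done by parsing the query string instead of pattern-matching it. Below is a minimal sketch using only the standard library; `to_unix_seconds` and `set_period` are hypothetical helper names, the example URL is made up, and it assumes the download link carries `period1`/`period2` as Unix-second timestamps, as in the answer. Unlike `time.mktime`, which uses local time, `calendar.timegm` reads the date as midnight UTC; which of the two matches what Yahoo expects is an assumption worth checking.

```python
import calendar
import datetime
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_unix_seconds(date_str):
    """Convert a 'YYYY-MM-DD' string to Unix seconds at midnight UTC."""
    day = datetime.datetime.strptime(date_str, "%Y-%m-%d")
    # calendar.timegm reads the time tuple as UTC (time.mktime would use local time)
    return calendar.timegm(day.timetuple())


def set_period(url, start, end):
    """Return `url` with its period1/period2 query parameters replaced (hypothetical helper)."""
    parts = urlsplit(url)
    # parse_qsl keeps parameter order; dicts preserve insertion order (Python 3.7+)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["period1"] = str(to_unix_seconds(start))
    query["period2"] = str(to_unix_seconds(end))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Made-up example of the kind of link found on the history page
example = ("https://query1.finance.yahoo.com/v7/finance/download/AAPL"
           "?period1=0&period2=0&interval=1d&events=history&crumb=abc")
print(set_period(example, "2017-02-18", "2017-05-18"))
```

Note that `urlencode` re-encodes every parameter, so a crumb containing special characters may come back with different (but equivalent) percent-encoding.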

Answer by artDeco

I found the workaround "fix-yahoo-finance" (https://pypi.python.org/pypi/fix-yahoo-finance) useful, for example:

from pandas_datareader import data as pdr
import fix_yahoo_finance as yf

yf.pdr_override()  # later fix_yahoo_finance releases need this to patch get_data_yahoo
data = pdr.get_data_yahoo('AAPL', start='2017-04-23', end='2017-05-24')

Note that the last two data columns are now 'Adj Close' and 'Volume', in that order, i.e. not the previous format. To re-index:

# 'Date' is the index, not a column, so it is left out; reindex returns a new frame
cols = ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
data = data.reindex(columns=cols)
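To see why 'Date' has to be dropped from the column list, here is a small offline sketch on a made-up frame shaped like the output described above (the numbers are invented; only the column layout matters). Reindexing reorders the columns and returns a new DataFrame, while a label such as 'Date' that is not a column comes back as an all-NaN column.

```python
import pandas as pd

# Made-up stand-in for the downloaded frame: 'Date' is the index and the
# last two columns arrive as 'Adj Close', then 'Volume'
data = pd.DataFrame(
    {
        "Open": [1.0, 2.0],
        "High": [1.5, 2.5],
        "Low": [0.5, 1.5],
        "Close": [1.2, 2.2],
        "Adj Close": [1.1, 2.1],
        "Volume": [100, 200],
    },
    index=pd.DatetimeIndex(["2017-04-24", "2017-04-25"], name="Date"),
)

cols = ["Open", "High", "Low", "Close", "Volume", "Adj Close"]
reordered = data.reindex(columns=cols)  # new frame, columns in the old order

# Asking for 'Date' as a column does not move the index; it adds an empty column
with_date = data.reindex(columns=["Date"] + cols)
print(reordered.columns.tolist())
```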

Answer by Alex L.

I switched from Yahoo to Google Finance and it works for me. I changed

data.DataReader(ticker, 'yahoo', start_date, end_date)

to

data.DataReader(ticker, 'google', start_date, end_date)

and adapted my "old" Yahoo! symbols from:

tickers = ['AAPL','MSFT','GE','IBM','AA','DAL','UAL', 'PEP', 'KO']

to

tickers = ['NASDAQ:AAPL','NASDAQ:MSFT','NYSE:GE','NYSE:IBM','NYSE:AA','NYSE:DAL','NYSE:UAL', 'NYSE:PEP', 'NYSE:KO']
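If you have many symbols, the prefixing can be automated. A minimal sketch, assuming a hand-maintained lookup table: `EXCHANGE` and `to_google_symbol` are hypothetical names, and the exchange assignments simply copy the list above (listings can move between exchanges, so check them before relying on the table).

```python
# Exchange per symbol, copied from the ticker list above (hypothetical table)
EXCHANGE = {
    "AAPL": "NASDAQ", "MSFT": "NASDAQ",
    "GE": "NYSE", "IBM": "NYSE", "AA": "NYSE", "DAL": "NYSE",
    "UAL": "NYSE", "PEP": "NYSE", "KO": "NYSE",
}


def to_google_symbol(ticker, exchange=EXCHANGE):
    """Turn a plain Yahoo! ticker into the 'EXCHANGE:TICKER' form (hypothetical helper)."""
    if ticker not in exchange:
        raise KeyError(f"no exchange known for {ticker!r}")
    return f"{exchange[ticker]}:{ticker}"


tickers = ['AAPL', 'MSFT', 'GE', 'IBM', 'AA', 'DAL', 'UAL', 'PEP', 'KO']
google_tickers = [to_google_symbol(t) for t in tickers]
print(google_tickers)
```

Raising on unknown symbols, rather than guessing an exchange, keeps a missing table entry from silently producing a wrong ticker.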

Answer by vibhu_singh

Try this out:

import fix_yahoo_finance as yf
data = yf.download('SPY', start='2012-01-01', end='2017-01-01')

Answer by Bora Savkar

Yahoo Finance works well with pandas. Use it like this:

import pandas as pd
from pandas_datareader import data as wb

ticker = 'GOOGL'
start_date = '2019-1-1'
data_source = 'yahoo'

ticker_data = wb.DataReader(ticker, data_source=data_source, start=start_date)
df = pd.DataFrame(ticker_data)

Answer by Kamaldeep Singh

The fix_yahoo_finance package has been renamed to yfinance, so you can try this code:

import yfinance as yf
data = yf.download('MSFT', start='2012-01-01', end='2017-01-01')

Answer by Dipen Lama

Make the thread sleep between reads, after fetching each symbol. This may work most of the time, so try 5-6 times and save the data to a CSV file so that next time you can read it from the file.

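### code is here ###
import pandas_datareader as web
import time
import datetime as dt
import pandas as pd

# Example date range (taken from the question); set it to the period you need
startDate = dt.datetime(2016, 1, 1)
endDate = dt.datetime(2017, 5, 17)

symbols = ['AAPL', 'MSFT', 'AABA', 'DB', 'GLD']
webData = pd.DataFrame()
for stockSymbol in symbols:
    webData[stockSymbol] = web.DataReader(stockSymbol, data_source='yahoo',
                                          start=startDate, end=endDate,
                                          retry_count=10)['Adj Close']
    time.sleep(22)  # thread sleep for 22 seconds

# Save to CSV so that next time you can read from the file instead
webData.to_csv('stock_data.csv')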
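The sleep, retry and cache idea above can be packaged so the download only happens when no cached CSV exists yet. A minimal sketch under stated assumptions: `fetch_with_retry` and `load_or_fetch` are hypothetical helpers, `fetch` is any function returning a pandas Series for one symbol (for example a wrapper around the `web.DataReader(...)['Adj Close']` call above), and here the pause is applied between failed attempts rather than after every symbol.

```python
import os
import time

import pandas as pd


def fetch_with_retry(fetch, symbol, attempts=5, pause=22.0):
    """Call fetch(symbol), retrying up to `attempts` times with `pause` seconds between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(symbol)
        except Exception:  # the concrete error type depends on the data source
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(pause)


def load_or_fetch(path, fetch, symbols, attempts=5, pause=22.0):
    """Read cached prices from the CSV at `path`, or download them and write that CSV."""
    if os.path.exists(path):
        return pd.read_csv(path, index_col=0, parse_dates=True)
    frame = pd.DataFrame(
        {symbol: fetch_with_retry(fetch, symbol, attempts, pause) for symbol in symbols}
    )
    frame.to_csv(path)
    return frame
```

Injecting `fetch` keeps the retry and caching logic independent of any one data source, so the same helpers work whether the series comes from pandas-datareader, yfinance or a local file.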