Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/53158452/
Download a csv from url and make it a dataframe python pandas
Asked by cloudly lemons
I am new to Python, so I need a little help here. I have a dataframe with a url column whose links let me download a CSV for each row. My aim is to create a loop (or whatever works) so that I can run one command to download and read the CSV and create a dataframe for each of the rows. Any help would be appreciated. I have attached part of the dataframe below. If the links don't work (they probably won't), you can just replace them with a link from 'https://finance.yahoo.com/quote/GOOG/history?p=GOOG' (any other company works too): navigate to Download CSV and use that link.
Dataframe:
Symbol Link
YI https://query1.finance.yahoo.com/v7/finance/download/YI?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
PIH https://query1.finance.yahoo.com/v7/finance/download/PIH?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
TURN https://query1.finance.yahoo.com/v7/finance/download/TURN?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
FLWS https://query1.finance.yahoo.com/v7/finance/download/FLWS?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
Thanks again.
Answered by Prayson W. Daniel
You need to make a POST request and then feed the contents to pandas through io.
import pandas as pd
import requests
import io

url = 'https://query1.finance.yahoo.com/v7/finance/download/GOOG'
params = {
    'period1': 1538761929,
    'period2': 1541443929,
    'interval': '1d',
    'events': 'history',
    'crumb': 'v4z6ZpmoP98',
}

r = requests.post(url, data=params)
if r.ok:
    data = r.content.decode('utf8')        # CSV text returned by Yahoo
    df = pd.read_csv(io.StringIO(data))    # wrap in a text buffer so pandas can parse it
To get the params, I just followed the link and copied everything after the '?'. Check that they match ;)
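If you would rather not copy the parameters by hand, they can also be split off programmatically with the standard library. A minimal sketch (my addition, not part of the answer), assuming the download links look like the ones posted in the question:

from urllib.parse import urlsplit, parse_qs

link = ('https://query1.finance.yahoo.com/v7/finance/download/YI'
        '?period1=1383609600&period2=1541376000&interval=1d'
        '&events=history&crumb=PMHbxK/sU6E')

parts = urlsplit(link)                                         # scheme, netloc, path, query, fragment
base_url = f'{parts.scheme}://{parts.netloc}{parts.path}'      # URL without the query string
params = {k: v[0] for k, v in parse_qs(parts.query).items()}   # flatten single-value lists

print(base_url)   # https://query1.finance.yahoo.com/v7/finance/download/YI
print(params)     # {'period1': '1383609600', ..., 'crumb': 'PMHbxK/sU6E'}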
Update:
If you can see the raw CSV contents directly at the URL, just pass the URL to pd.read_csv.
Example with data read directly from a URL:
data_url = 'https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv'
df = pd.read_csv(data_url)
Answered by Azi_bel
I routinely use this procedure:
import io
import pandas as pd
import requests

url = "<URL TO DOWNLOAD.CSV>"
s = requests.get(url).content                       # raw bytes of the CSV
c = pd.read_csv(io.StringIO(s.decode('utf-8')))     # decode and wrap so pandas can parse it
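To apply the same routine to the question's dataframe, here is a rough sketch (mine, not from the answer), assuming that dataframe is called df and has the 'Symbol' and 'Link' columns shown above:

import io
import pandas as pd
import requests

frames = {}                                     # one DataFrame per symbol
for symbol, link in zip(df['Symbol'], df['Link']):
    content = requests.get(link).content        # raw CSV bytes for this row
    frames[symbol] = pd.read_csv(io.StringIO(content.decode('utf-8')))

# e.g. frames['YI'].head() would show the first rows of the YI download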
Answered by HUSMEN
First, break the task down into smaller parts. What you need to do is:
1. Iterate over the DataFrame with the links.

   for index, row in df.iterrows():
       url = row["Link"]

2. Download the file from Yahoo Finance using Python's requests library. This is probably the difficult part: you will need to get cookies before actually downloading the CSV file (more info here, here and here). Once you create the proper URL with the cookie, you can download it with:

   re = requests.get(URL)
   print(re.status_code)  # status code 200 for a successful download

   Optionally, you can save the response to your local disk.

3. Load it with pandas.

   df = pd.read_csv(file_name)             # in case of saving the file to disk
   df = pd.read_csv(io.StringIO(re.text))  # directly from the response (requires import io)
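Putting the three steps together, a rough sketch (my own, not from the answer) that uses a requests.Session so any cookies set by the site are carried between requests; the crumb/cookie handling the answer links to is not reproduced here:

import io
import pandas as pd
import requests

session = requests.Session()                   # keeps cookies across requests
frames = {}

for index, row in df.iterrows():               # df is the question's dataframe
    re = session.get(row["Link"])
    if re.status_code == 200:                  # 200 means the download succeeded
        frames[row["Symbol"]] = pd.read_csv(io.StringIO(re.text))
    else:
        print(f'{row["Symbol"]}: download failed with status {re.status_code}')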
Answered by Michael Mallon
If you apply the following to the dataframe, it will place each of the documents in an np.array, not in a dataframe (I'm unsure of how to get there). But this will give you access to all the files, and it's only a matter of putting them into a df.
import requests

links = test['Link'].unique()

a = []
for x in links:
    url = x
    s = requests.get(url).content
    a.append(s)
a[4] or np.array(a[4]).tolist() outputs the entire file, just in the incorrect format.
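To turn that list of raw bytes into DataFrames, one possible follow-up (my sketch, not part of the answer), assuming each download in a is UTF-8 encoded CSV text:

import io
import pandas as pd

dfs = [pd.read_csv(io.StringIO(raw.decode('utf-8'))) for raw in a]
# dfs[4] is now a proper DataFrame instead of raw bytes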
Use 'https://api.iextrading.com/1.0/stock/GOOG/chart/5y?format=csv' rather than Yahoo; it is much more accessible.
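Since that endpoint serves the CSV directly, the URL can be passed straight to pandas. A one-line sketch (note that the IEX endpoint may have been retired since this answer was written, so treat it as illustrative):

import pandas as pd

df_goog = pd.read_csv('https://api.iextrading.com/1.0/stock/GOOG/chart/5y?format=csv')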