Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/53158452/
Download a csv from url and make it a dataframe python pandas
Asked by cloudly lemons
I am new to Python, so I need a little help here. I have a dataframe with a url column whose links let me download a CSV for each row. My aim is to create a loop (or whatever works) so that I can run one command to download and read the CSV and create a dataframe for each of the rows. Any help would be appreciated. I have attached part of the dataframe below. If the links don't work (they probably won't), you can just replace them with a link from 'https://finance.yahoo.com/quote/GOOG/history?p=GOOG' (any other company works too): navigate to Download CSV and use that link.
Dataframe:
Symbol Link
YI https://query1.finance.yahoo.com/v7/finance/download/YI?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
PIH https://query1.finance.yahoo.com/v7/finance/download/PIH?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
TURN https://query1.finance.yahoo.com/v7/finance/download/TURN?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
FLWS https://query1.finance.yahoo.com/v7/finance/download/FLWS?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
Thanks again.
Answered by Prayson W. Daniel
You need to make a POST request and then feed the contents to pandas through io.
import pandas as pd
import requests
import io

url = 'https://query1.finance.yahoo.com/v7/finance/download/GOOG'
params = {
    'period1': 1538761929,
    'period2': 1541443929,
    'interval': '1d',
    'events': 'history',
    'crumb': 'v4z6ZpmoP98',
}

r = requests.post(url, data=params)
if r.ok:
    data = r.content.decode('utf8')        # CSV text returned by Yahoo
    df = pd.read_csv(io.StringIO(data))    # wrap in a text buffer so pandas can parse it
To get the params, I just followed the link and copied everything after the '?'. Check that they match ;)
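If you would rather not copy the parameters by hand, they can also be split off programmatically with the standard library. A minimal sketch (my addition, not part of the answer), assuming the download links look like the ones posted in the question:

from urllib.parse import urlsplit, parse_qs

link = ('https://query1.finance.yahoo.com/v7/finance/download/YI'
        '?period1=1383609600&period2=1541376000&interval=1d'
        '&events=history&crumb=PMHbxK/sU6E')

parts = urlsplit(link)                                         # scheme, netloc, path, query, fragment
base_url = f'{parts.scheme}://{parts.netloc}{parts.path}'      # URL without the query string
params = {k: v[0] for k, v in parse_qs(parts.query).items()}   # flatten single-value lists

print(base_url)   # https://query1.finance.yahoo.com/v7/finance/download/YI
print(params)     # {'period1': '1383609600', ..., 'crumb': 'PMHbxK/sU6E'}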
Update:
If you can see the raw CSV contents directly at the URL, just pass the URL to pd.read_csv.
Example with data read directly from a URL:
data_url = 'https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv'
df = pd.read_csv(data_url)
Answered by Azi_bel
I routinely use this procedure:
import io
import pandas as pd
import requests

url = "<URL TO DOWNLOAD.CSV>"
s = requests.get(url).content                       # raw bytes of the CSV
c = pd.read_csv(io.StringIO(s.decode('utf-8')))     # decode and wrap so pandas can parse it
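To apply the same routine to the question's dataframe, here is a rough sketch (mine, not from the answer), assuming that dataframe is called df and has the 'Symbol' and 'Link' columns shown above:

import io
import pandas as pd
import requests

frames = {}                                     # one DataFrame per symbol
for symbol, link in zip(df['Symbol'], df['Link']):
    content = requests.get(link).content        # raw CSV bytes for this row
    frames[symbol] = pd.read_csv(io.StringIO(content.decode('utf-8')))

# e.g. frames['YI'].head() would show the first rows of the YI download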
Answered by HUSMEN
First, break the task down into smaller parts. What you need to do is:
1. Iterate over the DataFrame with the links.

   for index, row in df.iterrows():
       url = row["Link"]

2. Download the file from Yahoo Finance using Python's requests library. This is probably the difficult part: you will need to get cookies before actually downloading the CSV file (more info here, here and here). Once you create the proper URL with the cookie, you can download it with:

   re = requests.get(URL)
   print(re.status_code)  # status code 200 for a successful download

   Optionally, you can save the response to your local disk.

3. Load it with pandas.

   df = pd.read_csv(file_name)             # in case of saving the file to disk
   df = pd.read_csv(io.StringIO(re.text))  # directly from the response (requires import io)
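Putting the three steps together, a rough sketch (my own, not from the answer) that uses a requests.Session so any cookies set by the site are carried between requests; the crumb/cookie handling the answer links to is not reproduced here:

import io
import pandas as pd
import requests

session = requests.Session()                   # keeps cookies across requests
frames = {}

for index, row in df.iterrows():               # df is the question's dataframe
    re = session.get(row["Link"])
    if re.status_code == 200:                  # 200 means the download succeeded
        frames[row["Symbol"]] = pd.read_csv(io.StringIO(re.text))
    else:
        print(f'{row["Symbol"]}: download failed with status {re.status_code}')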
Answered by Michael Mallon
If you apply the following to the dataframe, it will place each of the documents in an np.array, not in a dataframe (I'm unsure of how to get there). But this will give you access to all the files, and it's only a matter of putting them into a df.
import requests

links = test['Link'].unique()

a = []
for x in links:
    url = x
    s = requests.get(url).content
    a.append(s)
a[4] or np.array(a[4]).tolist() outputs the entire file, just in the incorrect format.
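To turn that list of raw bytes into DataFrames, one possible follow-up (my sketch, not part of the answer), assuming each download in a is UTF-8 encoded CSV text:

import io
import pandas as pd

dfs = [pd.read_csv(io.StringIO(raw.decode('utf-8'))) for raw in a]
# dfs[4] is now a proper DataFrame instead of raw bytes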
Use 'https://api.iextrading.com/1.0/stock/GOOG/chart/5y?format=csv' rather than Yahoo; it is much more accessible.
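Since that endpoint serves the CSV directly, the URL can be passed straight to pandas. A one-line sketch (note that the IEX endpoint may have been retired since this answer was written, so treat it as illustrative):

import pandas as pd

df_goog = pd.read_csv('https://api.iextrading.com/1.0/stock/GOOG/chart/5y?format=csv')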