Python: Pandas read_csv from url

Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/32400867/

Time: 2020-08-19 11:29:44  Source: igfitidea

Pandas read_csv from url

Tags: python, csv, pandas, request

Asked by venom

I am using Python 3.4 with IPython and have the following code. I'm unable to read a csv-file from the given URL:

import pandas as pd
import requests

url="https://github.com/cs109/2014_data/blob/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(s)

I have the following error

"Expected file path name or file-like object, got <class 'bytes'> type"

How can I fix this?

Accepted answer by Anand S Kumar

Update

From pandas 0.19.2 you can now pass the URL directly to read_csv.



Just as the error suggests, pandas.read_csv needs a file path or a file-like object as its first argument, not the downloaded bytes themselves.

If you want to read the CSV from a string, you can use io.StringIO (Python 3.x) or StringIO.StringIO (Python 2.x).

Also, the URL https://github.com/cs109/2014_data/blob/master/countries.csv returns an HTML response, not raw CSV. Use the URL given by the Raw link on the GitHub page to get the raw CSV response, which is https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv
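The blob-to-raw URL change described above can be sketched as a small helper. This is only an illustration: the function name github_blob_to_raw is hypothetical (it is not part of pandas or requests), and it assumes the URL has the shape github.com/owner/repo/blob/branch/path with a branch name that contains no slashes.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Hypothetical helper: map a github.com '/blob/' page URL to its raw form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner/repo/blob/branch/path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url  # leave any other URL unchanged
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, "", ""))


page_url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
raw_url = github_blob_to_raw(page_url)
print(raw_url)
```

The resulting string can then be used wherever the answers use the raw URL.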

Example:

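import io

import pandas as pd
import requests

# Raw CSV URL (not the GitHub HTML page)
url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s = requests.get(url).content  # .content returns bytes
# Decode the bytes to str and wrap them in a file-like object for read_csv
c = pd.read_csv(io.StringIO(s.decode('utf-8')))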

Answer by Padraic Cunningham

As I commented, you need to use a StringIO object and decode, i.e. c = pd.read_csv(io.StringIO(s.decode("utf-8"))). If you use requests, you need to decode because .content returns bytes. If you used .text instead, you could pass s as is:

s = requests.get(url).text
c = pd.read_csv(StringIO(s))
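To illustrate the bytes-versus-text distinction without a network call, here is a minimal sketch. The raw_bytes value is a made-up stand-in for what requests.get(url).content would return, and the io.BytesIO variant is an alternative not mentioned in the answers; it relies on read_csv accepting a binary file-like object.

```python
import io

import pandas as pd

# Hypothetical stand-in for the bytes returned by requests.get(url).content
raw_bytes = b"Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\n"

# Option 1: decode the bytes to str, then wrap the text in StringIO
from_text = pd.read_csv(io.StringIO(raw_bytes.decode("utf-8")))

# Option 2 (assumption, not from the answers): wrap the bytes in BytesIO
from_bytes = pd.read_csv(io.BytesIO(raw_bytes))

# Both routes should parse to the same table
print(from_text.equals(from_bytes))
```

Passing raw_bytes itself to read_csv, without either wrapper, is what produces the error shown in the question.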

A simpler approach is to pass the correct URL of the raw data directly to read_csv. You don't have to pass a file-like object; you can pass a URL, so you don't need requests at all:

c = pd.read_csv("https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv")

print(c)

Output:

                              Country         Region
0                             Algeria         AFRICA
1                              Angola         AFRICA
2                               Benin         AFRICA
3                            Botswana         AFRICA
4                             Burkina         AFRICA
5                             Burundi         AFRICA
6                            Cameroon         AFRICA
..................................

From the docs:

filepath_or_buffer:

string or file handle / StringIO. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
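As a rough sketch of the file scheme mentioned in the docs excerpt, the following writes a small made-up CSV to a temporary directory and reads it back through a file URL. The file name and contents are assumptions for illustration. Note that Path.as_uri() produces a URL with an empty host (file:///...) rather than the file://localhost/... form quoted above; whether an explicit host is needed may depend on the platform and pandas version.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical local file used only for this illustration
    csv_path = Path(tmp) / "table.csv"
    csv_path.write_text("Country,Region\nAlgeria,AFRICA\n", encoding="utf-8")

    # Build a file:// URL from the absolute path
    file_url = csv_path.as_uri()

    # read_csv treats the string as a URL because of its scheme
    frame = pd.read_csv(file_url)

print(frame)
```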

Answer by PabTorre

The problem you're having is that the output you get in the variable 's' is not a CSV but an HTML file. To get the raw CSV, you have to change the URL to:

'https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'

Your second problem is that read_csv expects a file name or file-like object rather than the CSV contents; we can solve this by using StringIO from the io module. The third problem is that requests.get(url).content delivers a byte stream; we can solve this by using requests.get(url).text instead.

End result is this code:

from io import StringIO

import pandas as pd
import requests
url='https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
s=requests.get(url).text

c=pd.read_csv(StringIO(s))

output:

>>> c.head()
    Country  Region
0   Algeria  AFRICA
1    Angola  AFRICA
2     Benin  AFRICA
3  Botswana  AFRICA
4   Burkina  AFRICA

Answer by inodb

As of pandas 0.19.2 (the latest version at the time of writing), you can pass the URL directly:

import pandas as pd

url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c=pd.read_csv(url)

Answer by jain

To import data through a URL in pandas, just apply the simple code below; in my experience it works better.

import pandas as pd
# read_table defaults to a tab delimiter, so pass sep="," for a comma-separated file
train = pd.read_table("https://urlandfile.com/dataset.csv", sep=",")
train.head()

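If the URL string contains backslashes that Python would treat as escape sequences, put 'r' before the URL to make it a raw string literal:

import pandas as pd
# The r prefix only changes how backslashes in the literal are interpreted
train = pd.read_table(r"https://urlandfile.com/dataset.csv", sep=",")
train.head()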
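One detail worth checking when using read_table is its delimiter: it defaults to a tab, whereas read_csv defaults to a comma. The sketch below uses a made-up in-memory CSV (not the placeholder URL above) to show the difference.

```python
import io

import pandas as pd

# Hypothetical comma-separated data, used in place of a real download
csv_text = "Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\n"

# Default read_table: the tab delimiter finds no tabs, so each line is one field
tab_default = pd.read_table(io.StringIO(csv_text))

# Explicit comma delimiter: the two columns are split as intended
comma_sep = pd.read_table(io.StringIO(csv_text), sep=",")

print(tab_default.shape, comma_sep.shape)
```

For comma-separated files, calling read_csv directly avoids the need for the sep argument.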

Answer by Gursimran Singh

import pandas as pd

# Note: this is the GitHub page URL from the question, not the raw CSV URL
url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
c = pd.read_csv(url, sep = "\t")