Python: Pandas read_csv from a URL

Question

Asked by venom

I am using Python 3.4 with IPython and have the following code. I'm unable to read a csv-file from the given URL:

import pandas as pd
import requests

url="https://github.com/cs109/2014_data/blob/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(s)

I get the following error:

"Expected file path name or file-like object, got type"

How can I fix this?

Answer 1

Accepted answer by Anand S Kumar

Update

From pandas 0.19.2 you can now just pass the URL directly.

Just as the error suggests, pandas.read_csv needs a file path or a file-like object as its first argument.

If you want to read the CSV from a string, you can use io.StringIO (Python 3.x) or StringIO.StringIO (Python 2.x).

Also, the URL https://github.com/cs109/2014_data/blob/master/countries.csv returns an HTML response, not raw CSV. To get the raw CSV, use the URL behind the Raw link on the GitHub page, which is https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv
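The mapping between the two URLs above can be sketched with the standard library alone. This is a minimal sketch, not an official GitHub API: the helper name github_blob_to_raw is hypothetical, and it assumes the page URL has the shape github.com/<user>/<repo>/blob/<ref>/<path>, as in the question.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Turn a github.com '/<user>/<repo>/blob/<ref>/<path>' URL into its raw counterpart.

    Hypothetical helper for illustration; URLs of any other shape are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expect: user, repo, "blob", ref, then at least one path segment.
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    user, repo, _, *rest = segments
    raw_path = "/" + "/".join([user, repo, *rest])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, parts.query, ""))


page_url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
raw_url = github_blob_to_raw(page_url)
print(raw_url)
```

For the URL in the question this prints the raw.githubusercontent.com address quoted above, which can then be fetched with requests or handed to pd.read_csv.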

Example -

import pandas as pd
import io
import requests
url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))
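To try the decoding step without a network call, the sketch below uses a short bytes literal as a stand-in for requests.get(url).content (the sample rows are an assumption for illustration). It also shows io.BytesIO as an alternative wrapper that leaves the decoding to pandas.

```python
import io

import pandas as pd

# Stand-in for requests.get(url).content: a bytes object holding CSV data.
content = b"Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\n"

# Option 1: decode to str, then wrap the text in a text buffer.
from_text = pd.read_csv(io.StringIO(content.decode("utf-8")))

# Option 2: wrap the bytes in a binary buffer and let pandas decode them.
from_bytes = pd.read_csv(io.BytesIO(content), encoding="utf-8")

# Both routes should yield the same DataFrame.
print(from_text.equals(from_bytes))
```

Either wrapper gives read_csv the file-like object it expects; which one to use mostly depends on whether the data is already a str or still bytes.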

Answer 2

Answered by Padraic Cunningham

As I commented, you need to use a StringIO object and decode, i.e. c = pd.read_csv(io.StringIO(s.decode("utf-8"))). With requests you need to decode because .content returns bytes. If you use .text instead, you can pass s as is: s = requests.get(url).text followed by c = pd.read_csv(StringIO(s)).

A simpler approach is to pass the correct URL of the raw data directly to read_csv. You don't have to pass a file-like object; you can pass a URL, so you don't need requests at all:

import pandas as pd

c = pd.read_csv("https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv")
print(c)

Output:

                              Country         Region
0                             Algeria         AFRICA
1                              Angola         AFRICA
2                               Benin         AFRICA
3                            Botswana         AFRICA
4                             Burkina         AFRICA
5                             Burundi         AFRICA
6                            Cameroon         AFRICA
..................................

From the docs:

filepath_or_buffer:

string or file handle / StringIO. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
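The file scheme mentioned in the docs can be tried locally. The sketch below writes a small CSV to a temporary directory and reads it back through a file URL built with pathlib's as_uri(). The sample data and file name are assumptions for illustration, and as_uri() produces a URL with an empty host (file:///...) rather than the file://localhost/... form quoted above.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "table.csv"
    csv_path.write_text("Country,Region\nAlgeria,AFRICA\n", encoding="utf-8")

    # Build a file URL such as file:///tmp/.../table.csv from the absolute path.
    file_url = csv_path.resolve().as_uri()

    # read_csv loads the whole file here, so the DataFrame outlives the temp directory.
    table = pd.read_csv(file_url)

print(table)
```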

Answer 3

Answered by PabTorre

The first problem is that the content you get in the variable 's' is not a CSV but an HTML page. To get the raw CSV, you have to change the URL to:

'https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'

Your second problem is that read_csv expects a file name or a file-like object, not a string of CSV data; we can solve this by wrapping the text in StringIO from the io module. The third problem is that requests.get(url).content returns bytes; we can avoid the decoding step by using requests.get(url).text instead.

End result is this code:

from io import StringIO

import pandas as pd
import requests
url='https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
s=requests.get(url).text

c=pd.read_csv(StringIO(s))

output:

>>> c.head()
    Country  Region
0   Algeria  AFRICA
1    Angola  AFRICA
2     Benin  AFRICA
3  Botswana  AFRICA
4   Burkina  AFRICA
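Since an HTML page is silently accepted as text, it can help to check the response before parsing. The following is only a heuristic sketch: the helper name read_csv_text and the two markers it looks for are assumptions, not part of pandas or requests.

```python
import io

import pandas as pd


def read_csv_text(text):
    """Parse CSV text, refusing input that looks like an HTML page (heuristic only)."""
    head = text.lstrip()[:200].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        raise ValueError("response looks like HTML, not CSV; check the URL")
    return pd.read_csv(io.StringIO(text))


# Stand-in for requests.get(url).text from the raw URL.
frame = read_csv_text("Country,Region\nAlgeria,AFRICA\n")
print(frame.shape)
```

Passing the text of the original github.com/.../blob/... page to this helper would raise ValueError instead of producing a meaningless DataFrame.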

Answer 4

Answered by inodb

In pandas 0.19.2 (the latest version at the time of writing) you can pass the URL directly:

import pandas as pd

url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c=pd.read_csv(url)

Answer 5

Answered by jain

To import data from a URL in pandas, pass the URL straight to the reader function. Note that read_table splits on tabs by default, so comma-separated data needs sep="," (or read_csv).

import pandas as pd
# read_table defaults to tab-separated input; sep="," is needed for a CSV file
train = pd.read_table("https://urlandfile.com/dataset.csv", sep=",")
train.head()

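If the URL or path contains backslashes that Python would treat as escape sequences, put 'r' before the string to make it a raw string literal. For a URL without backslashes the prefix changes nothing.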

import pandas as pd
# r"..." is a raw string literal; sep="," because read_table defaults to tabs
train = pd.read_table(r"https://urlandfile.com/dataset.csv", sep=",")
train.head()
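The two points in this answer can be checked offline. The sketch below uses an in-memory string as a stand-in for the downloaded file (the sample data is an assumption). It compares read_table with and without sep=",", then compares a raw and a normal string literal for the same URL.

```python
import io

import pandas as pd

data = "Country,Region\nAlgeria,AFRICA\n"

# read_table splits on tabs by default, so each CSV line lands in a single column.
tab_split = pd.read_table(io.StringIO(data))

# With sep="," it behaves like read_csv on this input.
comma_split = pd.read_table(io.StringIO(data), sep=",")

print(tab_split.shape)
print(comma_split.shape)

# The r prefix only matters when the literal contains backslashes.
same = r"https://urlandfile.com/dataset.csv" == "https://urlandfile.com/dataset.csv"
print(same)
```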

Answer 6

Answered by Gursimran Singh

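import pandas as pd
# Use the raw file URL (the /blob/ page returns HTML); the file is comma-separated
url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c = pd.read_csv(url, sep=",")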
