Use python requests to download CSV
Disclaimer: this page is a translation of a popular StackOverflow question. It is provided under the CC BY-SA 4.0 license; if you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/35371043/
Asked by viviwill
Here is my code:
import csv
import requests
with requests.Session() as s:
    s.post(url, data=payload)
    download = s.get('url that directly download a csv report')
This gives me access to the csv file. I tried different methods to deal with the download:
This will give the csv file in one string:
print download.content
This prints the first row and returns an error: _csv.Error: new-line character seen in unquoted field
cr = csv.reader(download, dialect=csv.excel_tab)
for row in cr:
    print row
This will print a letter in each row and won't print the whole thing:
cr = csv.reader(download.content, dialect=csv.excel_tab)
for row in cr:
    print row
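A quick way to see why the last attempt printed one letter per row: csv.reader iterates whatever it is given, and iterating a plain string yields individual characters, each treated as a one-character line. A minimal demonstration (Python 3 shown here, though the behavior is the same in 2.x):

```python
import csv

# Iterating a plain string yields single characters, so csv.reader
# treats each character as its own one-field row:
print(list(csv.reader("abc")))                    # [['a'], ['b'], ['c']]

# Splitting into lines first gives csv.reader real rows to parse:
print(list(csv.reader("a,b\nc,d".splitlines())))  # [['a', 'b'], ['c', 'd']]
```

This is why the answers below split the response content into lines (or wrap it in a file-like object) before handing it to the csv module.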
My question is: what's the most efficient way to read a csv file in this situation, and how do I download it?
Thanks
Accepted answer by HEADLESS_0NE
This should help:
import csv
import requests
CSV_URL = 'http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'
with requests.Session() as s:
    download = s.get(CSV_URL)

    decoded_content = download.content.decode('utf-8')

    cr = csv.reader(decoded_content.splitlines(), delimiter=',')
    my_list = list(cr)
    for row in my_list:
        print(row)
Output sample:
['street', 'city', 'zip', 'state', 'beds', 'baths', 'sq__ft', 'type', 'sale_date', 'price', 'latitude', 'longitude']
['3526 HIGH ST', 'SACRAMENTO', '95838', 'CA', '2', '1', '836', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '59222', '38.631913', '-121.434879']
['51 OMAHA CT', 'SACRAMENTO', '95823', 'CA', '3', '1', '1167', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '68212', '38.478902', '-121.431028']
['2796 BRANCH ST', 'SACRAMENTO', '95815', 'CA', '2', '1', '796', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '68880', '38.618305', '-121.443839']
['2805 JANETTE WAY', 'SACRAMENTO', '95815', 'CA', '2', '1', '852', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '69307', '38.616835', '-121.439146']
[...]
Related question with answer: https://stackoverflow.com/a/33079644/295246
Edit: Other answers are useful if you need to download large files (i.e. stream=True).
Answered by Ares Ou
From a little searching, I understand that the file should be opened in universal newline mode, which you cannot do directly with the response content (I guess).
To finish the task, you can either save the downloaded content to a temporary file, or process it in memory.
Save as file:
import requests
import csv
import os

temp_file_name = 'temp_csv.csv'
url = 'http://url.to/file.csv'

download = requests.get(url)
with open(temp_file_name, 'w') as temp_file:
    temp_file.write(download.content)  # write(), not writelines(): content is a single string

with open(temp_file_name, 'rU') as temp_file:
    csv_reader = csv.reader(temp_file, dialect=csv.excel_tab)
    for line in csv_reader:
        print line

# delete the temp file after processing
os.remove(temp_file_name)
In memory:
(To be updated)
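The answerer never filled in the in-memory variant. A minimal sketch of what it might look like, using io.StringIO as a file-like wrapper (the sample text below stands in for the decoded response body, and the tab dialect matches the code above):

```python
import csv
import io

# stand-in for the decoded response body (download.content.decode(...))
text = 'col1\tcol2\nval1\tval2\n'

# wrap the text in a file-like object so csv.reader can consume it directly
csv_reader = csv.reader(io.StringIO(text), dialect=csv.excel_tab)
for line in csv_reader:
    print(line)  # ['col1', 'col2'] then ['val1', 'val2']
```

This avoids the temporary file entirely at the cost of holding the whole body in memory.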
Answered by aheld
You can update the accepted answer to use the iter_lines method of requests if the file is very large:
import csv
import requests

CSV_URL = 'http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'

with requests.Session() as s:
    download = s.get(CSV_URL, stream=True)  # stream=True so iter_lines() does not load the whole body first
    # decode_unicode=True makes iter_lines() yield str, so no extra .decode() is needed
    line_iterator = download.iter_lines(decode_unicode=True)
    cr = csv.reader(line_iterator, delimiter=',')
    for row in cr:
        print(row)
Answered by The Aelfinn
To simplify these answers, and increase performance when downloading a large file, the below may work a bit more efficiently.
import requests
from contextlib import closing
import csv

url = "http://download-and-process-csv-efficiently/python.csv"

with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(r.iter_lines(), delimiter=',', quotechar='"')
    for row in reader:
        print row
By setting stream=True in the GET request, when we pass r.iter_lines() to csv.reader(), we are passing a generator to csv.reader(). By doing so, we enable csv.reader() to lazily iterate over each line in the response with for row in reader.
This avoids loading the entire file into memory before we start processing it, drastically reducing memory overhead for large files.
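The key point is that csv.reader accepts any iterable of lines, including a generator, so rows are only parsed as the generator is consumed. A small illustration with an in-memory generator standing in for r.iter_lines():

```python
import csv

# a generator standing in for r.iter_lines(): lines are produced lazily
lines = (s for s in ['x,y', '1,2', '3,4'])
reader = csv.reader(lines, delimiter=',')

# nothing is parsed up front; each next() pulls one line from the generator
print(next(reader))  # ['x', 'y']
print(next(reader))  # ['1', '2']
```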
Answered by Antti Haapala
You can also use DictReader to iterate dictionaries of {'columnname': 'value', ...}:
import csv
import requests

response = requests.get('http://example.test/foo.csv')
# decode each line so csv.DictReader gets str rather than bytes on Python 3
reader = csv.DictReader(line.decode('utf-8') for line in response.iter_lines())
for record in reader:
    print(record)
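For illustration, with an in-memory list standing in for the (decoded) lines of the response: DictReader takes the first line as the header and maps each following row onto it.

```python
import csv

# an in-memory list standing in for the decoded lines of the response
lines = ['city,zip', 'SACRAMENTO,95838', 'SACRAMENTO,95823']
for record in csv.DictReader(lines):
    print(record['city'], record['zip'])  # SACRAMENTO 95838, then SACRAMENTO 95823
```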
Answered by wescpy
I like the answers from The Aelfinn and aheld. I can improve them only by shortening them a bit more, removing superfluous pieces, using a real data source, making them 2.x & 3.x-compatible, and maintaining the high level of memory-efficiency seen elsewhere:
import csv
import requests

CSV_URL = 'http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'

with requests.get(CSV_URL, stream=True) as r:
    lines = (line.decode('utf-8') for line in r.iter_lines())
    for row in csv.reader(lines):
        print(row)
Too bad 3.x is less flexible CSV-wise, because the iterator must emit Unicode strings (while requests yields bytes); the 2.x-only version, for row in csv.reader(r.iter_lines()):, is more Pythonic (shorter and easier to read). Anyhow, note the 2.x/3.x solution above won't handle the situation described by the OP where a NEWLINE is found unquoted in the data read.
For the part of the OP's question regarding downloading (vs. processing) the actual CSV file, here's another script that does that: 2.x & 3.x-compatible, minimal, readable, and memory-efficient:
import os
import requests

CSV_URL = 'http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'

with open(os.path.split(CSV_URL)[1], 'wb') as f, \
        requests.get(CSV_URL, stream=True) as r:
    for line in r.iter_lines():
        f.write(line + b'\n')  # iter_lines() strips newlines, so add them back
Answered by aamir23
The following approach worked well for me. I also did not need to use the csv.reader() or csv.writer() functions, which I feel makes the code cleaner. The code is compatible with Python 2 and Python 3.
from six.moves import urllib

DOWNLOAD_URL = "https://raw.githubusercontent.com/gjreda/gregreda.com/master/content/notebooks/data/city-of-chicago-salaries.csv"
DOWNLOAD_PATH = "datasets/city-of-chicago-salaries.csv"  # forward slash avoids backslash-escape issues

urllib.request.urlretrieve(DOWNLOAD_URL, DOWNLOAD_PATH)
Note: six is a package that helps in writing code compatible with both Python 2 and Python 3. For additional details regarding six, see: What does from six.moves import urllib do in Python?
Answered by Michal Skop
I use this code (I use Python 3):
import csv
import io
import requests

url = "http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv"
r = requests.get(url)
r.encoding = 'utf-8'  # useful if encoding is not sent (or not sent properly) by the server

csvio = io.StringIO(r.text, newline="")
data = []
for row in csv.DictReader(csvio):
    data.append(row)
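The newline="" argument matters for data like the OP's, where a field may contain an embedded newline: it tells io.StringIO not to translate newlines, leaving them for the csv module to handle. A small illustration (the sample text stands in for r.text):

```python
import csv
import io

# stand-in for r.text: a quoted field containing an embedded newline
text = 'name,notes\nalice,"line1\nline2"\n'

data = []
for row in csv.DictReader(io.StringIO(text, newline="")):
    data.append(row)

# the newline survives inside the quoted field instead of breaking the row
print(repr(data[0]['notes']))  # 'line1\nline2'
```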