Can you upload to S3 using a stream rather than a local file?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/31031463/
Asked by inquiring minds
I need to create a CSV and upload it to an S3 bucket. Since I'm creating the file on the fly, it would be better if I could write it directly to S3 bucket as it is being created rather than writing the whole file locally, and then uploading the file at the end.
Is there a way to do this? My project is in Python and I'm fairly new to the language. Here is what I tried so far:
import csv
import io
import boto
from boto.s3.key import Key
conn = boto.connect_s3()
bucket = conn.get_bucket('dev-vs')
k = Key(bucket)
k.key = 'foo/foobar'
fieldnames = ['first_name', 'last_name']
writer = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
k.set_contents_from_stream(writer.writeheader())
I received this error: BotoClientError: s3 does not support chunked transfer
UPDATE: I found a way to write directly to S3, but I can't find a way to clear the buffer without actually deleting the lines I already wrote. So, for example:
conn = boto.connect_s3()
bucket = conn.get_bucket('dev-vs')
k = Key(bucket)
k.key = 'foo/foobar'
testDict = [{
    "fieldA": "8",
    "fieldB": None,
    "fieldC": "888888888888"},
    {
    "fieldA": "9",
    "fieldB": None,
    "fieldC": "99999999999"}]
f = io.StringIO()
fieldnames = ['fieldA', 'fieldB', 'fieldC']
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
k.set_contents_from_string(f.getvalue())
for row in testDict:
    writer.writerow(row)
    k.set_contents_from_string(f.getvalue())
f.close()
This writes 3 lines to the file; however, I'm unable to release memory, so I can't write a big file this way. If I add:
f.seek(0)
f.truncate(0)
to the loop, then only the last line of the file is written. Is there any way to release resources without deleting lines from the file?
Answered by inquiring minds
I did find a solution to my question, which I will post here in case anyone else is interested. I decided to do this in parts as a multipart upload; you can't stream directly to S3. There is also a package available that converts your streamed writes into a multipart upload, which is what I used: Smart Open.
import smart_open
import io
import csv
testDict = [{
    "fieldA": "8",
    "fieldB": None,
    "fieldC": "888888888888"},
    {
    "fieldA": "9",
    "fieldB": None,
    "fieldC": "99999999999"}]
fieldnames = ['fieldA', 'fieldB', 'fieldC']
f = io.StringIO()
with smart_open.smart_open('s3://dev-test/bar/foo.csv', 'wb') as fout:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    fout.write(f.getvalue())      # upload the header line first
    for row in testDict:
        f.seek(0)
        f.truncate(0)             # reset the in-memory buffer so only the new row is sent
        writer.writerow(row)
        fout.write(f.getvalue())  # smart_open accumulates these writes into multipart parts
f.close()
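Note that this code targets Python 2; under Python 3, fout opened with 'wb' expects bytes, so you would likely need fout.write(f.getvalue().encode('utf-8')).
To illustrate what the multipart mechanism looks like without the smart_open wrapper, here is a minimal sketch of my own (not from the original answer), using the boto 2 multipart API with placeholder bucket/key names. Keep in mind that S3 requires every part except the last to be at least 5 MB, so real code would keep appending rows to the buffer until it reaches that size before uploading a part.
import io
import boto
conn = boto.connect_s3()
bucket = conn.get_bucket('dev-vs')                       # placeholder bucket
mp = bucket.initiate_multipart_upload('foo/foobar.csv')  # placeholder key
# Accumulate CSV rows in memory, then ship the chunk as one part.
buf = io.BytesIO()
buf.write(b'fieldA,fieldB,fieldC\n')  # in real code, keep writing rows until the buffer is >= 5 MB
buf.seek(0)
mp.upload_part_from_file(buf, part_num=1)
mp.complete_upload()  # S3 assembles the uploaded parts into the final object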
Answered by El Ruso
According to the docs, it's possible:
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'))
so we can use StringIO in the ordinary way:
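For example, here is a minimal sketch of my own (not from the answer) that builds the CSV in memory with StringIO and uploads it in a single put, assuming boto3 and a placeholder bucket name:
import csv
import io
import boto3
# Build the CSV entirely in memory...
f = io.StringIO()
writer = csv.DictWriter(f, fieldnames=['fieldA', 'fieldB', 'fieldC'])
writer.writeheader()
writer.writerow({'fieldA': '8', 'fieldB': None, 'fieldC': '888888888888'})
# ...then upload the whole buffer as the object body in one call.
s3 = boto3.resource('s3')
s3.Object('mybucket', 'hello.csv').put(Body=f.getvalue().encode('utf-8'))
The trade-off, as the question points out, is that the entire file has to fit in memory before it is uploaded.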
Update: the smart_open lib from @inquiring minds' answer is a better solution.
Answered by Sam
To write a string to an S3 object, use:
s3.Object('my_bucket', 'my_file.txt').put(Body='Hello there')
So convert the stream to string and you're there.