Can you upload to S3 using a stream rather than a local file?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/31031463/
Asked by inquiring minds
I need to create a CSV and upload it to an S3 bucket. Since I'm creating the file on the fly, it would be better if I could write it directly to S3 bucket as it is being created rather than writing the whole file locally, and then uploading the file at the end.
Is there a way to do this? My project is in Python and I'm fairly new to the language. Here is what I tried so far:
import csv
import io
import boto
from boto.s3.key import Key
conn = boto.connect_s3()
bucket = conn.get_bucket('dev-vs')
k = Key(bucket)
k.key = 'foo/foobar'
fieldnames = ['first_name', 'last_name']
writer = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
k.set_contents_from_stream(writer.writeheader())
I received this error: BotoClientError: s3 does not support chunked transfer
UPDATE: I found a way to write directly to S3, but I can't find a way to clear the buffer without actually deleting the lines I already wrote. So, for example:
conn = boto.connect_s3()
bucket = conn.get_bucket('dev-vs')
k = Key(bucket)
k.key = 'foo/foobar'
testDict = [{
    "fieldA": "8",
    "fieldB": None,
    "fieldC": "888888888888"},
    {
    "fieldA": "9",
    "fieldB": None,
    "fieldC": "99999999999"}]
f = io.StringIO()
fieldnames = ['fieldA', 'fieldB', 'fieldC']
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
k.set_contents_from_string(f.getvalue())
for row in testDict:
    writer.writerow(row)
    k.set_contents_from_string(f.getvalue())
f.close()
This writes 3 lines to the file; however, I'm unable to release memory, so I can't write a big file this way. If I add:
f.seek(0)
f.truncate(0)
to the loop, then only the last line of the file is written. Is there any way to release resources without deleting lines from the file?
Answered by inquiring minds
I did find a solution to my question, which I will post here in case anyone else is interested. I decided to do this in parts as a multipart upload; you can't stream directly to S3. There is also a package available that converts your streamed writes into a multipart upload, which is what I used: Smart Open.
import smart_open
import io
import csv
testDict = [{
    "fieldA": "8",
    "fieldB": None,
    "fieldC": "888888888888"},
    {
    "fieldA": "9",
    "fieldB": None,
    "fieldC": "99999999999"}]
fieldnames = ['fieldA', 'fieldB', 'fieldC']
f = io.StringIO()
with smart_open.smart_open('s3://dev-test/bar/foo.csv', 'wb') as fout:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    fout.write(f.getvalue())      # upload the header line first
    for row in testDict:
        f.seek(0)
        f.truncate(0)             # reset the in-memory buffer so only the new row is sent
        writer.writerow(row)
        fout.write(f.getvalue())  # smart_open accumulates these writes into multipart parts
f.close()
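Note that this code targets Python 2; under Python 3, fout opened with 'wb' expects bytes, so you would likely need fout.write(f.getvalue().encode('utf-8')).
To illustrate what the multipart mechanism looks like without the smart_open wrapper, here is a minimal sketch of my own (not from the original answer), using the boto 2 multipart API with placeholder bucket/key names. Keep in mind that S3 requires every part except the last to be at least 5 MB, so real code would keep appending rows to the buffer until it reaches that size before uploading a part.
import io
import boto
conn = boto.connect_s3()
bucket = conn.get_bucket('dev-vs')                       # placeholder bucket
mp = bucket.initiate_multipart_upload('foo/foobar.csv')  # placeholder key
# Accumulate CSV rows in memory, then ship the chunk as one part.
buf = io.BytesIO()
buf.write(b'fieldA,fieldB,fieldC\n')  # in real code, keep writing rows until the buffer is >= 5 MB
buf.seek(0)
mp.upload_part_from_file(buf, part_num=1)
mp.complete_upload()  # S3 assembles the uploaded parts into the final object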
Answered by El Ruso
According to the docs, it's possible:
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'))
so we can use StringIO in the ordinary way:
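For example, here is a minimal sketch of my own (not from the answer) that builds the CSV in memory with StringIO and uploads it in a single put, assuming boto3 and a placeholder bucket name:
import csv
import io
import boto3
# Build the CSV entirely in memory...
f = io.StringIO()
writer = csv.DictWriter(f, fieldnames=['fieldA', 'fieldB', 'fieldC'])
writer.writeheader()
writer.writerow({'fieldA': '8', 'fieldB': None, 'fieldC': '888888888888'})
# ...then upload the whole buffer as the object body in one call.
s3 = boto3.resource('s3')
s3.Object('mybucket', 'hello.csv').put(Body=f.getvalue().encode('utf-8'))
The trade-off, as the question points out, is that the entire file has to fit in memory before it is uploaded.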
Update: the smart_open lib from @inquiring minds' answer is a better solution.
Answered by Sam
To write a string to an S3 object, use:
s3.Object('my_bucket', 'my_file.txt').put(Body='Hello there')
So convert the stream to string and you're there.