Read csv from Amazon s3 using python2.7

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/43345907/
Asked by lucy
I can easily get the bucket names from S3, but when I read the csv file from S3, it gives an error every time.
import boto3
import pandas as pd
s3 = boto3.client('s3',
                  aws_access_key_id='yyyyyyyy',
                  aws_secret_access_key='xxxxxxxxxxx')

# Call S3 to list the current buckets
response = s3.list_buckets()
for bucket in response['Buckets']:
    print bucket['Name']
Output:
s3-bucket-data
.
.
import pandas as pd
import StringIO
from boto.s3.connection import S3Connection
AWS_KEY = 'yyyyyyyyyy'
AWS_SECRET = 'xxxxxxxxxx'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('s3-bucket-data')
fileName = "data.csv"
content = bucket.get_key(fileName).get_contents_as_string()
reader = pd.read_csv(StringIO.StringIO(content))
I am getting this error:
boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request
How can I read the csv from S3?
Answered by muon
You can use the s3fs package. s3fs also supports AWS profiles in credential files.
Here is an example (you don't have to chunk it, but I just had this example handy):
import os
import pandas as pd
import s3fs
import gzip

chunksize = 999999
usecols = ["Col1", "Col2"]
filename = 'some_csv_file.csv.gz'
s3_bucket_name = 'some_bucket_name'
AWS_KEY = 'yyyyyyyyyy'
AWS_SECRET = 'xxxxxxxxxx'

s3f = s3fs.S3FileSystem(
    anon=False,
    key=AWS_KEY,
    secret=AWS_SECRET)

# or if you have a profile defined in credentials file:
#aws_shared_credentials_file = 'path/to/aws/credentials/file/'
#os.environ['AWS_SHARED_CREDENTIALS_FILE'] = aws_shared_credentials_file
#s3f = s3fs.S3FileSystem(
#    anon=False,
#    profile_name=s3_profile)

filepath = os.path.join(s3_bucket_name, filename)

with s3f.open(filepath, 'rb') as f:
    gz = gzip.GzipFile(fileobj=f)  # Decompress data with gzip
    chunks = pd.read_csv(gz,
                         usecols=usecols,
                         chunksize=chunksize,
                         iterator=True)
    # Stack the row chunks vertically (axis=0); axis=1 would wrongly
    # place the chunks side by side as extra columns.
    df = pd.concat([c for c in chunks], axis=0)
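If you don't need chunking, recent pandas versions can usually read an s3:// URL directly once s3fs is installed and credentials are available (for example via environment variables or the shared credentials file). A minimal sketch, reusing the placeholder bucket and file names from the snippet above:

import pandas as pd

# Sketch only: pandas delegates s3:// paths to s3fs; the bucket and
# file names are the same placeholders used in the answer above, and
# gzip compression is inferred from the .gz extension.
df = pd.read_csv('s3://some_bucket_name/some_csv_file.csv.gz',
                 usecols=["Col1", "Col2"])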
Answered by rrmerugu
boto is one thing I love when it comes to handling data on S3 with Python.

Install boto using pip install boto.
import boto
from boto.s3.key import Key

keyId = "your_aws_key_id"
sKeyId = "your_aws_secret_key_id"

srcFileName = "abc.txt"      # filename on S3
destFileName = "s3_abc.txt"  # output file name
bucketName = "mybucket001"   # S3 bucket name

conn = boto.connect_s3(keyId, sKeyId)
bucket = conn.get_bucket(bucketName)

# Get the Key object of the given key, in the bucket
k = Key(bucket, srcFileName)

# Get the contents of the key into a file
k.get_contents_to_filename(destFileName)
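Note that this downloads the object to a local file rather than reading it into pandas directly; to get a DataFrame you can then read the downloaded copy. A small follow-up sketch:

import pandas as pd

# Read the local copy written by get_contents_to_filename() above;
# destFileName is the variable defined in the answer's snippet.
df = pd.read_csv(destFileName)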
Answered by Manas Gaur
I experienced this issue with a few AWS Regions. I created a bucket in "us-east-1" and the following code worked fine:
import boto
from boto.s3.key import Key
import StringIO
import pandas as pd

keyId = "xxxxxxxxxxxxxxxxxx"
sKeyId = "yyyyyyyyyyyyyyyyyy"
srcFileName = "zzzzz.csv"
bucketName = "elasticbeanstalk-us-east-1-aaaaaaaaaaaa"

conn = boto.connect_s3(keyId, sKeyId)
bucket = conn.get_bucket(bucketName)
k = Key(bucket, srcFileName)
content = k.get_contents_as_string()
reader = pd.read_csv(StringIO.StringIO(content))
Try creating a new bucket in us-east-1 and see if it works.
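The region sensitivity described here often points to an endpoint/signature mismatch; with boto you can also connect to the bucket's own region explicitly instead of recreating the bucket. A minimal sketch, not from the original answer, using a hypothetical eu-west-1 bucket:

import boto.s3

# Sketch only: connect to the region the bucket actually lives in.
# The region name, credentials, and bucket name are placeholders.
conn = boto.s3.connect_to_region('eu-west-1',
                                 aws_access_key_id='xxxxxxxxxxxxxxxxxx',
                                 aws_secret_access_key='yyyyyyyyyyyyyyyyyy')
bucket = conn.get_bucket('my-bucket-in-eu-west-1')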
Answered by sepideh
Try the following:
import boto3
import io
import pandas as pd

# Replace the 'XXXX' placeholders with your region and signature
# version (for example, 's3v4' for regions that require Signature
# Version 4).
session = boto3.session.Session(region_name='XXXX')
s3client = session.client('s3',
                          config=boto3.session.Config(signature_version='XXXX'))

response = s3client.get_object(Bucket='myBucket', Key='myKey')
dataset = pd.read_csv(io.BytesIO(response['Body'].read()), encoding='utf8')
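As a side note (not part of the original answer), response['Body'] is a file-like streaming object with a read() method, so depending on your pandas version you may be able to pass it to read_csv directly and skip the in-memory BytesIO copy:

# May work directly, since the body exposes read(); fall back to the
# BytesIO approach above if your pandas version rejects it.
dataset = pd.read_csv(response['Body'])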