pandas: reading a csv from Amazon S3 with Python 2.7

Question

Asked by lucy

I can easily list the bucket names in S3, but every time I try to read the csv file from S3 I get an error.

import boto3
import pandas as pd

s3 = boto3.client('s3',
         aws_access_key_id='yyyyyyyy',
         aws_secret_access_key='xxxxxxxxxxx')
# Call S3 to list current buckets
response = s3.list_buckets()
for bucket in response['Buckets']:
    print bucket['Name']

Output:
s3-bucket-data

Then I try to read the csv file:

import pandas as pd
import StringIO
from boto.s3.connection import S3Connection

AWS_KEY = 'yyyyyyyyyy'
AWS_SECRET = 'xxxxxxxxxx'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('s3-bucket-data')

fileName = "data.csv"

content = bucket.get_key(fileName).get_contents_as_string()
reader = pd.read_csv(StringIO.StringIO(content))

This raises the following error:

boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request

How can I read the csv from S3?

Answer 1

Answered by muon

You can use the s3fs package.

s3fs also supports AWS profiles in credential files.

Here is an example (you don't have to read in chunks, I just had this example handy):

import os
import pandas as pd
import s3fs
import gzip

chunksize = 999999
usecols = ["Col1", "Col2"]

filename = 'some_csv_file.csv.gz'
s3_bucket_name = 'some_bucket_name'

AWS_KEY = 'yyyyyyyyyy'
AWS_SECRET = 'xxxxxxxxxx'
s3f = s3fs.S3FileSystem(
    anon=False,
    key=AWS_KEY,
    secret=AWS_SECRET)

# or if you have a profile defined in credentials file:
#aws_shared_credentials_file = 'path/to/aws/credentials/file/'
#os.environ['AWS_SHARED_CREDENTIALS_FILE'] = aws_shared_credentials_file
#s3f = s3fs.S3FileSystem(
#    anon=False,
#    profile_name=s3_profile)

# S3 paths always use '/', so join explicitly rather than with os.path.join
filepath = s3_bucket_name + '/' + filename
with s3f.open(filepath, 'rb') as f:
    gz = gzip.GzipFile(fileobj=f)  # Decompress data with gzip

    chunks = pd.read_csv(gz,
                            usecols=usecols,
                            chunksize=chunksize,
                            iterator=True,
                            )

    # Each chunk is a block of rows, so stack the chunks vertically (axis=0)
    df = pd.concat([c for c in chunks], axis=0)
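
The chunked read above can be tried without S3 access. The following is a minimal sketch, assuming an in-memory gzip buffer stands in for the file object returned by s3f.open; the sample data and the chunk size of 2 are made up for illustration. It shows that each chunk is a slice of rows, which is why the chunks are combined along axis 0.

```python
import gzip
import io

import pandas as pd

# Build a small gzip-compressed CSV in memory (stand-in for the S3 object).
raw_csv = b"Col1,Col2,Col3\n1,a,x\n2,b,y\n3,c,z\n4,d,w\n5,e,v\n"
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz_out:
    gz_out.write(raw_csv)
buf.seek(0)

# Read it back two rows at a time, keeping only two of the three columns.
with gzip.GzipFile(fileobj=buf, mode="rb") as gz_in:
    chunks = pd.read_csv(gz_in, usecols=["Col1", "Col2"], chunksize=2)
    # Every chunk is a DataFrame holding a slice of rows, so stack them
    # vertically and rebuild a continuous index.
    df = pd.concat(list(chunks), axis=0, ignore_index=True)
```

The chunks must be consumed inside the with block, because the reader pulls data lazily from the open file object.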

Answer 2

Answered by rrmerugu

boto is one thing I love when it comes to handling data on S3 with Python.

Install boto using pip install boto.

import boto
from boto.s3.key import Key

keyId ="your_aws_key_id"
sKeyId="your_aws_secret_key_id"
srcFileName="abc.txt" # filename on S3
destFileName="s3_abc.txt" # output file name
bucketName="mybucket001" # S3 bucket name 

conn = boto.connect_s3(keyId,sKeyId)
bucket = conn.get_bucket(bucketName)

#Get the Key object of the given key, in the bucket
k = Key(bucket,srcFileName)

#Get the contents of the key into a file 
k.get_contents_to_filename(destFileName)

Answer 3

Answered by Manas Gaur

I ran into this issue with a few AWS regions. I created a bucket in "us-east-1" and the following code worked fine:

import boto
from boto.s3.key import Key
import StringIO
import pandas as pd
keyId ="xxxxxxxxxxxxxxxxxx"
sKeyId="yyyyyyyyyyyyyyyyyy"
srcFileName="zzzzz.csv"
bucketName="elasticbeanstalk-us-east-1-aaaaaaaaaaaa"

conn = boto.connect_s3(keyId,sKeyId)
bucket = conn.get_bucket(bucketName)
k = Key(bucket,srcFileName)
content = k.get_contents_as_string()
reader = pd.read_csv(StringIO.StringIO(content))

Try creating a new bucket in us-east-1 and see whether that works.

Answer 4

Answered by sepideh

Try the following:

import io

import boto3
import pandas as pd
from botocore.client import Config

# 'XXXX' values are placeholders: use the bucket's region and the signature version it requires
session = boto3.session.Session(region_name='XXXX')
s3client = session.client('s3', config=Config(signature_version='XXXX'))
response = s3client.get_object(Bucket='myBucket', Key='myKey')

dataset = pd.read_csv(io.BytesIO(response['Body'].read()), encoding='utf8')
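
The last step, turning the response body into a DataFrame, can be checked without AWS credentials. This is a minimal sketch, assuming only that the 'Body' entry has a read() method returning bytes, as in the snippet above. The helper name csv_body_to_dataframe and the fake_response dict are hypothetical and exist only for illustration.

```python
import io

import pandas as pd


def csv_body_to_dataframe(response, encoding="utf8"):
    """Parse the 'Body' of a get_object-style response into a DataFrame."""
    # Read the whole object into memory, then let pandas parse the bytes.
    raw_bytes = response["Body"].read()
    return pd.read_csv(io.BytesIO(raw_bytes), encoding=encoding)


# Stand-in for s3client.get_object(Bucket=..., Key=...); no network access.
fake_response = {"Body": io.BytesIO(b"id,name\n1,alice\n2,bob\n")}
dataset = csv_body_to_dataframe(fake_response)
```

With a real client, the same helper would be called on the result of s3client.get_object. Because the whole object is read into memory first, this suits files that fit comfortably in RAM.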
