MySQL: How to copy a csv data file to Amazon Redshift?

Disclaimer: This page is a Chinese-to-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me) on StackOverflow. Original question: http://stackoverflow.com/questions/15261743/


How to copy csv data file to Amazon RedShift?

Tags: mysql, csv, amazon-redshift

Asked by ciphor

I'm trying to migrate some MySQL tables to Amazon Redshift, but ran into some problems.


The steps are simple:

1. Dump the MySQL table to a csv file
2. Upload the csv file to S3
3. Copy the data file to Redshift


The error occurs in step 3:


The SQL command is:


copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' delimiter ',' csv;


The error info:


An error occurred when executing the SQL command: copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx ERROR: COPY CSV is not supported [SQL State=0A000] Execution time: 0.53s 1 statement(s) failed.


I don't know if there are any limitations on the format of the csv file, such as the delimiters and quotes; I cannot find this in the documentation.


Can anyone help?


Accepted answer by ciphor

The problem was finally resolved by using:


copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' delimiter ',' removequotes;


More information can be found here http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html


Answer by Masashi Miyazaki

Amazon Redshift now supports the CSV option for the COPY command. It's better to use this option to import CSV-formatted data correctly. The format is shown below.


COPY [table-name] FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' CSV;

The default delimiter is ( , ) and the default quote character is ( " ). You can also import TSV-formatted data with the CSV and DELIMITER options, like this.


COPY [table-name] FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' CSV DELIMITER '\t';

There are some disadvantages to the old way (DELIMITER and REMOVEQUOTES): REMOVEQUOTES does not support having a newline or a delimiter character within an enclosed field. If the data can include this kind of characters, you should use the CSV option.

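As an illustration, here is a minimal sketch (using Python's standard csv module; the file name and values are made up) of the kind of row that needs the CSV option: the second field of the second row contains both a comma and a newline, so it must be written inside quotes, which REMOVEQUOTES would not parse correctly.

    import csv

    # Hypothetical rows: the second field of the second row contains a comma and a
    # newline, so csv.writer will enclose it in double quotes.
    rows = [
        (1, 'plain value'),
        (2, 'value with, a comma\nand a newline'),
    ]

    with open('sample.csv', 'w', newline='') as f:
        writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)  # quote only where needed
        writer.writerows(rows)

    # Such a file loads correctly with COPY ... CSV, but DELIMITER ',' REMOVEQUOTES
    # would split the quoted field on the embedded comma/newline.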

See the following link for the details.


http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html


Answer by Dipesh Palod

You can try this:


copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' csv;


CSV itself means comma-separated values, so there is no need to provide a delimiter with it. Please refer to this link:


http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-format


Answer by asafm

If you want to save yourself some code and you have a very basic use case, you can use Amazon Data Pipeline. It starts a spot instance, performs the transformation within the Amazon network, and it's a really intuitive tool (but very simple, so you can't do complex things with it).


Answer by olekb

It looks like you are trying to load a local file into a Redshift table. The CSV file has to be on S3 for the COPY command to work.


If you can extract data from the table to a CSV file, you have one more scripting option: you can use a Python/boto/psycopg2 combo to script your CSV load to Amazon Redshift.


In my MySQL_To_Redshift_Loader I do the following:


  1. Extract data from MySQL into temp file.

    from subprocess import Popen, PIPE  # needed for the excerpt below

    loadConf=[ db_client_dbshell ,'-u', opt.mysql_user,'-p%s' % opt.mysql_pwd,'-D',opt.mysql_db_name, '-h', opt.mysql_db_server]
    ...
    q="""
    %s %s
    INTO OUTFILE '%s'
    FIELDS TERMINATED BY '%s'
    ENCLOSED BY '%s'
    LINES TERMINATED BY '\r\n';
    """ % (in_qry, limit, out_file, opt.mysql_col_delim,opt.mysql_quote)
    p1 = Popen(['echo', q], stdout=PIPE,stderr=PIPE,env=env)
    p2 = Popen(loadConf, stdin=p1.stdout, stdout=PIPE,stderr=PIPE)
    ...
    
  2. Compress and load data to S3 using boto Python module and multipart upload.

    import boto
    from boto.s3.key import Key  # Key is used below

    conn = boto.connect_s3(AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY)
    bucket = conn.get_bucket(bucket_name)
    k = Key(bucket)
    k.key = s3_key_name
    k.set_contents_from_file(file_handle, cb=progress, num_cb=20, 
    reduced_redundancy=use_rr )
    
  3. Use psycopg2 COPY command to append data to Redshift table.

    sql="""
    copy %s from '%s' 
    CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s' 
    DELIMITER '%s' 
    FORMAT CSV %s 
    %s 
    %s 
    %s;""" % (opt.to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,opt.delim,quote,gzip, timeformat, ignoreheader)
    

Answer by Yusuf Hassan

Since the resolution has already been provided, I'll not repeat the obvious.


However, in case you receive some more errors that you're not able to figure out, simply execute the following on your workbench while connected to any of the Redshift accounts:


select * from stl_load_errors [where ...];

stl_load_errors contains all the Amazon Redshift load errors in historical fashion; a normal user can view details corresponding to his/her own account, but a superuser has access to all of them.
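
For example, a short sketch (reusing a psycopg2 cursor like the one in the loader answer above; the columns referenced are standard stl_load_errors columns) that pulls the most recent failures with the reason and the offending raw line:

    # Assumes cur is an open psycopg2 cursor on the Redshift cluster.
    cur.execute("""
        select starttime, filename, line_number, colname, err_reason, raw_line
        from stl_load_errors
        order by starttime desc
        limit 10;
    """)
    for row in cur.fetchall():
        print(row)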


The details are covered in depth at: Amazon STL Load Errors Documentation


Answer by Devavrata

A little late to comment, but it can be useful:


You can use an open-source project, sqlshift, to copy tables directly from MySQL to Redshift.


It only requires Spark, and if you have YARN then it can also be used.


Benefits: it will automatically decide the distkey and interleaved sortkey using the primary key.
