postgresql - How to pipe data from AWS Postgres RDS to S3 (then Redshift)?

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/26781758/

Date: 2020-10-21 01:40:40 · Source: igfitidea

How to pipe data from AWS Postgres RDS to S3 (then Redshift)?

postgresql, amazon-web-services, amazon-redshift, amazon-data-pipeline

Asked by jenswirf

I'm using the AWS Data Pipeline service to pipe data from an RDS MySql database to S3 and then on to Redshift, which works nicely.

However, I also have data living in an RDS Postgres instance which I would like to pipe the same way, but I'm having a hard time setting up the JDBC connection. If this is unsupported, is there a work-around?

"connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB”

Accepted answer by xgess

This doesn't work yet. AWS hasn't built/released the functionality to connect nicely to Postgres. You can do it in a ShellCommandActivity, though. You can write a little Ruby or Python script to do it and drop it on S3, referenced via scriptUri. You could also just write a psql command to dump the table to a CSV and then pipe that to ${OUTPUT1_STAGING_DIR} with "stage": "true" in that activity node.

Something like this:

{
  "id": "DumpCommand",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "MyEC2Resource" },
  "stage": "true",
  "output": { "ref": "S3ForRedshiftDataNode" },
  "command": "PGPASSWORD=password psql -h HOST -U USER -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
}

I didn't run this to verify because it's a pain to spin up a pipeline :( so double-check the escaping in the command.

  • Pros: super straightforward, and it requires no additional script files to upload to S3.
  • Cons: not exactly secure. Your DB password will be transmitted over the wire without encryption.

Look into the new feature AWS just launched for parameterized templating of data pipelines: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-templates.html. It looks like it will allow encryption of arbitrary parameters.
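For illustration only, here is a rough sketch of how the DumpCommand activity above might be parameterized so the password is no longer hard-coded in the definition. The parameter names are made up, and the exact syntax for declaring and masking parameters is in the linked docs; as far as I know, parameter ids start with "my" and values you want hidden use a "*my" prefix.

{
  "parameters": [
    { "id": "myRDSUsername", "type": "String", "description": "Postgres user" },
    { "id": "*myRDSPassword", "type": "String", "description": "Postgres password, kept out of the plain-text definition" }
  ],
  "objects": [
    {
      "id": "DumpCommand",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "MyEC2Resource" },
      "stage": "true",
      "output": { "ref": "S3ForRedshiftDataNode" },
      "command": "PGPASSWORD=#{*myRDSPassword} psql -h HOST -U #{myRDSUsername} -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
    }
  ]
}

The actual values for myRDSUsername and *myRDSPassword are then supplied separately (for example when the pipeline is activated) instead of living in the definition itself.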

Answer by PeterssonJesper

Nowadays you can define a copy activity to extract data from a Postgres RDS instance into S3. In the Data Pipeline interface (a sketch of the resulting definition follows the list below):

  1. Create a data node of the type SqlDataNode. Specify the table name and a select query.
  2. Set up the database connection by specifying the RDS instance ID (the instance ID is in your URL, e.g. your-instance-id.xxxxx.eu-west-1.rds.amazonaws.com) along with the username, password and database name.
  3. Create a data node of the type S3DataNode.
  4. Create a CopyActivity and set the SqlDataNode as input and the S3DataNode as output.
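A minimal sketch of what those four objects could look like in the exported pipeline definition. All ids, the table name, the select query, the instance ID, the credentials and the bucket path are placeholders, MyEC2Resource is assumed to be defined elsewhere (as in the first answer), and the console normally adds further fields such as a schedule and a data format:

{
  "objects": [
    {
      "id": "MyRdsDatabase",
      "type": "RdsDatabase",
      "rdsInstanceId": "your-instance-id",
      "databaseName": "THE_DB",
      "username": "USER",
      "*password": "PASSWORD"
    },
    {
      "id": "SourceTable",
      "type": "SqlDataNode",
      "database": { "ref": "MyRdsDatabase" },
      "table": "blahs",
      "selectQuery": "select blah_id from blahs"
    },
    {
      "id": "S3Output",
      "type": "S3DataNode",
      "directoryPath": "s3://my-bucket/rds-export/"
    },
    {
      "id": "RdsToS3Copy",
      "type": "CopyActivity",
      "runsOn": { "ref": "MyEC2Resource" },
      "input": { "ref": "SourceTable" },
      "output": { "ref": "S3Output" }
    }
  ]
}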

Answer by Manuel G

AWS now allows partners to do near-real-time RDS -> Redshift inserts.

https://aws.amazon.com/blogs/aws/fast-easy-free-sync-rds-to-redshift/
