postgresql - How to pipe data from AWS Postgres RDS to S3 (then Redshift)?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/26781758/
How to pipe data from AWS Postgres RDS to S3 (then Redshift)?
Asked by jenswirf
I'm using the AWS Data Pipeline service to pipe data from an RDS MySQL database to S3 and then on to Redshift, which works nicely.
However, I also have data living in an RDS Postgres instance which I would like to pipe the same way, but I'm having a hard time setting up the JDBC connection. If this is unsupported, is there a work-around?
"connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB”
Accepted answer by xgess
this doesn't work yet. aws hasn't built / released the functionality to connect nicely to postgres. you can do it in a ShellCommandActivity though. you can write a little ruby or python code to do it and drop that in a script on s3 using scriptUri. you could also just write a psql command to dump the table to a csv and then pipe that to OUTPUT1_STAGING_DIR with "staging: true" in that activity node.
something like this:
{
"id": "DumpCommand",
"type": "ShellCommandActivity",
"runsOn": { "ref": "MyEC2Resource" },
"stage": "true",
"output": { "ref": "S3ForRedshiftDataNode" },
"command": "PGPASSWORD=password psql -h HOST -U USER -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
}
i didn't run this to verify because it's a pain to spin up a pipeline :( so double check the escaping in the command.
- pros: super straightforward and requires no additional script files to upload to s3
- cons: not exactly secure. your db password will be transmitted over the wire without encryption.
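if you'd rather go the script-on-s3 route mentioned above, the same activity can point at the script with scriptUri instead of an inline command. a rough sketch only (the bucket path and script name are placeholders, and the script itself would hold the psql or python dump logic):

{
"id": "DumpCommand",
"type": "ShellCommandActivity",
"runsOn": { "ref": "MyEC2Resource" },
"stage": "true",
"output": { "ref": "S3ForRedshiftDataNode" },
"scriptUri": "s3://your-bucket/scripts/dump_blahs.sh",
"scriptArgument": "${OUTPUT1_STAGING_DIR}"
}

with staging still enabled, the evaluated ${OUTPUT1_STAGING_DIR} is handed to the script as its first argument, so the script can write its csv straight into that directory.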
look into the new stuff aws just launched on parameterized templating data pipelines: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-templates.html. it looks like it will allow encryption of arbitrary parameters.
Answered by PeterssonJesper
Nowadays you can define a Copy activity to extract data from a Postgres RDS instance into S3. In the Data Pipeline interface (a sketch of the resulting pipeline objects follows these steps):
- Create a data node of the type SqlDataNode. Specify table name and select query
- Setup the database connection by specifying RDS instance ID (the instance ID is in your URL, e.g. your-instance-id.xxxxx.eu-west-1.rds.amazonaws.com) along with username, password and database name.
- Create a data node of the type S3DataNode
- Create a Copy activity and set the SqlDataNode as input and the S3DataNode as output
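Put together, the pipeline objects for the steps above might look roughly like the sketch below. Treat it as an illustration only: the IDs and placeholder values are made up, the exact field set should be checked against the Data Pipeline object reference, and an Ec2Resource plus a schedule are still needed for the activity to run.

{
"id": "MyRdsDatabase",
"type": "RdsDatabase",
"rdsInstanceId": "your-instance-id",
"username": "USER",
"*password": "PASSWORD",
"databaseName": "DATABASE"
},
{
"id": "SourcePostgresTable",
"type": "SqlDataNode",
"database": { "ref": "MyRdsDatabase" },
"table": "blahs",
"selectQuery": "select * from blahs"
},
{
"id": "S3OutputLocation",
"type": "S3DataNode",
"directoryPath": "s3://your-bucket/postgres-export/"
},
{
"id": "RdsToS3Copy",
"type": "CopyActivity",
"runsOn": { "ref": "MyEC2Resource" },
"input": { "ref": "SourcePostgresTable" },
"output": { "ref": "S3OutputLocation" }
}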
Answered by Manuel G
AWS now allows partners to do near-real-time RDS -> Redshift inserts.
https://aws.amazon.com/blogs/aws/fast-easy-free-sync-rds-to-redshift/