postgresql - Bulk updating existing rows in Redshift

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you reuse or share it, you must apply the same license and attribute the original authors (not me). Original: http://stackoverflow.com/questions/22543093/

Bulk updating existing rows in Redshift

sql, postgresql, amazon-redshift

Asked by moinudin

This seems like it should be easy, but isn't. I'm migrating a query from MySQL to Redshift of the form:

INSERT INTO table
(...)
VALUES
(...)
ON DUPLICATE KEY UPDATE
  value = MIN(value, VALUES(value))

Rows whose primary keys aren't already in the table should simply be inserted. For primary keys that are already in the table, we update the row's values based on a condition that depends on both the existing and the new values in the row.

http://docs.aws.amazon.com/redshift/latest/dg/merge-replacing-existing-rows.html does not work, because filter_expression in my case depends on the current entries in the table. I'm currently creating a staging table, inserting into it with a COPY statement, and am trying to figure out the best way to merge the staging and real tables.

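For reference, with a staging table the merge the question describes could look roughly like the sketch below. The table and column names (target, staging, id, value) are placeholders, and Redshift's LEAST plays the role of the row-level MIN in the MySQL statement:

begin transaction;

-- Update rows whose keys already exist, keeping the smaller of the old and new values.
update target
set value = least(target.value, s.value)
from staging s
where target.id = s.id;

-- Insert rows whose keys are not in the target yet.
insert into target (id, value)
select s.id, s.value
from staging s
left join target t on t.id = s.id
where t.id is null;

end transaction;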

Answered by mike_pdb

I'm having to do exactly this for a project right now. The method I'm using involves 3 steps:

1. Run an update that addresses changed fields (I'm updating whether or not the fields have changed, but you can certainly qualify that):

update table1 set col1=s.col1, col2=s.col2,...
from stagetable s
where table1.primkey=s.primkey;
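
If you do want to qualify the update so that only rows whose fields actually changed get rewritten, a sketch could look like this (col1 and col2 are illustrative column names; add null-safe comparisons if the columns are nullable):

update table1
set col1 = s.col1, col2 = s.col2
from stagetable s
where table1.primkey = s.primkey
  and (table1.col1 <> s.col1 or table1.col2 <> s.col2);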

2. Run an insert that addresses new records:

insert into table1
select s.* 
from stagetable s 
 left outer join table1 t on s.primkey=t.primkey
where t.primkey is null;

3. Mark rows no longer in the source as inactive (our reporting tool uses views that filter inactive records):

update table1
set is_active_flag='N', last_updated=sysdate
where not exists
 (select 1 from stagetable s where s.primkey=table1.primkey);

Answered by oaamados

It is possible to create a temp table. In Redshift it is better to delete and re-insert the records. Check this doc:

http://docs.aws.amazon.com/redshift/latest/dg/merge-replacing-existing-rows.html

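A minimal sketch of the delete-and-insert merge that doc describes, assuming a staging table named stagetable keyed on primkey (the names are illustrative):

begin transaction;

-- Remove the rows that are about to be replaced by the staged data.
delete from table1
using stagetable s
where table1.primkey = s.primkey;

-- Re-insert everything from the staging table: replacements and new rows alike.
insert into table1
select * from stagetable;

end transaction;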

Answered by Red Boy

Here is the fully working approach for Redshift.

Assumptions:

A. Data is available in S3 in gzip format with '|'-separated columns, and may contain some garbage data (see maxerror).

B. A sales fact table with two dimension tables, to keep it simple (TIME and SKU; SKU may have many groups and categories).

C. You have a Sales table like this.

CREATE TABLE sales (
 sku_id int encode zstd,
 time_id int encode zstd,
 quantity numeric(10,2) encode delta32k
);
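
The dimension tables themselves are not shown in the answer. For illustration only, they might look roughly like this; the table and column names are inferred from steps 4 and 5 below and stand in for the <your_sku_dimension_table> and <your_time_dimension_table> placeholders, so treat them as assumptions:

-- Hypothetical SKU dimension, assumed to map sku_number to sku_id.
CREATE TABLE sku_dimension (
 sku_id int encode zstd,
 sku_number varchar(255) encode zstd
);

-- Hypothetical TIME dimension, assumed to map a time string to time_id.
CREATE TABLE time_dimension (
 time_id int encode zstd,
 time varchar(255) encode zstd
);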

1) Create a staging table that should resemble the online table used by your app(s).

CREATE TABLE stg_sales_onetime (
 sku_number varchar(255) encode zstd,
 time varchar(255) encode zstd,
 qty_str varchar(20) encode zstd,
 quantity numeric(10,2) encode delta32k,
 sku_id int encode zstd,
 time_id int encode zstd
);

2) Copy data from S3 (this could also be done using SSH).

copy stg_sales_onetime (sku_number,time,qty_str) from
  's3://<bucket_name>/<full_file_path>'
  CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>'
  delimiter '|' ignoreheader 1 maxerror as 1000 gzip;

3) This step is optional. If your data is not well formatted, this is where you transform it as needed (e.g. converting the string quantity '12.555654' to the number 12.56):

update stg_sales_onetime set quantity = convert(decimal(10,2), qty_str);

4) Populate the correct IDs from the dimension tables.

update stg_sales_onetime
  set sku_id = <your_sku_dimension_table>.sku_id
  from <your_sku_dimension_table>
  where stg_sales_onetime.sku_number = <your_sku_dimension_table>.sku_number;

update stg_sales_onetime
  set time_id = <your_time_dimension_table>.time_id
  from <your_time_dimension_table>
  where stg_sales_onetime.time = <your_time_dimension_table>.time;

5) Finally, the data is ready to go from the staging table to the online Sales table.

insert into sales (sku_id, time_id, quantity)
select sku_id, time_id, quantity from stg_sales_onetime;