SQL: Iterate through a table and perform a calculation on each row

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/35856984/

Date: 2020-09-01 04:28:37  Source: igfitidea

Iterate through table, perform calculation on each row

Tags: sql, postgresql, postgis

Asked by Canadian_Marine

I would like to preface this by saying I am VERY new to SQL, but my work now requires that I work in it.

I have a dataset containing topographical point data (x,y,z). I am trying to build a KNN model based on this data. For every point 'P', I search for the 100 points in the data set nearest P (nearest meaning geographically nearest). I then average the values of these points (this average is known as a residual), and add this value to the table in the 'resid' column.

As a proof of concept, I am trying to simply iterate over the table, and set the value of the 'resid' column to 1.0 in every row.

My query is this:

CREATE OR REPLACE FUNCTION LoopThroughTable() RETURNS VOID AS '
DECLARE row table%rowtype;
BEGIN
    FOR row in SELECT * FROM table LOOP
        SET row.resid = 1.0;
    END LOOP;
END

' LANGUAGE 'plpgsql';

SELECT LoopThroughTable() as output; 

This code executes and returns successfully, but when I check the table, no alterations have been made. What is my error?

Answered by a_horse_with_no_name

Doing updates row by row in a loop is almost always a bad idea: it will be extremely slow and won't scale. You should really find a way to avoid it.

Having said that:

All your function does is change the value of the column in memory: you are just modifying the contents of a variable. If you want to change the data in the table, you need an UPDATE statement.

You need to use an UPDATE inside the loop:

CREATE OR REPLACE FUNCTION LoopThroughTable() 
  RETURNS VOID 
AS
$$
DECLARE 
   t_row the_table%rowtype;
BEGIN
    FOR t_row in SELECT * FROM the_table LOOP
        update the_table
            set resid = 1.0
        where pk_column = t_row.pk_column; --<<< !!! important !!!
    END LOOP;
END;
$$ 
LANGUAGE plpgsql;

Note that you have to add a WHERE condition on the primary key to the UPDATE statement; otherwise you would update all rows on each iteration of the loop.

A slightly more efficient solution is to use a cursor and then do the update using WHERE CURRENT OF:

CREATE OR REPLACE FUNCTION LoopThroughTable() 
  RETURNS VOID 
AS $$
DECLARE 
   t_curs cursor for 
      select * from the_table;
   t_row the_table%rowtype;
BEGIN
    FOR t_row in t_curs LOOP
        update the_table
            set resid = 1.0
        where current of t_curs;
    END LOOP;
END;
$$ 
LANGUAGE plpgsql;
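
One caveat, added here rather than taken from the original answer: the PostgreSQL documentation recommends declaring the cursor's query with FOR UPDATE when the cursor is going to be used with WHERE CURRENT OF, so that other sessions cannot change a row between the fetch and the update. A minimal sketch of that variant, reusing the placeholder names the_table and resid from above (the function name LoopThroughTableLocked is made up for illustration):

```sql
CREATE OR REPLACE FUNCTION LoopThroughTableLocked()  -- hypothetical name
  RETURNS VOID
AS $$
DECLARE
   -- FOR UPDATE locks each row as the cursor fetches it
   t_curs CURSOR FOR
      SELECT * FROM the_table FOR UPDATE;
BEGIN
    -- t_row is implicitly declared as a record by the cursor FOR loop
    FOR t_row IN t_curs LOOP
        UPDATE the_table
           SET resid = 1.0
         WHERE CURRENT OF t_curs;  -- updates the row the cursor is positioned on
    END LOOP;
END;
$$
LANGUAGE plpgsql;
```

This still visits the rows one at a time, so the performance warning at the top of this answer applies to it as well.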


So if I execute the UPDATE query after the loop has finished, will that commit the changes to the table?

No. The call to the function runs in the context of the calling transaction. So you need to COMMIT after running SELECT LoopThroughTable() if you have disabled auto commit in your SQL client.
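
As a rough illustration of that point, assuming auto commit is turned off or the call is wrapped in an explicit transaction, the session could look like this (the_table and resid are the placeholder names used in this answer):

```sql
BEGIN;                        -- start an explicit transaction
SELECT LoopThroughTable();    -- the updates happen inside this transaction

-- the changes are already visible to this session, but not yet permanent
SELECT count(*) FROM the_table WHERE resid = 1.0;

COMMIT;                       -- only now are the changes made permanent
```

Issuing ROLLBACK instead of COMMIT would discard the updates made by the function.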



Note that the language name is an identifier, so do not put single quotes around it. You should also avoid using keywords like row as variable names.

Using dollar quoting (as I did) also makes writing the function body easier.

Answered by Mike T

I'm not sure if the proof of concept example does what you want. In general, with SQL, you almost never need a FOR loop. While you can use a function, if you have PostgreSQL 9.3 or later, you can use a LATERAL subquery to perform subqueries for each row.

For example, create 10,000 random 3D points with a random value column:

CREATE TABLE points(
  gid serial primary key,
  geom geometry(PointZ),
  value numeric
);
CREATE INDEX points_geom_gist ON points USING gist (geom);
INSERT INTO points(geom, value)
SELECT ST_SetSRID(ST_MakePoint(random()*1000, random()*1000, random()*100), 0), random()
FROM generate_series(1, 10000);

For each point, search for the 100 nearest points (excluding the point itself), and find the residual between the point's value and the average value of the 100 nearest points:

SELECT p.gid, p.value - avg(l.value) residual
FROM points p,
  LATERAL (
    SELECT value
    FROM points j
    WHERE j.gid <> p.gid
    ORDER BY p.geom <-> j.geom
    LIMIT 100
) l
GROUP BY p.gid
ORDER BY p.gid;
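
The question asks for the result to be stored in a resid column. One possible way to do that with the query above is UPDATE ... FROM. This is only a sketch, and the resid column on points is an assumption, since the example table does not define it:

```sql
-- Assumed column, not part of the CREATE TABLE above
ALTER TABLE points ADD COLUMN resid numeric;

-- Compute every residual in a subquery, then join it back on the primary key
UPDATE points t
SET resid = s.residual
FROM (
  SELECT p.gid, p.value - avg(l.value) AS residual
  FROM points p
  CROSS JOIN LATERAL (
    SELECT j.value
    FROM points j
    WHERE j.gid <> p.gid
    ORDER BY p.geom <-> j.geom
    LIMIT 100
  ) l
  GROUP BY p.gid
) s
WHERE t.gid = s.gid;
```

The whole table is updated in a single statement, so no loop or function is needed.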

Answered by Amit Bhagat

Following is a simple example to update rows in a table:

Assuming the row ID column is named id:

Update all rows:

UPDATE my_table SET field1='some value'
WHERE id IN (SELECT id FROM staff)

Selective row update

UPDATE my_table SET field1='some value'
WHERE id IN (SELECT id FROM staff WHERE field2='same value')

Answered by Elad

You don't need a function for that. All you need is to run this query:

UPDATE table SET resid = 1.0;

If you want to do it with a function, you can use an SQL function:

CREATE OR REPLACE FUNCTION LoopThroughTable()
  RETURNS VOID AS
$BODY$
UPDATE table SET resid = 1.0;
$BODY$
  LANGUAGE sql VOLATILE

If you want to use plpgsql, then the function would be:

CREATE OR REPLACE FUNCTION LoopThroughTable()
  RETURNS void AS
$BODY$
begin
       UPDATE table SET resid = 1.0;
end;
$BODY$
  LANGUAGE plpgsql VOLATILE

Note that it is not recommended to use plpgsql functions for tasks that can be done with SQL functions.