Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/8910494/

Date: 2020-09-01 14:03:38  Source: igfitidea

How to update selected rows with values from a CSV file in Postgres?

Tags: sql, database, postgresql, file-io, csv

Asked by user519753

I'm using Postgres and would like to run a big update query that takes its values from a CSV file. Let's say I have a table with the columns (id, banana, apple).

I'd like to run an update that changes the bananas and not the apples. Each new banana value and its ID would be in a CSV file.

I tried looking at the Postgres site, but the examples are killing me.

Answered by Erwin Brandstetter

COPY the file to a temporary staging table and update the actual table from there. Like:

CREATE TEMP TABLE tmp_x (id int, apple text, banana text); -- but see below

COPY tmp_x FROM '/absolute/path/to/file' (FORMAT csv);

UPDATE tbl
SET    banana = tmp_x.banana
FROM   tmp_x
WHERE  tbl.id = tmp_x.id;

DROP TABLE tmp_x; -- else it is dropped at end of session automatically
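
To see what the staging pattern does to the data, here is a minimal sketch that runs without a Postgres server. It swaps in Python's built-in sqlite3 module as a stand-in, so the load step uses INSERT instead of COPY and the update uses a correlated subquery instead of Postgres' UPDATE ... FROM. The CSV layout (id, banana), the sample rows, and the function names update_bananas and demo are assumptions made for illustration.

```python
import csv
import io
import sqlite3


def update_bananas(conn, csv_text):
    """Stage (id, banana) rows from CSV text, then update tbl from the staging table."""
    conn.execute("CREATE TEMP TABLE tmp_x (id INTEGER, banana TEXT)")
    rows = [(int(r[0]), r[1]) for r in csv.reader(io.StringIO(csv_text))]
    # SQLite has no COPY, so the staging table is filled with plain INSERTs.
    conn.executemany("INSERT INTO tmp_x (id, banana) VALUES (?, ?)", rows)
    # Only rows whose id appears in the staging table are touched,
    # and only the banana column is changed.
    conn.execute(
        "UPDATE tbl SET banana = "
        "(SELECT banana FROM tmp_x WHERE tmp_x.id = tbl.id) "
        "WHERE id IN (SELECT id FROM tmp_x)"
    )
    conn.commit()
    conn.execute("DROP TABLE tmp_x")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, banana TEXT, apple TEXT)"
    )
    conn.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?)",
        [(1, "old1", "a1"), (2, "old2", "a2"), (3, "old3", "a3")],
    )
    update_bananas(conn, "1,new1\n3,new3\n")
    return conn.execute(
        "SELECT id, banana, apple FROM tbl ORDER BY id"
    ).fetchall()
```

Calling demo() returns the three rows with banana replaced for ids 1 and 3; the row with id 2 and every apple value stay as they were.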

If the imported table matches the table to be updated exactly, this may be convenient:

CREATE TEMP TABLE tmp_x AS SELECT * FROM tbl LIMIT 0;

This creates an empty temporary table matching the structure of the existing table, without constraints.

Privileges

SQL COPY requires superuser privileges for this. (Since PostgreSQL 11, membership in the pg_read_server_files role is also sufficient for reading server-side files.) The manual:

COPY naming a file or command is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.

The psql meta-command \copy works for any db role. The manual:

Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.

The scope of temporary tables is limited to a single session of a single role, so the above has to be executed in the same psql session:

CREATE TEMP TABLE ...;
\copy tmp_x FROM '/absolute/path/to/file' (FORMAT csv);
UPDATE ...;
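
The same client-side approach can be sketched from Python. This assumes conn is an open psycopg2 connection (the block itself does not import psycopg2), that the CSV holds (id, banana) rows without a header, and that the function name update_from_csv is free to choose. Everything runs on one connection, so the temporary table stays visible for all steps. cursor.copy_expert() streams the local file to the server through COPY ... FROM STDIN, which, like \copy, needs no superuser privileges.

```python
def update_from_csv(conn, csv_file):
    """Stage the CSV in a temp table and update tbl.banana from it.

    conn is assumed to be a psycopg2 connection; csv_file is an open
    text file (or any file-like object) with rows of the form id,banana.
    Returns the number of rows the UPDATE reported as changed.
    """
    cur = conn.cursor()
    try:
        cur.execute("CREATE TEMP TABLE tmp_x (id int, banana text)")
        # Client-side copy: psycopg2 reads the file and sends it to the server.
        cur.copy_expert("COPY tmp_x FROM STDIN (FORMAT csv)", csv_file)
        cur.execute(
            "UPDATE tbl SET banana = tmp_x.banana "
            "FROM tmp_x WHERE tbl.id = tmp_x.id"
        )
        updated = cur.rowcount
        cur.execute("DROP TABLE tmp_x")
        conn.commit()
        return updated
    except Exception:
        # Undo the partial work, then let the caller see the error.
        conn.rollback()
        raise
    finally:
        cur.close()
```

A call would look like update_from_csv(psycopg2.connect(...), open('/absolute/path/to/file')), with the connection parameters left to the reader. Note that copy_expert belongs to psycopg2; psycopg 3 exposes COPY through a different interface.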

If you are scripting this in a bash command, be sure to wrap it all in a single psql call. Like:

echo 'CREATE TEMP TABLE tmp_x ...; \copy tmp_x FROM ...; UPDATE ...;' | psql

Normally, you need the meta-command \\ to switch between psql meta-commands and SQL commands in psql, but \copy is an exception to this rule. The manual again:

Special parsing rules apply to the \copy meta-command. Unlike most other meta-commands, the entire remainder of the line is always taken to be the arguments of \copy, and neither variable interpolation nor backquote expansion are performed in the arguments.
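
When the single psql call is driven from a script, one cautious option is to put every command on its own line, so nothing else shares a line with \copy. The sketch below builds such a script and pipes it to psql on standard input. The (id, banana) column layout, the function names, and the choice to reject paths containing a single quote (rather than guess at quoting rules) are assumptions; -X and -v ON_ERROR_STOP=1 are standard psql options that skip the startup file and stop at the first error.

```python
import subprocess


def build_psql_script(csv_path):
    """Return a psql script with one command per line."""
    if "'" in csv_path:
        # Assumption: refuse such paths instead of trying to escape them.
        raise ValueError("path must not contain a single quote")
    return "\n".join([
        "CREATE TEMP TABLE tmp_x (id int, banana text);",
        # \copy gets a line of its own; it takes no terminating semicolon.
        f"\\copy tmp_x FROM '{csv_path}' (FORMAT csv)",
        "UPDATE tbl SET banana = tmp_x.banana FROM tmp_x "
        "WHERE tbl.id = tmp_x.id;",
        "",
    ])


def run_script(script, dbname):
    """Feed the whole script to one psql process, i.e. one session."""
    subprocess.run(
        ["psql", "-X", "-v", "ON_ERROR_STOP=1", "-d", dbname],
        input=script,
        text=True,
        check=True,
    )
```

Because the script goes to a single psql process, the temporary table created in the first line is still there when the UPDATE runs.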

Big tables

If the import table is big, it may pay to increase temp_buffers temporarily for the session (first thing in the session):

SET temp_buffers = '500MB';  -- example value

Add an index to the temporary table:

CREATE INDEX tmp_x_id_idx ON tmp_x(id);

And run ANALYZE manually, since temporary tables are not covered by autovacuum / auto-analyze.

ANALYZE tmp_x;
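
Put together, the steps for a big import have to run in a particular order within one session. The sketch below only assembles the statements as a list, to make that order explicit; it does not talk to a database. The (id, banana) column layout, the function name, the COPY ... FROM STDIN load step, and the decision to build the index after loading (usually cheaper than maintaining it during the load) are assumptions, not part of the answer above.

```python
import re


def big_import_statements(temp_buffers="500MB"):
    """Return the SQL statements for a large staged update, in run order."""
    # Accept only a plain number with an optional unit, since the value
    # is pasted into the SET statement as a literal.
    if not re.fullmatch(r"\d+(kB|MB|GB)?", temp_buffers):
        raise ValueError("unexpected temp_buffers value: %r" % temp_buffers)
    return [
        # Must come first: temp_buffers can only be changed before the
        # session uses temporary tables for the first time.
        f"SET temp_buffers = '{temp_buffers}'",
        "CREATE TEMP TABLE tmp_x (id int, banana text)",
        "COPY tmp_x FROM STDIN (FORMAT csv)",
        "CREATE INDEX tmp_x_id_idx ON tmp_x (id)",
        # Temporary tables are not auto-analyzed, so do it by hand
        # before the planner has to choose a join strategy.
        "ANALYZE tmp_x",
        "UPDATE tbl SET banana = tmp_x.banana "
        "FROM tmp_x WHERE tbl.id = tmp_x.id",
    ]
```

Each statement would then be executed on the same connection, with the COPY step fed from the CSV file by whatever client is in use.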

Answered by Anupama V Iyengar

You can try the Python code below. The input file is the CSV file whose contents you want to write into the table. Each row is split on commas, so for each row, row[0] is the value in the first column, row[1] is the value in the second column, and so on.

    import csv
    import os
    import psycopg2

    conn = None  # defined up front so the finally block can test it
    try:
        conn = psycopg2.connect(
            "host=localhost dbname=prodmealsdb user=postgres password=blank")
        cur = conn.cursor()

        filepath = '/path/to/your/data_to_be_updated.csv'
        ext = os.path.splitext(filepath)[-1].lower()
        if ext == '.csv':
            with open(filepath, newline='') as csvfile:
                next(csvfile)  # skip the header line
                readCSV = csv.reader(csvfile, delimiter=',')
                for row in readCSV:
                    print(row[3], row[5])
                    # row[3] holds the id, row[5] the new value
                    cur.execute(
                        "UPDATE your_table SET column_to_be_updated = %s "
                        "WHERE id = %s", (row[5], row[3]))
            conn.commit()  # commit once, after all rows have been sent
        cur.close()

    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()