PostgreSQL: copy a few of the columns of a CSV file into a table

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/12618232/

Date: 2020-09-10 23:45:21  Source: igfitidea

Copy a few of the columns of a csv file into a table

postgresql csv copy etl

Asked by POTENZA

I have a CSV file with 10 columns. After creating a PostgreSQL table with 4 columns, I want to copy some of the 10 columns into the table.

The columns of my CSV file are:

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10

The columns of my PostgreSQL table should be:

x2 x5 x7 x10

Answered by Clodoaldo Neto

If it is an ad hoc task

Create a temporary table with all the columns in the input file:

create temporary table t (x1 integer, ... , x10 text)

Copy from the file into it:

copy t (x1, ... , x10)
from '/path/to/my_file'
with (format csv)

Now insert into the definitive table from the temp:

insert into my_table (x2, x5, x7, x10)
select x2, x5, x7, x10
from t

And drop it:

drop table t

If it is a frequent task

Use the file_fdw extension. As superuser:

create extension file_fdw;

create server my_csv foreign data wrapper file_fdw;

create foreign table my_csv (
    x1 integer,
    x2 text,
    x3 text
) server my_csv
options (filename '/tmp/my_csv.csv', format 'csv' )
;

Grant select permission on the table to the user who will read it:

grant select on table my_csv to the_read_user;

Then whenever necessary read directly from the csv file as if it were a table:

insert into my_table (x2)
select x2
from my_csv
where x1 = 2

Answered by Julien

You can provide the columns you want to fill with the COPY command. Like so:

\copy your_table (x2,x5,x7,x10) FROM '/path/to/your-file.csv' DELIMITER ',' CSV;

Here's the doc for the COPY command.

Answered by James Brown

I arrived here looking for a way to load only a subset of columns, but apparently that's not possible. So, use awk (or cut) to extract the wanted columns to a new file new_file:

$ awk '{print $2, $5, $7, $10}' file > new_file

and load the new_file. You could pipe the output straight to psql:

$ cut -d \  -f 2,5,7,10 file | 
  psql -h host -U user -c "COPY table(col1,col2,col3,col4) FROM STDIN DELIMITER ' '" database

Notice COPY, not \COPY.
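
The extraction step can be checked on its own before anything is piped to psql. The following is a minimal sketch, assuming a hypothetical space-separated sample row; the variable names `sample` and `extracted` are illustrative and not part of the answer.

```shell
# Hypothetical sample: one space-separated row with 10 fields.
sample='a1 a2 a3 a4 a5 a6 a7 a8 a9 a10'

# Keep fields 2, 5, 7 and 10. awk splits on whitespace by default
# and joins the printed fields with a single space.
extracted=$(printf '%s\n' "$sample" | awk '{print $2, $5, $7, $10}')

echo "$extracted"
```

Running this kind of check on a few real lines of the file shows quickly whether the field numbers line up with the intended columns.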

Update:

As was pointed out in the comments, neither of the above examples can handle quoted delimiters in the data. The same goes for newlines, as awk and cut are not CSV aware. Quoted delimiters can be handled with GNU awk, though.

This is a three-column file:

$ cat file
1,"2,3",4
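
To see why this row is a problem for tools that are not CSV aware, here is a small sketch that runs cut and plain awk on it. The variable names `row`, `second` and `field_count` are made up for the illustration.

```shell
# The three-field sample row; the second field contains a quoted comma.
row='1,"2,3",4'

# cut splits on every comma, including the one inside the quotes,
# so "field 2" is only the first half of the quoted value.
second=$(printf '%s\n' "$row" | cut -d ',' -f 2)

# Plain awk with -F ',' makes the same mistake and counts one field too many.
field_count=$(printf '%s\n' "$row" | awk -F ',' '{print NF}')

echo "$second"
echo "$field_count"
```

Both tools treat the comma inside the quotes as a separator, which is the failure the comments referred to.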

Using GNU awk's FPAT variable we can change the order of the fields (or get a subset of them) even when the quoted fields have field separators in them:

$ gawk 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")";OFS=","}{print $2,$1,$3}' file
"2,3",1,4

Explained:

$ gawk '
BEGIN {                          # instead of field separator FS
    FPAT="([^,]*)|(\"[^\"]+\")"  # ...  we define field pattern FPAT
    OFS=","                      # output field separator OFS
} 
{
    print $2,$1,$3               # change field order
    # print $2                   # or get a subset of fields
}' file 

Notice that FPAT is GNU awk only. For other awks it's just a regular variable.

Answered by arredond

As other answers have pointed out, it's been possible to specify columns to copy into the PG table. However, without the option to reference column names in the CSV, this had little utility apart from loading into a table where columns had a different order.

Fortunately, as of Postgres 9.3, it's possible to copy columns not only from a file or from standard input, but also from a shell command using PROGRAM:

PROGRAM

A command to execute. In COPY FROM, the input is read from standard output of the command, and in COPY TO, the output is written to the standard input of the command.

Note that the command is invoked by the shell, so if you need to pass any arguments to shell command that come from an untrusted source, you must be careful to strip or escape any special characters that might have a special meaning for the shell. For security reasons, it is best to use a fixed command string, or at least avoid passing any user input in it.

This was the missing piece that we needed for such an eagerly awaited functionality. For example, we could use this option in combination with cut (on a UNIX-based system) to select certain columns by position:

COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'cut -d "," -f 2,5,7,10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)

However, cut has several limitations when manipulating CSV files: it can't correctly handle strings with commas (or other delimiters) inside them, and it doesn't allow selecting columns by name.

There are several other open source command-line tools that are better at manipulating CSV files, such as csvkit or miller. Here's an example using miller to select columns by name:

COPY my_table (x2, x5, x7, x10) FROM PROGRAM 'mlr --csv lf cut -f x2,x5,x7,x10 /path/to/file.csv' WITH (FORMAT CSV, HEADER)
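
Where installing an extra tool is not an option, selecting columns by header name can also be sketched in plain awk. This is a different technique from the miller command above, and it only works under the assumption that the file contains no quoted commas or embedded newlines. The helper name `select_columns` and the sample data are hypothetical.

```shell
# select_columns NAMES FILE: print the columns of FILE whose header names
# are listed in NAMES (comma-separated), in the order given.
# Hypothetical helper; only valid for CSV without quoted commas or newlines.
select_columns() {
    awk -F ',' -v OFS=',' -v names="$1" '
    NR == 1 {
        n = split(names, want, ",")
        for (i = 1; i <= NF; i++) pos[$i] = i      # header name -> field index
        for (j = 1; j <= n; j++) {
            if (!(want[j] in pos)) {
                print "unknown column: " want[j] > "/dev/stderr"
                exit 1
            }
            idx[j] = pos[want[j]]
        }
    }
    {
        # Runs for the header row too, so the output keeps a header line.
        line = $(idx[1])
        for (j = 2; j <= n; j++) line = line OFS $(idx[j])
        print line
    }' "$2"
}

# Usage on a small made-up file: pick x3 and x1, in that order.
demo=$(mktemp)
printf 'x1,x2,x3\n1,a,b\n2,c,d\n' > "$demo"
select_columns x3,x1 "$demo"
rm -f "$demo"
```

Because the header row is passed through, the output would still suit a load that uses the HEADER option. For data with quoted fields, a CSV-aware tool remains the safer choice.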

Answered by Chris Lawton

You could take James Brown's suggestion further and do, all in one line:

$ awk -F ',' '{print $2","$5","$7","$10}' file | psql -d db -c "\copy MyTable from STDIN csv header"

Answered by Michael Kraxner

If the number of imported rows reported as the result is not important to you, you could also:

Create two tables:

  • t1 (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10): with all the columns of the CSV file
  • t2 (x2 x5 x7 x10): as you need it

Then create:

  • a trigger function, in which you insert the desired columns into t2 instead and return NULL to prevent the row from being inserted into t1

  • a trigger on t1 (BEFORE INSERT FOR EACH ROW) that calls this function.

Especially with larger CSV files, BEFORE INSERT triggers are also useful for filtering out rows with certain properties beforehand, and you can do type conversions as well.

Answered by Sagun

To load data from a spreadsheet (Excel or OpenOffice Calc) into PostgreSQL:

Save the spreadsheet page as a CSV file. The preferred method is to open the spreadsheet in OpenOffice Calc and save it from there. In the “Export to text file” window choose Character Set: Unicode (UTF8), Field Delimiter: “,” and Text Delimiter: “"”. A message will be displayed saying that only the active sheet is saved. Note: this file has to be saved in a folder, not on the desktop, and it has to be saved in UTF8 format (PostgreSQL by default is set up for UTF8 encoding). If it is saved on the desktop, PostgreSQL will give an “access denied” message and won't load it.

In PostgreSQL, create an empty table with the same number of columns as the spreadsheet.

Note: each column must have the same name and the same data type as in the spreadsheet. Also keep the length of the data in mind, and make character varying columns wide enough to hold it.
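
Before creating the table, it can help to check how many columns the exported file actually has. The following is a rough sketch that counts the names in the header row. It assumes a comma-delimited file whose header contains no quoted commas, and the temporary file and variable names are made up for the illustration.

```shell
# Hypothetical sample CSV with a header row and one data row.
csv=$(mktemp)
printf 'x1,x2,x3,x4\n1,2,3,4\n' > "$csv"

# Count the comma-separated names in the first line only.
# Assumes the header contains no quoted commas.
ncols=$(head -n 1 "$csv" | awk -F ',' '{print NF}')

echo "$ncols"
rm -f "$csv"
```

The printed number is the column count the empty table would need to match.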

Then in PostgreSQL, in the SQL window, enter the code:

copy "ABC"."def" from E'C:\\tmp\\blabla.csv' delimiters ',' CSV HEADER;

NOTE: Here C:\\tmp is the folder where the CSV file “blabla” is saved. "ABC"."def" is the table created in PostgreSQL, where "ABC" is the schema and "def" is the actual table. Then run “execute query” by pressing the green button at the top. “CSV HEADER” is needed when the CSV file has a heading at the top of every column.

If everything is OK, no error message will be displayed and the data from the CSV file will be loaded into the PostgreSQL table. But if there is an error message, do the following:

If the error message says that the data is too long for a specific column, increase the column size. This happens mostly with character and character varying columns. Then run the “execute query” command again.

If the error message says that the data type doesn't match a particular column, change the data type of the PostgreSQL table column to match the one in the CSV file.

In your case, after creating the CSV file, delete the unwanted columns so that the remaining ones match the columns in the PostgreSQL table.