"ERROR: extra data after last expected column" when using PostgreSQL COPY

Question

Asked by dnak

Please bear with me as this is my first post.

I'm trying to run the COPY command in PostgreSQL 9.2 to load a tab-delimited table from a .txt file into a PostgreSQL database, like this:

COPY raw_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER E'\t');

I've already created an empty table called "raw_data" in the database using the SQL command:

CREATE TABLE raw_data ();

I keep getting the following error message when trying to run the COPY command:

ERROR:  extra data after last expected column
CONTEXT:  COPY raw_data, line 1: "  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  ..."

(The numbers here are supposed to be the column headings)

I'm not sure if it's because I didn't specify table columns when creating the db table, but I'm trying to avoid having to manually enter 800 or so columns.

Any suggestions on how to fix this?

Here's an example of what the .txt file looks like:

        1   2   3   4   5   6   7   8   9
binary1 1   1   0   1   1   1   1   1   1
binary2 1   0   0   1   0   1   1   0   0
binary3 1   0   1   1   1   0   0   1   0
binary4 1   1   1   1   0   1   0   1   0

Answer 1

Answered by Erwin Brandstetter

An empty table won't do. You need a table that matches the structure of the input data. Something like:

CREATE TABLE raw_data (
  col1 int
, col2 int
  ...
);
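
For the nine-column sample file shown in the question, a matching table could look like the sketch below. It assumes the unnamed first field is a text row label (binary1, binary2, and so on) and the other fields are integers; the column names label and c1 to c9 are made up for illustration. Because that file also starts with a heading row, the load step uses CSV format with a tab delimiter: in 9.2 the HEADER option, which skips the first line, is only available for CSV.

```sql
-- Hypothetical table for the 9-column sample: a text row label plus 9 integer columns.
-- (Drop the empty raw_data created earlier before running this.)
CREATE TABLE raw_data (
  label text
, c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int, c9 int
);

-- Load step, commented out here because it needs the actual file on the server.
-- CSV format is chosen only so HEADER can skip the heading row; fields stay tab-separated.
-- COPY raw_data FROM '/home/Projects/TestData/raw_data.txt'
-- WITH (FORMAT csv, DELIMITER E'\t', HEADER);
```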

You don't need to declare tab as DELIMITER, since tab is the default delimiter for COPY's text format:

COPY raw_data FROM '/home/Projects/TestData/raw_data.txt';

800 columns you say? That many columns would typically indicate a problem with your design. Anyway, there are ways to half-automate the CREATE TABLE script.

Automation

Assuming simplified raw data:

1   2   3   4  -- first row contains "column names"
1   1   0   1  -- tab separated
1   0   0   1
1   0   1   1

Define a different DELIMITER (a single one-byte character that does not occur in the import data at all; a multibyte character such as '§' is rejected by COPY in a UTF8 database), and import to a temporary staging table with a single text column:

CREATE TEMP TABLE tmp_data (raw text);

COPY tmp_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER E'\x01');
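
To try the remaining steps without the file, the simplified sample can be staged by hand instead of with COPY; this is only a stand-in for the import. The follow-up query counts the tab-separated fields per line, a cheap sanity check: every line should report the same count (the number of columns the generated table will get), because a line with too few or too many fields makes the row-literal cast in the INSERT step fail.

```sql
-- Stand-in for the COPY: one whole tab-separated line per row.
CREATE TEMP TABLE tmp_data (raw text);
INSERT INTO tmp_data VALUES
  (E'1\t2\t3\t4')   -- header row with the "column names"
, (E'1\t1\t0\t1')
, (E'1\t0\t0\t1')
, (E'1\t0\t1\t1');

-- Fields per line: a single result row means all lines have the same width.
SELECT array_length(string_to_array(raw, E'\t'), 1) AS n_fields
     , count(*) AS n_lines
FROM   tmp_data
GROUP  BY 1;
-- n_fields = 4, n_lines = 4
```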

This query creates the CREATE TABLE script:

SELECT 'CREATE TABLE tbl (col' || replace (raw, E'\t', ' bool, col') || ' bool)'
FROM   (SELECT raw FROM tmp_data LIMIT 1) t;

A more generic & safer query:

SELECT 'CREATE TABLE tbl('
    ||  string_agg(quote_ident('col' || col), ' bool, ' ORDER  BY ord)
    || ' bool);'
FROM  (SELECT raw FROM tmp_data LIMIT 1) t
     , unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord);
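
The generic query above relies on WITH ORDINALITY, which was added in PostgreSQL 9.4, so it won't run on the 9.2 server from the question. A possible 9.2-compatible variant numbers the header fields with generate_subscripts() in the SELECT list instead (9.2 also lacks LATERAL, so the set-returning function can't sit in FROM next to t). Treat it as a sketch: it only uses features documented for 9.2 (CTEs, generate_subscripts, string_agg with ORDER BY), and the CTE is just a stand-in for the staged header row.

```sql
-- Stand-in for the staging table; remove this line to read the real tmp_data.
WITH tmp_data(raw) AS (VALUES (E'1\t2\t3\t4'))
SELECT 'CREATE TABLE tbl('
    ||  string_agg(quote_ident('col' || a[i]), ' bool, ' ORDER BY i)
    || ' bool);' AS ddl
FROM  (
   -- one row per header field: the whole array a plus its subscript i
   SELECT a, generate_subscripts(a, 1) AS i
   FROM  (SELECT string_to_array(raw, E'\t') AS a
          FROM   tmp_data LIMIT 1) t         -- header row only
   ) s;
-- ddl: CREATE TABLE tbl(col1 bool, col2 bool, col3 bool, col4 bool);
```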

Returns:

CREATE TABLE tbl(col1 bool, col2 bool, col3 bool, col4 bool);

Execute it after verifying that it is valid, or execute it dynamically if you trust the result:

DO
$$BEGIN
EXECUTE (
   SELECT 'CREATE TABLE tbl (col' || replace(raw, E'\t', ' bool, col') || ' bool)'
   FROM  (SELECT raw FROM tmp_data LIMIT 1) t
   );
END$$;
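
The DO block above executes the plain replace() version. A variant that executes the quote_ident()-based statement instead could look like this sketch (PostgreSQL 9.4+, because of WITH ORDINALITY). The CTE holding E'1\t2\t3\t4' is only a stand-in for the staged header row; remove it to read the real tmp_data.

```sql
DO
$$BEGIN
-- Every generated column name goes through quote_ident(), so header text is
-- always treated as an identifier. The CTE is a stand-in for tmp_data.
EXECUTE (
   WITH tmp_data(raw) AS (VALUES (E'1\t2\t3\t4'))
   SELECT 'CREATE TABLE tbl('
       ||  string_agg(quote_ident('col' || col), ' bool, ' ORDER BY ord)
       || ' bool)'
   FROM  (SELECT raw FROM tmp_data LIMIT 1) t
       , unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord)
   );
END$$;
```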

Then INSERT the data with this query:

INSERT INTO tbl
SELECT (('(' || replace(replace(replace(
                  raw
                , '1',   't')
                , '0',   'f')
                , E'\t', ',')
             || ')')::tbl).*
FROM   (SELECT raw FROM tmp_data OFFSET 1) t;

Or simpler with translate():

INSERT INTO tbl
SELECT (('(' || translate(raw, E'10\t', 'tf,') || ')')::tbl).*
FROM   (SELECT raw FROM tmp_data OFFSET 1) t;

The string is converted into a row literal, cast to the newly created table row type and decomposed with (row).*.
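
A small trace of that conversion for a single data line, assuming a four-column table shaped like the generated tbl (a temporary table is used so the sketch doesn't touch a real one):

```sql
-- Hypothetical stand-in with the same shape as the generated tbl.
CREATE TEMP TABLE tbl (col1 bool, col2 bool, col3 bool, col4 bool);

-- Step 1: '1' -> 't', '0' -> 'f', tab -> ','
SELECT translate(E'1\t0\t1\t1', E'10\t', 'tf,') AS row_text;   -- t,f,t,t

-- Step 2: wrap in parentheses to get a row literal, cast it to the row type
-- of tbl, and expand the composite into its columns with (row).*
SELECT (('(' || translate(E'1\t0\t1\t1', E'10\t', 'tf,') || ')')::tbl).*;
-- col1 = t, col2 = f, col3 = t, col4 = t
```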

All done.

You could put all of that into a plpgsql function, but you'd need to safeguard against SQL injection. (There are a number of related solutions here on SO; try a search.)
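
One possible shape for such a function, as a sketch only: the name create_bool_table_from_header and its parameter _tbl are hypothetical. It builds every identifier with format('%I', ...), which quotes the table name and each generated column name, so neither the header text nor the argument can inject SQL. It expects the tmp_data staging table to exist when called and needs PostgreSQL 9.4+ for WITH ORDINALITY.

```sql
-- Hypothetical helper: create a table of bool columns named after the header row.
-- _tbl is quoted as one unqualified identifier (no schema prefix).
CREATE OR REPLACE FUNCTION create_bool_table_from_header(_tbl text)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
   _cols text;
BEGIN
   -- Column list such as: col1 bool, col2 bool, ...
   SELECT string_agg(format('%I bool', 'col' || col), ', ' ORDER BY ord)
   INTO   _cols
   FROM  (SELECT raw FROM tmp_data LIMIT 1) t          -- header row
        , unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord);

   IF _cols IS NULL THEN
      RAISE EXCEPTION 'tmp_data is empty';
   END IF;

   EXECUTE format('CREATE TABLE %I (%s)', _tbl, _cols);
END
$func$;
```

Usage, after staging the file: SELECT create_bool_table_from_header('tbl');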

Answer 2

Answered by Hugo Koopmans

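Check out the HEADER option in COPY: it makes COPY skip the first (heading) line of the file, e.g.: COPY raw_data FROM '/path/to/csv/SourceCSVFile.csv' DELIMITER ',' CSV HEADER. Note that HEADER does not create the table; the target table must already exist with matching columns.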
