"ERROR: extra data after last expected column" when using PostgreSQL COPY
Original question: http://stackoverflow.com/questions/16367415/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by dnak
Please bear with me as this is my first post.
I'm trying to run the COPY command in PostgreSQL 9.2 to load a tab-delimited table from a .txt file into a PostgreSQL database, like this:
COPY raw_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER ' ');
I've already created an empty table called "raw_data" in the database using the SQL command:
CREATE TABLE raw_data ();
I keep getting the following error message when trying to run the COPY command:
ERROR: extra data after last expected column
CONTEXT: COPY raw_data, line 1: " 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 ..."
(The numbers here are supposed to be the column headings)
I'm not sure if it's because I didn't specify table columns when creating the table, but I'm trying to avoid having to manually enter 800 or so columns.
Any suggestions on how to fix this?
Here's an example of what the .txt file looks like:
1 2 3 4 5 6 7 8 9
binary1 1 1 0 1 1 1 1 1 1
binary2 1 0 0 1 0 1 1 0 0
binary3 1 0 1 1 1 0 0 1 0
binary4 1 1 1 1 0 1 0 1 0
Answered by Erwin Brandstetter
An empty table won't do. You need a table that matches the structure of the input data. Something like:
CREATE TABLE raw_data (
col1 int
, col2 int
...
);
You don't need to declare tab as DELIMITER since that's the default:
COPY raw_data FROM '/home/Projects/TestData/raw_data.txt';
800 columns, you say? That many columns would typically indicate a problem with your design. Anyway, there are ways to half-automate the CREATE TABLE script.
Automation
Assuming simplified raw data:
1 2 3 4 -- first row contains "column names"
1 1 0 1 -- tab separated
1 0 0 1
1 0 1 1
Define a different DELIMITER (one that does not occur in the import data at all), and import to a temporary staging table with a single text column:
CREATE TEMP TABLE tmp_data (raw text);
COPY tmp_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER '§');
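Before generating any DDL, it can help to confirm that the staging import kept every line intact. The query below is a minimal sketch, assuming the tmp_data table above and tab-separated fields; the aliases n_fields and n_rows are made up for illustration. As a side note, the PostgreSQL documentation states that the COPY delimiter must be a single one-byte character, so '§' may be rejected in a UTF8 database. In that case another character that never occurs in the data would be needed.

```sql
-- Count staged lines by their number of tab-separated fields.
-- A well-formed file should yield a single group, meaning every line
-- (header included) has the same field count.
SELECT array_length(string_to_array(raw, E'\t'), 1) AS n_fields
     , count(*)                                     AS n_rows
FROM   tmp_data
GROUP  BY 1
ORDER  BY 1;
```

If more than one group comes back, the generated CREATE TABLE script and the later INSERT would likely not line up with some of the rows.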
This query generates the CREATE TABLE script:
SELECT 'CREATE TABLE tbl (col' || replace (raw, E'\t', ' bool, col') || ' bool)'
FROM (SELECT raw FROM tmp_data LIMIT 1) t;
A more generic and safer query (WITH ORDINALITY requires PostgreSQL 9.4 or later):
SELECT 'CREATE TABLE tbl('
|| string_agg(quote_ident('col' || col), ' bool, ' ORDER BY ord)
|| ' bool);'
FROM (SELECT raw FROM tmp_data LIMIT 1) t
, unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord);
Returns:
CREATE TABLE tbl (col1 bool, col2 bool, col3 bool, col4 bool);
Execute after verifying validity, or execute dynamically if you trust the result:
DO
$$BEGIN
EXECUTE (
SELECT 'CREATE TABLE tbl (col' || replace(raw, E'\t', ' bool, col') || ' bool)'
FROM (SELECT raw FROM tmp_data LIMIT 1) t
);
END$$;
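The dynamic variant can also be built on the safer string_agg() query. The following is a sketch only, assuming PostgreSQL 9.4 or later (for WITH ORDINALITY) and the same tmp_data staging table; the variable name _sql is an illustrative choice, not part of the original answer.

```sql
DO
$$
DECLARE
   _sql text;   -- holds the generated statement (hypothetical variable name)
BEGIN
   -- Build the DDL from the header line, quoting every identifier.
   SELECT 'CREATE TABLE tbl('
       || string_agg(quote_ident('col' || col), ' bool, ' ORDER BY ord)
       || ' bool)'
   INTO   _sql
   FROM  (SELECT raw FROM tmp_data LIMIT 1) t
       , unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord);

   RAISE NOTICE '%', _sql;   -- show the statement before running it
   EXECUTE _sql;
END
$$;
```

Printing the statement with RAISE NOTICE first leaves a record of what was executed, which makes a malformed header line easier to spot.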
Then INSERT the data with this query:
INSERT INTO tbl
SELECT (('(' || replace(replace(replace(
raw
, '1', 't')
, '0', 'f')
, E'\t', ',')
|| ')')::tbl).*
FROM (SELECT raw FROM tmp_data OFFSET 1) t;
Or simpler with translate():
INSERT INTO tbl
SELECT (('(' || translate(raw, E'10\t', 'tf,') || ')')::tbl).*
FROM (SELECT raw FROM tmp_data OFFSET 1) t;
The string is converted into a row literal, cast to the row type of the newly created table, and decomposed with (row).*.
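The two steps can be tried in isolation on a single hard-coded line. This is a small sketch; demo_tbl and its three columns are hypothetical and exist only for the demonstration.

```sql
-- Hypothetical three-column table; creating it also creates a row type
-- named demo_tbl that a text value can be cast to.
CREATE TEMP TABLE demo_tbl (col1 bool, col2 bool, col3 bool);

-- Step 1: translate() maps '1' -> 't', '0' -> 'f', tab -> ','.
SELECT translate(E'1\t0\t1', E'10\t', 'tf,');   -- t,f,t

-- Step 2: wrap in parentheses to form a row literal, cast it to the
-- table's row type, then expand it into separate columns with (row).*
SELECT (('(' || translate(E'1\t0\t1', E'10\t', 'tf,') || ')')::demo_tbl).*;
```

The second query returns one row with col1 = t, col2 = f, col3 = t, which is the shape INSERT INTO tbl expects.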
All done.
You could put all of that into a plpgsql function, but you'd need to safeguard against SQL injection. (There are a number of related solutions here on SO. Try a search.)
Answered by Hugo Koopmans
Check out the HEADER option in COPY, which makes it skip the first line of a CSV file, like: COPY FROM '/path/to/csv/SourceCSVFile.csv' DELIMITERS ',' CSV HEADER. The target table still has to exist with matching columns, because COPY does not create it.
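For the tab-delimited file from the question, the HEADER approach might look like the sketch below. It assumes the sample layout shown in the question (one text label followed by nine 0/1 values); the column names label and c1 to c9 are invented for illustration. HEADER is used together with CSV format here, since that is the combination available in PostgreSQL 9.2, and the tab delimiter then has to be given explicitly.

```sql
-- Hypothetical table matching the sample file: a label plus nine flags.
CREATE TABLE raw_data (
   label text
 , c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int, c9 int
);

-- CSV format with an explicit tab delimiter. HEADER tells COPY to skip
-- the first line; it does not read column names or create columns.
COPY raw_data
FROM '/home/Projects/TestData/raw_data.txt'
WITH (FORMAT csv, DELIMITER E'\t', HEADER);
```

With 800 or so columns the CREATE TABLE statement would still need to be generated somehow, for example with the header-driven query from the other answer.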