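PostgreSQL copy from CSV with missing data values
Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/8347237/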
Question by ugh
I'm trying to import a CSV file into PostgreSQL using COPY. It chokes when it hits a row where there are empty values, e.g. the second row below:
JAN-01-2001,1,2,3,4,5
JAN-02-2001,6,7,,,
I've tried this COPY statement, as well as variants using NULL and QUOTE, and haven't found anything that works.
COPY data FROM 'data.dat' USING DELIMITERS ',' CSV;
Any suggestions? The data is a massive 22 GB flat file, so I'd like to avoid editing it directly.
Answer by Kenaniah
I would suggest converting your numeric columns to text columns for the purposes of your import. The reason is that an empty string is not a valid numeric value. Change your numeric columns to text columns, import the CSV file, update the empty values to null or 0, and then change the column back to an integer.
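The steps above run in SQL (change the column to text, COPY, UPDATE the empty values, change the column back). The following is a minimal Python sketch of only the per-value rule in the "update the empty values" step. It assumes the column should end up as an integer: an empty string becomes None (standing in for SQL NULL) or 0, and anything else goes through int(), like the final cast. The helper name clean_integer and its use_zero flag are hypothetical.

```python
from typing import Optional


def clean_integer(raw: str, use_zero: bool = False) -> Optional[int]:
    """Map one text value from the temporary text column to an integer.

    Hypothetical helper mirroring "update the empty values to null or 0,
    then change the column back to an integer".
    """
    if raw.strip() == "":
        # An empty string is not a valid integer: use 0 or None (NULL) instead.
        return 0 if use_zero else None
    # Every other value must parse as an integer, just like the final cast.
    return int(raw)


# A plain split is enough for this unquoted sample row (no embedded commas).
row = "JAN-02-2001,6,7,,,".split(",")
print([clean_integer(v) for v in row[1:]])                 # [6, 7, None, None, None]
print([clean_integer(v, use_zero=True) for v in row[1:]])  # [6, 7, 0, 0, 0]
```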
Answer by Erwin Brandstetter
Your statement is suspicious:
COPY data FROM 'data.dat' USING DELIMITERS ',' CSV;
DELIMITERS was used in versions before 7.3. It is still supported in order not to break old code, but don't use it any more. The proper keyword is DELIMITER. And you don't need to specify ',' at all, as it is the default for FORMAT CSV.
Also, I quote the manual here:
filename
The absolute path name of the input or output file. Windows users might need to use an E'' string and double any backslashes used in the path name.
The key word is absolute. Replace 'data.dat' with something like '/path/to/data.dat' on UNIX or E'C:\\path\\to\\data.dat' on Windows.
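The escaping the manual describes can be done mechanically. Below is a small Python sketch, assuming the path is trusted input: it doubles every backslash and wraps the result in E'...'. Doubling any embedded single quote is an extra safety step not mentioned in the answer, and the helper name to_escape_string_literal is hypothetical.

```python
def to_escape_string_literal(path: str) -> str:
    """Return a PostgreSQL E'' string literal for a file path (hypothetical helper)."""
    # Inside E'...' a backslash starts an escape sequence, so double each one.
    escaped = path.replace("\\", "\\\\")
    # A single quote would end the literal; SQL escapes it by doubling it.
    escaped = escaped.replace("'", "''")
    return "E'" + escaped + "'"


windows_path = r"C:\path\to\data.dat"
print(f"COPY data FROM {to_escape_string_literal(windows_path)} (FORMAT CSV);")
# COPY data FROM E'C:\\path\\to\\data.dat' (FORMAT CSV);
```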
For versions 7.3+ use:
COPY data FROM '/path/to/data.dat' CSV;
For versions 9.0+ use:
COPY data FROM '/path/to/data.dat' (FORMAT CSV);
If you still get this error:
如果您仍然收到此错误:
ERROR: invalid input syntax for type numeric:
CONTEXT: COPY data, line 13, column interval_2400:
Then, obviously, the source file does not match the structure of table data. Have a look at your source file, go to line 13 and see what value is there for column interval_2400. Chances are, it's not numeric. In particular, an empty string ('') is not allowed in columns of numeric type.
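For a 22 GB file, jumping to line 13 in an editor may not be practical. A streaming Python sketch can instead report every value that would not parse as a number, without loading the file into memory. It assumes the numeric columns are known by position (1 to 5 here, as in the question's sample). Following the point above about '', it treats empty strings and missing fields as offending. The helper name find_non_numeric is hypothetical, and Decimal parsing only approximates PostgreSQL's numeric input rules.

```python
import csv
from decimal import Decimal, InvalidOperation
from typing import Iterable, Iterator, Optional, Sequence, Tuple


def find_non_numeric(
    lines: Iterable[str], numeric_columns: Sequence[int]
) -> Iterator[Tuple[int, int, Optional[str]]]:
    """Yield (line_number, column_index, value) for values that are not numeric.

    Hypothetical helper. column_index is 0-based; value is None when the
    row is too short to have that column at all.
    """
    reader = csv.reader(lines)
    for row in reader:
        for col in numeric_columns:
            value = row[col] if col < len(row) else None
            if value is None:
                yield reader.line_num, col, value
                continue
            try:
                Decimal(value)  # roughly what casting the text to numeric checks
            except InvalidOperation:
                yield reader.line_num, col, value


sample = ["JAN-01-2001,1,2,3,4,5", "JAN-02-2001,6,7,,,"]
for problem in find_non_numeric(sample, numeric_columns=range(1, 6)):
    print(problem)
# (2, 3, '')
# (2, 4, '')
# (2, 5, '')
```

On the real file, pass the open file object instead of the sample list, e.g. `with open("/path/to/data.dat", newline="") as f:` and iterate over `find_non_numeric(f, range(1, 6))`.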
You can either fix the source file or adapt the table definition:
ALTER TABLE data ALTER COLUMN interval_2400 TYPE text;
Or whatever type is more appropriate. Might be interval, judging from the name. (But text accepts almost any input value.)
Or, better yet, create a modified temporary table, COPY to it, fix offending values, then INSERT into the target table, casting from text.
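The last step of that approach is an INSERT ... SELECT that casts each text column. The SQL is what matters here; this Python sketch only assembles it. It assumes hypothetical names (data_staging for the all-text temporary table, day for a first column) and assumes empty strings are the offending values. NULLIF(col, '') turns those into NULL before the cast. Identifiers are assumed to be trusted and are not quoted. Other kinds of bad values would need their own fix-up first.

```python
from typing import Dict


def build_staging_insert(target: str, staging: str, column_types: Dict[str, str]) -> str:
    """Build INSERT ... SELECT from an all-text staging table (hypothetical helper)."""
    select_list = []
    for column, sql_type in column_types.items():
        if sql_type == "text":
            select_list.append(column)  # text needs no cast
        else:
            # NULLIF maps the empty string to NULL, then the cast converts the rest.
            select_list.append(f"NULLIF({column}, '')::{sql_type}")
    columns = ", ".join(column_types)
    return (
        f"INSERT INTO {target} ({columns})\n"
        f"SELECT {', '.join(select_list)}\n"
        f"FROM {staging};"
    )


print(build_staging_insert("data", "data_staging", {"day": "text", "interval_2400": "numeric"}))
# INSERT INTO data (day, interval_2400)
# SELECT day, NULLIF(interval_2400, '')::numeric
# FROM data_staging;
```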
Answer by Sergio Belevskij
This is a PostgreSQL bug: the CSV parser ignores the last empty item and throws the error "PG::BadCopyFileFormat: ERROR: missing data for column".
I use a stupid hack:
If the last item is empty, simply add one more delimiter to the end of the line:
1,2,3
1,2,,
This adds the missing last item of the row to the imported data.
Answer by glyph
One additional caveat: check the line number of the error and make sure it is not a blank row in the CSV file. A blank row causes Postgres to throw the same error about missing values.
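To check this on a file too large to open comfortably, a short streaming Python sketch can list the line numbers of blank rows. You can then compare them with the line number in the error message. It treats a line containing only whitespace (including a bare \r\n) as blank, and the helper name blank_line_numbers is hypothetical.

```python
from typing import Iterable, List


def blank_line_numbers(lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers of blank lines (hypothetical helper)."""
    return [number for number, line in enumerate(lines, start=1) if line.strip() == ""]


sample = ["JAN-01-2001,1,2,3,4,5\n", "\n", "JAN-02-2001,6,7,,,\r\n", "   \n"]
print(blank_line_numbers(sample))  # [2, 4]
```

On the real file, `with open("/path/to/data.dat") as f: print(blank_line_numbers(f))` reads it line by line. Only the (hopefully few) blank line numbers are kept in memory.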
Answer by Lewis
For anyone coming here with smaller files: here's the easiest fix I've found for this, and for an inconsistent number of delimiters in a CSV.
- Open your CSV.
- Ctrl + Shift + 8 (selects all data)
- Ctrl + h (opens find replace)
- Leave the Find box blank so it looks for zero-length strings.
- Enter a space in the replace with box.
This will loop through the whole CSV and force every row to have the correct number of delimiters (,), even if there's no data in that column.
If you're comfortable with Excel, you can turn this into a macro too; mine (Ctrl + G) does it in one go. See "Creating a Macro".
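Excel worksheets are limited to 1,048,576 rows, so this route is unlikely to work for a file as large as the 22 GB one in the question. A streaming Python sketch can do the same job by padding every row to a fixed number of fields, so each row ends up with the same number of delimiters. This swaps in a different technique from the answer's Excel find-and-replace. It pads with empty fields rather than spaces, because a space in a numeric column would still have to parse as a number. The expected field count, the helper name pad_rows, and the decision to drop blank lines are assumptions.

```python
import csv
import io
from typing import Iterable, TextIO


def pad_rows(lines: Iterable[str], out: TextIO, n_fields: int) -> None:
    """Rewrite CSV rows so short rows have exactly n_fields fields (hypothetical helper).

    Longer rows are written unchanged so COPY still reports them
    instead of data being silently lost.
    """
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(lines):
        if not row:
            continue  # drop blank lines rather than turning them into all-empty rows
        if len(row) < n_fields:
            row = row + [""] * (n_fields - len(row))
        writer.writerow(row)


buffer = io.StringIO()
pad_rows(["JAN-01-2001,1,2,3,4,5", "JAN-02-2001,6,7"], buffer, n_fields=6)
print(buffer.getvalue(), end="")
# JAN-01-2001,1,2,3,4,5
# JAN-02-2001,6,7,,,
```

For a real file, open both files with `newline=""` (for example a hypothetical `data_padded.dat` as the output) and pass the input file object as `lines`, so the rewrite streams row by row.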