Original URL: http://stackoverflow.com/questions/21018256/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Can I automatically create a table in PostgreSQL from a csv file with headers?
Asked by ihough
I'm running PostgreSQL 9.2.6 on OS X 10.6.8. I would like to import data from a CSV file with column headers into a database. I can do this with the COPY statement, but only if I first manually create a table with a column for each column in the CSV file. Is there any way to automatically create this table based on the headers in the CSV file?
Per this question, I have tried:
COPY test FROM '/path/to/test.csv' CSV HEADER;
But I just get this error:
ERROR: relation "test" does not exist
And if I first create a table with no columns:
CREATE TABLE test ();
I get:
ERROR: extra data after last expected column
I can't find anything in the PostgreSQL COPY documentation about automatically creating a table. Is there some other way to automatically create a table from a CSV file with headers?
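The general idea behind the answers below can be sketched in a few lines of Python. The steps are: read the header row, declare one `text` column per header cell, then let `COPY ... CSV HEADER` load the data. This is a minimal sketch that assumes a comma-delimited file. The function names are hypothetical. Identifiers are double-quoted, with embedded quotes doubled, so header names containing spaces or capitals survive as-is.

```python
import csv
import io


def quote_ident(name):
    """Quote an SQL identifier: wrap in double quotes, double any embedded ones."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote an SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_table_sql(table, csv_file, csv_path):
    """Build CREATE TABLE + COPY statements from the header of an open CSV file.

    Every column is created as text; tighten the types after the import.
    """
    header = next(csv.reader(csv_file))
    columns = ",\n".join("    {} text".format(quote_ident(col.strip())) for col in header)
    create = "CREATE TABLE {} (\n{}\n);".format(quote_ident(table), columns)
    copy = "COPY {} FROM {} CSV HEADER;".format(quote_ident(table), quote_literal(csv_path))
    return create + "\n" + copy


if __name__ == "__main__":
    sample = io.StringIO("id,First Name,score\n1,Ann,9.5\n")
    print(create_table_sql("test", sample, "/path/to/test.csv"))
```

A plain `COPY ... FROM 'file'` reads the file on the database server. When the file is on the client, psql's `\copy` is the usual alternative.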
Accepted answer by Erwin Brandstetter
Answer by Wolfi
There is a very good tool that imports tables into Postgres from a CSV file. It is a command-line tool called pgfutter (with binaries for Windows, Linux, etc.). One of its big advantages is that it also recognizes the attribute/column names.
The tool is simple to use. For example, if you'd like to import myCSVfile.csv:
pgfutter --db "myDatabase" --port "5432" --user "postgres" --pw "mySecretPassword" csv myCSVfile.csv
This will create a table (called myCSVfile) with the column names taken from the CSV file's header. Additionally, the data types will be identified from the existing data.
A few notes: The command pgfutter varies depending on the binary you use, e.g. it could be pgfutter_windows_amd64.exe (rename it if you intend to use this command frequently). The above command has to be executed in a command-line window (e.g. on Windows, run cmd and make sure pgfutter is accessible). If you'd like a different table name, add --table "myTable"; to select a particular database schema, use --schema "mySchema". If you are accessing an external database, use --host "myHostDomain".
A more elaborate example of pgfutter, importing myFile into myTable, is this one:
pgfutter --host "localhost" --port "5432" --db "myDB" --schema "public" --table "myTable" --user "postgres" --pw "myPwd" csv myFile.csv
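If you call pgfutter from a script, you can pass the options as an argument list rather than one shell string. This avoids quoting problems with passwords or paths. The sketch below makes two assumptions: the binary is reachable on `PATH` under the name given by `binary` (for example `pgfutter_windows_amd64.exe` on Windows), and only the flags shown in this answer are needed. `pgfutter_args` and `run_pgfutter` are hypothetical helper names.

```python
import subprocess


def pgfutter_args(csv_file, db, user, password, host="localhost", port=5432,
                  schema=None, table=None, binary="pgfutter"):
    """Build the argument list for a `pgfutter ... csv <file>` import (flags as in the answer)."""
    args = [binary, "--host", host, "--port", str(port), "--db", db]
    if schema:
        args += ["--schema", schema]
    if table:
        args += ["--table", table]
    args += ["--user", user, "--pw", password, "csv", csv_file]
    return args


def run_pgfutter(*args, **kwargs):
    """Run the import; raises CalledProcessError if pgfutter exits with a non-zero status."""
    subprocess.run(pgfutter_args(*args, **kwargs), check=True)


if __name__ == "__main__":
    print(pgfutter_args("myFile.csv", "myDB", "postgres", "myPwd",
                        schema="public", table="myTable"))
```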
Most likely you will change a few data types (from text to numeric) after the import:
alter table myTable
    alter column myColumn type numeric
    using (trim(myColumn)::numeric);
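To decide which imported `text` columns are safe to convert, you can check their values before issuing the `ALTER TABLE`. The sketch below assumes you have the header and rows in Python (for example from `csv.reader`). The regex accepts plain decimal and exponent notation after trimming. It is deliberately stricter than PostgreSQL's `numeric` input. The generated `USING` clause differs from the statement above in one way: it adds `NULLIF(..., '')`, so empty strings become NULL instead of failing the cast. Quoted identifiers are case-sensitive, so pass names exactly as they exist in the database.

```python
import re

# Optional sign, digits with optional fraction (or a bare fraction), optional exponent
NUMERIC_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def quote_ident(name):
    """Double-quote an identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def numeric_columns(header, rows):
    """Return header names whose non-empty (trimmed) values all look numeric."""
    result = []
    for i, name in enumerate(header):
        values = [row[i].strip() for row in rows if i < len(row) and row[i].strip()]
        if values and all(NUMERIC_RE.fullmatch(v) for v in values):
            result.append(name)
    return result


def alter_to_numeric_sql(table, column):
    """ALTER statement in the style of the answer, mapping empty strings to NULL."""
    t, c = quote_ident(table), quote_ident(column)
    return ("ALTER TABLE {t} ALTER COLUMN {c} TYPE numeric "
            "USING (NULLIF(trim({c}), '')::numeric);").format(t=t, c=c)
```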
Answer by Wolfi
There is a second approach, which I found here (from mmatt). Basically, you call a function within Postgres (the last argument specifies the number of columns):
select load_csv_file('myTable','C:/MyPath/MyFile.csv',24);
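The last argument has to match the number of columns in the file. The sketch below counts the header fields with Python's `csv` module and builds the call with the path as a properly quoted literal. It assumes a comma-delimited UTF-8 file whose first line is the header. It also assumes the script sees the same path as the database server, because the function's `copy` reads the file server-side. `load_csv_call` is a hypothetical helper name.

```python
import csv


def quote_literal(value):
    """Quote an SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def load_csv_call(target_table, csv_path, delimiter=","):
    """Build `select load_csv_file(...)` with col_count taken from the header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        col_count = len(next(csv.reader(f, delimiter=delimiter)))
    return "select load_csv_file({}, {}, {});".format(
        quote_literal(target_table), quote_literal(csv_path), col_count)
```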
Here is mmatt's function code, which I had to modify slightly because I am working in the public schema. (Copy and paste it into the pgAdmin SQL Editor and run it to create the function.)
CREATE OR REPLACE FUNCTION load_csv_file(
    target_table text,
    csv_path text,
    col_count integer)
  RETURNS void AS
$BODY$
declare
    iter integer;     -- dummy integer to iterate columns with
    col text;         -- variable to keep the column name at each iteration
    col_first text;   -- first column name, e.g., top left corner on a csv file or spreadsheet
begin
    set schema 'public';
    create table temp_table ();
    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;
    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);
    iter := 1;
    col_first := (select col_1 from temp_table limit 1);
    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;
    -- delete the columns row
    execute format('delete from temp_table where %s = %L', col_first, col_first);
    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;
end;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION load_csv_file(text, text, integer)
  OWNER TO postgres;
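One caveat worth knowing: the function renames the columns by converting the header row to its text form and splitting on commas (`string_to_array(trim(temp_table::text, '()'), ',')`). Suppose a header cell itself contains a comma. PostgreSQL prints that field in double quotes, and the plain split then yields one piece too many, so the renaming loop tries to rename a column that does not exist. The Python sketch below does not talk to PostgreSQL. It only imitates that step on an example row string, to contrast a plain split with a CSV-aware one.

```python
import csv


def naive_split(record_text):
    """Imitates trim(record::text, '()') followed by string_to_array(..., ',')."""
    return record_text.strip("()").split(",")


def csv_aware_split(record_text):
    """Splits the same text while honoring double-quoted fields."""
    return next(csv.reader([record_text.strip("()")]))


# How PostgreSQL prints a three-column row whose second field contains a comma
row_text = '(id,"last, first",age)'
print(naive_split(row_text))      # 4 pieces for 3 columns
print(csv_aware_split(row_text))  # 3 names
```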
Note: A common issue when importing text files is the encoding. The CSV file should be in UTF-8 format, but the programs that export or convert the file do not always get this quite right. I have overcome this issue by opening the file in Notepad++, converting it to ANSI and then back to UTF-8.
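Before importing, you can check whether a file really is valid UTF-8 and find the first offending byte. This is a minimal sketch using only the standard library. The function names are hypothetical, and `check_file` reads the whole file into memory, which is fine for moderate sizes.

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if the bytes are valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path):
    """Apply the check to a file on disk."""
    with open(path, "rb") as f:
        return first_invalid_utf8_offset(f.read())
```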
Answer by Dan.faudemer
I am using csvsql to generate the table layout (it will automatically guess the format):
head -n 20 table.csv | csvsql --no-constraints --tables table_name
And then I use \COPY in psql. For me, that's the fastest way to import a CSV file.
You can also use sed with csvsql in order to get the desired data types:
head -n 20 table.csv | csvsql --no-constraints --tables table_name | sed 's/DECIMAL/NUMERIC/' | sed 's/VARCHAR/TEXT/'
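If `sed` isn't at hand, the same type substitution can be done in Python. In this small sketch the mapping mirrors the two `sed` expressions above. The `\b` word boundaries keep it from touching longer words. One difference: `sed 's/.../.../'` without the `g` flag changes only the first match on each line, while `re.sub` replaces every match. The sketch assumes `--no-constraints` is used, so there are no length suffixes like `VARCHAR(10)`. `rewrite_types` is a hypothetical name.

```python
import re

# Same substitutions as the sed pipeline above
TYPE_MAP = {"DECIMAL": "NUMERIC", "VARCHAR": "TEXT"}
TYPE_RE = re.compile(r"\b(" + "|".join(TYPE_MAP) + r")\b")


def rewrite_types(ddl):
    """Replace csvsql's DECIMAL/VARCHAR column types with NUMERIC/TEXT."""
    return TYPE_RE.sub(lambda m: TYPE_MAP[m.group(1)], ddl)
```

To use it in the pipeline, read the csvsql output from `sys.stdin` and write `rewrite_types(...)` of it to `sys.stdout`.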
Answer by John
I haven't used it, but pgLoader (https://pgloader.io/) is recommended by the pgfutter developers (see answer above) for more complicated problems. It looks very capable.
Answer by Franco
For a single table, I did it simply and quickly online, using one of the many good converters that can be found on the web. Just google "convert csv to sql online" and choose one.
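If you'd rather not upload your data to a website, the same conversion is easy to do locally. This minimal sketch turns CSV text with a header row into `CREATE TABLE` and `INSERT` statements. Two behaviours are assumptions of the sketch, and online converters may differ: every column is created as `text`, and empty fields become `NULL`. String literals are escaped by doubling single quotes. That is correct when `standard_conforming_strings` is on, which has been the default since PostgreSQL 9.1. `csv_to_sql` is a hypothetical name.

```python
import csv
import io


def quote_ident(name):
    """Double-quote an identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def csv_to_sql(table, csv_text, delimiter=","):
    """Convert CSV text (with a header row) to CREATE TABLE + INSERT statements."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    cols = ", ".join(quote_ident(h) for h in header)
    out = ["CREATE TABLE {} ({});".format(
        quote_ident(table), ", ".join(quote_ident(h) + " text" for h in header))]
    for row in data:
        values = ", ".join("NULL" if v == "" else quote_literal(v) for v in row)
        out.append("INSERT INTO {} ({}) VALUES ({});".format(quote_ident(table), cols, values))
    return "\n".join(out)
```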
Answer by Alexandre Andrade
I achieved it with these steps:
1. Convert the CSV file to UTF-8:
iconv -f ISO-8859-1 -t UTF-8 file.txt -o file.csv
2. Use this Python script to generate the SQL that creates the table and copies the data:
#!/usr/bin/env python3
import csv
import os
# pip install python-slugify
from slugify import slugify
origem = 'file.csv'   # source CSV (semicolon-delimited, UTF-8)
destino = 'file.sql'  # generated SQL script
arquivo = os.path.abspath(origem)
# Read only the header row; csv.reader honors quoted header cells
with open(origem, 'r', encoding='utf-8', newline='') as f:
    header = next(csv.reader(f, delimiter=';'))
# Turn each header cell into a safe column name and suffix duplicates (_2, _3, ...)
head_cells = []
for cell in header:
    value = base = slugify(cell, separator='_')
    n = 2
    while value in head_cells:
        value = '{}_{}'.format(base, n)
        n += 1
    head_cells.append(value)
# Every column is created as text; adjust the types in the generated SQL if needed
fields = ['    {} text'.format(cell) for cell in head_cells]
table = os.path.splitext(os.path.basename(origem))[0]
sql = "create table {} (\n{}\n);".format(table, ",\n".join(fields))
sql += "\nCOPY {} FROM '{}' DELIMITER ';' CSV HEADER;\n".format(table, arquivo)
print(sql)
with open(destino, 'w', encoding='utf-8') as d:
    d.write(sql)
3. Run the script with:
python3 importar.py
Optional: Edit the SQL script to adjust the field types (all are text by default).
4. Run the SQL script, e.g. from the console:
sudo -H -u postgres bash -c "psql mydatabase < file.sql"
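If `iconv` isn't available (for example on Windows), the conversion in the first step can be done with Python instead. This minimal sketch mirrors that `iconv` command. The source encoding `ISO-8859-1` is taken from the command and must match what your file actually uses; Windows exports are often `cp1252` instead. `reencode` is a hypothetical name, and `newline=""` keeps the original line endings unchanged.

```python
def reencode(src, dst, src_encoding="iso-8859-1", dst_encoding="utf-8"):
    """Rewrite src into dst in another encoding, like `iconv -f ... -t ... src -o dst`."""
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
            open(dst, "w", encoding=dst_encoding, newline="") as fout:
        for line in fin:
            fout.write(line)
```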