Can I automatically create a table in PostgreSQL from a CSV file with headers?

Question

Asked by ihough

I'm running PostgreSQL 9.2.6 on OS X 10.6.8. I would like to import data from a CSV file with column headers into a database. I can do this with the COPY statement, but only if I first manually create a table with a column for each column in the CSV file. Is there any way to automatically create this table based on the headers in the CSV file?

Per this question, I have tried:

COPY test FROM '/path/to/test.csv' CSV HEADER;

But I just get this error:

ERROR: relation "test" does not exist

And if I first create a table with no columns:

CREATE TABLE test ();

I get:

ERROR: extra data after last expected column

I can't find anything in the PostgreSQL COPY documentation about automatically creating a table. Is there some other way to automatically create a table from a CSV file with headers?

Answer 1

Accepted answer by Erwin Brandstetter

You can't find anything in the COPY documentation because COPY cannot create a table for you.
You need to create the table before you can COPY to it.
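
Since the table has to exist first, one way to avoid typing it by hand is to generate the CREATE TABLE statement from the header row. The following is only a minimal sketch under some assumptions: the helper names quote_ident and create_table_sql are hypothetical, the file is comma-delimited, and every column is created as text because the header alone says nothing about the data types.

```python
import csv
import io


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table: str, csv_file) -> str:
    """Build a CREATE TABLE statement with one text column per header cell."""
    header = next(csv.reader(csv_file))  # only the first row is read
    columns = ",\n".join(f"    {quote_ident(col.strip())} text" for col in header)
    return f"CREATE TABLE {quote_ident(table)} (\n{columns}\n);"


if __name__ == "__main__":
    # Hypothetical sample data standing in for /path/to/test.csv.
    sample = io.StringIO("id,name,price\n1,apple,0.50\n")
    print(create_table_sql("test", sample))
    # Once the table exists, the COPY from the question can load the rows.
    print("COPY \"test\" FROM '/path/to/test.csv' CSV HEADER;")
```

The generated statement can be run in psql before the COPY command. Quoting the identifiers keeps header names with spaces or capitals usable, at the cost of having to quote them in later queries.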

Answer 2

Answer by Wolfi

There is a very good tool that imports tables into Postgres from a CSV file. It is a command-line tool called pgfutter (with binaries for Windows, Linux, etc.). One of its big advantages is that it recognizes the attribute/column names as well.

The usage of the tool is simple. For example if you'd like to import myCSVfile.csv:

pgfutter --db "myDatabase" --port "5432" --user "postgres" --pw "mySecretPassword" csv myCSVfile.csv

This will create a table (called myCSVfile) with the column names taken from the csv file's header. Additionally the data types will be identified from the existing data.

A few notes: The pgfutter command name varies depending on the binary you use, e.g. it could be pgfutter_windows_amd64.exe (rename it if you intend to use this command frequently). The above command has to be executed in a command-line window (e.g. on Windows, run cmd and ensure pgfutter is accessible). If you'd like a different table name, add --table "myTable"; to select a particular database schema, use --schema "mySchema". If you are accessing an external database, use --host "myHostDomain".

A more elaborate example, using pgfutter to import myFile.csv into myTable:

pgfutter --host "localhost" --port "5432" --db "myDB" --schema "public" --table "myTable" --user "postgres" --pw "myPwd" csv myFile.csv

Most likely you will change a few data types (from text to numeric) after the import:

alter table myTable
  alter column myColumn type numeric
    using (trim(myColumn)::numeric)
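
To decide which text columns are worth converting, the imported values can be checked before writing the ALTER statements. This is only a rough sketch: is_numeric and alter_statements are hypothetical helpers, the rows are assumed to be available as dictionaries of column name to text value, and table and column names are assumed to be simple identifiers that need no quoting. Unlike the statement above, the generated SQL uses nullif so that empty strings become NULL instead of failing the cast.

```python
import re

# Plain decimal notation with an optional sign and exponent.
NUMERIC_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def is_numeric(value: str) -> bool:
    """True if the trimmed text looks like a decimal number."""
    return NUMERIC_RE.fullmatch(value.strip()) is not None


def alter_statements(table: str, rows: list) -> list:
    """Return one ALTER statement per column whose non-empty values are all numeric."""
    statements = []
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [row[col] for row in rows if row[col].strip() != ""]
        if values and all(is_numeric(v) for v in values):
            statements.append(
                f"alter table {table} alter column {col} type numeric "
                f"using (nullif(trim({col}), '')::numeric);"
            )
    return statements


if __name__ == "__main__":
    # Hypothetical rows, as they might look after an all-text import.
    rows = [
        {"id": "1", "name": "apple", "price": " 0.50 "},
        {"id": "2", "name": "pear", "price": ""},
    ]
    for statement in alter_statements("mytable", rows):
        print(statement)
```

Checking only a sample of rows is faster but can miss a stray non-numeric value further down, in which case the ALTER statement fails and leaves the column unchanged.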

Answer 3

Answer by Wolfi

There is a second approach, which I found here (from mmatt). Basically, you call a function within Postgres (the last argument specifies the number of columns).

select load_csv_file('myTable','C:/MyPath/MyFile.csv',24)

Here is mmatt's function code, which I had to modify slightly because I am working in the public schema. (Copy and paste it into the pgAdmin SQL editor and run it to create the function.)

CREATE OR REPLACE FUNCTION load_csv_file(
    target_table text,
    csv_path text,
    col_count integer)
  RETURNS void AS
$BODY$

declare

iter integer; -- dummy integer to iterate columns with
col text; -- variable to keep the column name at each iteration
col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet

begin
    set schema 'public';

    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);

    iter := 1;
    col_first := (select col_1 from temp_table limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row
    execute format('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;

end;

$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION load_csv_file(text, text, integer)
  OWNER TO postgres;

Note: Encoding is a common source of problems when importing text files. The CSV file should be encoded as UTF-8, but the programs that write the file do not always produce clean UTF-8. I worked around this by opening the file in Notepad++, converting it to ANSI, and then converting it back to UTF-8.
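
The same clean-up can be scripted instead of done by hand. The sketch below rests on assumptions the note does not confirm: that the trouble comes from a leading byte-order mark or from a file saved in a legacy Windows encoding (cp1252 is used here as a stand-in for "ANSI"). The names to_utf8 and convert_file are hypothetical.

```python
from pathlib import Path


def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return the content as UTF-8 bytes without a byte-order mark."""
    try:
        # Valid UTF-8: decode it, dropping a leading BOM if there is one.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the legacy encoding given by `fallback`.
        text = raw.decode(fallback)
    return text.encode("utf-8")


def convert_file(src: str, dst: str, fallback: str = "cp1252") -> None:
    """Read src, normalise it to UTF-8, and write the result to dst."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), fallback))
```

Calling convert_file('MyFile.csv', 'MyFile_utf8.csv') would write a normalised copy next to the original. If the file is in some other encoding, the fallback argument has to be changed accordingly, since a wrong guess silently produces wrong characters.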

Answer 4

Answer by Dan.faudemer

I am using csvsql to generate the table layout (it will automatically guess the format):

head -n 20 table.csv | csvsql --no-constraints --tables table_name

Then I use \COPY in psql. For me, that's the fastest way to import a CSV file.

You can also use sed with csvsql to get the desired data types:

head -n 20 table.csv | csvsql --no-constraints --tables table_name  | sed 's/DECIMAL/NUMERIC/' | sed 's/VARCHAR/TEXT/'

Answer 5

Answer by John

I haven't used it, but pgLoader (https://pgloader.io/) is recommended by the pgfutter developers (see answer above) for more complicated problems. It looks very capable.

Answer 6

Answer by Franco

For a single table, I did it simply and quickly using one of the many good online converters that can be found on the web. Just google "convert csv to sql online" and choose one.

Answer 7

Answer by Alexandre Andrade

I achieved it with these steps:

1. Convert the CSV file to UTF-8:

    iconv -f ISO-8859-1 -t UTF-8 file.txt -o file.csv

2. Use this Python script to generate the SQL that creates the table and copies the data:

#!/usr/bin/env python3
# Generates file.sql with CREATE TABLE and COPY statements for file.csv.
# Requires: pip install python-slugify
import os

from slugify import slugify

origem = 'file.csv'    # source CSV (semicolon-delimited, with a header row)
destino = 'file.sql'   # generated SQL script
arquivo = os.path.abspath(origem)  # absolute path used in the COPY statement

# Only the header row is needed to build the table definition.
with open(origem, 'r') as f:
    header = f.readline().rstrip('\r\n').split(';')

# Turn each header cell into a safe column name; suffix a repeated name.
head_cells = []
for cell in header:
    value = slugify(cell, separator="_")
    if value in head_cells:
        value = value + '_2'
    head_cells.append(value)

# Every column is created as text; the types can be adjusted afterwards.
fields = [" {} text".format(cell) for cell in head_cells]
table = origem.split('.')[0]
sql = "create table {} ( \n {} \n);".format(table, ",\n".join(fields))
sql += "\n COPY {} FROM '{}' DELIMITER ';' CSV HEADER;".format(table, arquivo)

print(sql)
with open(destino, 'w') as d:
    d.write(sql)

3. Run the script:

python3 importar.py

4. Optional: Edit the SQL script to adjust the column types (all are text by default).

5. Run the SQL script. A one-liner for the console:

sudo -H -u postgres bash -c "psql mydatabase < file.sql"
