How to copy from CSV file to PostgreSQL table with headers in CSV file?
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17662631/
How to copy from CSV file to PostgreSQL table with headers in CSV file?
Asked by Stanley Cup Phil
I want to copy a CSV file to a Postgres table. There are about 100 columns in this table, so I do not want to rewrite them if I don't have to.
I am using the command \copy table from 'table.csv' delimiter ',' csv; but without a table created first I get ERROR: relation "table" does not exist. If I add a blank table I get no error, but nothing happens. I tried this command two or three times and there was no output or messages, but the table was not updated when I checked it through pgAdmin.
Is there a way to import a table with headers included like I am trying to do?
Answered by G. Cito
This worked. The first row had column names in it.
COPY wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER
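Note that HEADER only tells COPY to skip the first line of the file; it does not create the table or derive column names from it, so the target table must already exist with matching columns. If it does not, one option is to generate a CREATE TABLE statement from the CSV header first. A minimal stdlib-Python sketch (typing every column as text is an assumption; adjust types as needed):

```python
import csv
import io

def create_table_ddl(table_name, csv_text, delimiter=','):
    """Build a CREATE TABLE statement from the first (header) row of a CSV.

    All columns are typed as text; COPY ... CSV HEADER can then load the
    rest of the file into the newly created table.
    """
    header = next(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    cols = ', '.join('"%s" text' % name.strip() for name in header)
    return 'CREATE TABLE %s (%s);' % (table_name, cols)

sample = "yield;region;year\n4.2;EU;2020\n"
print(create_table_ddl('wheat', sample, delimiter=';'))
# CREATE TABLE wheat ("yield" text, "region" text, "year" text);
```

Run the generated statement once, then the COPY above loads the data rows.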
Answered by joelostblom
With the Python library pandas, you can easily create column names and infer data types from a csv file.
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('postgresql://user:pass@localhost/db_name')
df = pd.read_csv('/path/to/csv_file')
df.to_sql('pandas_db', engine)
The if_exists parameter can be set to replace or append to an existing table, e.g. df.to_sql('pandas_db', engine, if_exists='replace'). This works for additional input file types as well (see the pandas docs).
Answered by Peter Krauss
Alternative: from the terminal, without server file permissions
The PostgreSQL documentation, in the Notes section for COPY, says:
The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.
So, generally, using psql or any other client, even against a local server, you will have problems... And if you are sharing a COPY command with other users, e.g. in a GitHub README, your readers will have the same problem...
The only way to express a relative path with client permissions is to use STDIN:
When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.
as noted here:
psql -h remotehost -d remote_mydb -U myuser -c \
"copy mytable (column1, column2) from STDIN with delimiter as ','" \
< ./relative_path/file.csv
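If you want to script the same invocation from Python, the STDIN trick translates to piping the file into psql via subprocess. A small sketch (host, database, user, table, and column names below are placeholders taken from the example above; the command is only assembled and printed here, since running it needs a live server):

```python
import subprocess

def build_psql_copy_cmd(host, db, user, table, columns, delimiter=','):
    """Assemble a psql invocation that COPYs from STDIN, so the CSV path
    stays relative to the client's working directory, not the server's
    data directory."""
    copy_sql = "copy %s (%s) from STDIN with delimiter as '%s'" % (
        table, ', '.join(columns), delimiter)
    return ['psql', '-h', host, '-d', db, '-U', user, '-c', copy_sql]

cmd = build_psql_copy_cmd('remotehost', 'remote_mydb', 'myuser',
                          'mytable', ['column1', 'column2'])
print(' '.join(cmd))

# To actually run it, feed the CSV on stdin (commented out here,
# since it requires a reachable server):
# with open('./relative_path/file.csv') as f:
#     subprocess.run(cmd, stdin=f, check=True)
```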
Answered by mehmet
I have been using this function for a while with no problems. You just need to provide the number of columns in the csv file, and it will take the header names from the first row and create the table for you:
create or replace function data.load_csv_file
(
    target_table text,   -- name of the table that will be created
    csv_file_path text,
    col_count integer
)
returns void
as $$
declare
    iter integer;       -- dummy integer to iterate columns with
    col text;           -- to keep column names in each iteration
    col_first text;     -- first column name, e.g., top left corner on a csv file or spreadsheet
begin
    set schema 'data';

    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_file_path);

    iter := 1;
    col_first := (select col_1
                  from temp_table
                  limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row // using quote_ident or %I does not work here!?
    execute format('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;
end;
$$ language plpgsql;
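The function above is called with the target table name, the CSV path, and the column count, e.g. select data.load_csv_file('wheat', '/tmp/wheat_crop_data.csv', 3);. The same header-driven idea, create a table from the first CSV row and load the remaining rows, can be sketched in plain Python against SQLite purely for illustration (this is not the Postgres function itself; quoting and typing are kept minimal):

```python
import csv
import io
import sqlite3

def load_csv_with_header(conn, target_table, csv_text):
    """Create a table named target_table with columns taken from the CSV's
    first row, then insert the remaining rows (all values kept as text)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ', '.join('"%s" text' % c for c in header)
    conn.execute('create table "%s" (%s)' % (target_table, cols))
    placeholders = ', '.join('?' for _ in header)
    conn.executemany(
        'insert into "%s" values (%s)' % (target_table, placeholders), data)

conn = sqlite3.connect(':memory:')
load_csv_with_header(conn, 'wheat', "a,b\n1,2\n3,4\n")
print(conn.execute('select * from wheat').fetchall())
# [('1', '2'), ('3', '4')]
```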
Answered by citynorman
You can use d6tstack, which creates the table for you and is faster than pd.to_sql() because it uses native DB import commands. It supports Postgres as well as MySQL and MS SQL.
import pandas as pd
import d6tstack.utils
df = pd.read_csv('table.csv')
uri_psql = 'postgresql+psycopg2://usr:pwd@localhost/db'
d6tstack.utils.pd_to_psql(df, uri_psql, 'table')
It is also useful for importing multiple CSVs, resolving data schema changes, and/or preprocessing with pandas (e.g. for dates) before writing to the db; see further down in the examples notebook.
import glob
import d6tstack.combine_csv
# apply_fun is a user-defined pandas preprocessing function
d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'),
    apply_after_read=apply_fun).to_psql_combine(uri_psql, 'table')