如何在所有表 (PostgreSQL) 中搜索特定值?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/5350088/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to search a specific value in all tables (PostgreSQL)?
提问by Sandro Munda
回答by Mike Sherrill 'Cat Recall'
How about dumping the contents of the database, then using grep
?
如何转储数据库的内容,然后使用grep
?
$ pg_dump --data-only --inserts -U postgres your-db-name > a.tmp
$ grep United a.tmp
INSERT INTO countries VALUES ('US', 'United States');
INSERT INTO countries VALUES ('GB', 'United Kingdom');
The same utility, pg_dump, can include column names in the output. Just change --inserts
to --column-inserts
. That way you can search for specific column names, too. But if I were looking for column names, I'd probably dump the schema instead of the data.
相同的实用程序 pg_dump 可以在输出中包含列名。只需更改--inserts
为--column-inserts
. 这样您也可以搜索特定的列名称。但是如果我要查找列名,我可能会转储模式而不是数据。
$ pg_dump --data-only --column-inserts -U postgres your-db-name > a.tmp
$ grep country_code a.tmp
INSERT INTO countries (iso_country_code, iso_country_name) VALUES ('US', 'United States');
INSERT INTO countries (iso_country_code, iso_country_name) VALUES ('GB', 'United Kingdom');
回答by Daniel Vérité
Here's a pl/pgsql functionthat locates records where any column contains a specific value. It takes as arguments the value to search in text format, an array of table names to search into (defaults to all tables) and an array of schema names (defaults all schema names).
这是一个pl/pgsql 函数,用于定位任何列包含特定值的记录。它将以文本格式搜索的值、要搜索的表名称数组(默认为所有表)和模式名称数组(默认为所有模式名称)作为参数。
It returns a table structure with schema, name of table, name of column and pseudo-column ctid
(non-durable physical location of the row in the table, see System Columns)
它返回一个表结构,其中包含模式、表名、列名和伪列ctid
(表中行的非持久物理位置,请参阅系统列)
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_tables name[] default '{}',
haystack_schema name[] default '{}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
JOIN information_schema.table_privileges p ON
(t.table_name=p.table_name AND t.table_schema=p.table_schema
AND p.privilege_type='SELECT')
JOIN information_schema.schemata s ON
(s.schema_name=t.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND (c.table_schema=ANY(haystack_schema) OR haystack_schema='{}')
AND t.table_type='BASE TABLE'
LOOP
FOR rowctid IN
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
schemaname,
tablename,
columnname,
needle
)
LOOP
-- uncomment next line to get some progress report
-- RAISE NOTICE 'hit in %.%', schemaname, tablename;
RETURN NEXT;
END LOOP;
END LOOP;
END;
$$ language plpgsql;
See also the version on githubbased on the same principle but adding some speed and reporting improvements.
另请参阅基于相同原理的github上的版本,但增加了一些速度和报告改进。
Examples of use in a test database:
在测试数据库中的使用示例:
- Search in all tables within public schema:
- 在公共架构中的所有表中搜索:
select * from search_columns('foobar'); schemaname | tablename | columnname | rowctid ------------+-----------+------------+--------- public | s3 | usename | (0,11) public | s2 | relname | (7,29) public | w | body | (0,2) (3 rows)
- Search in a specific table:
- 在特定表中搜索:
select * from search_columns('foobar','{w}'); schemaname | tablename | columnname | rowctid ------------+-----------+------------+--------- public | w | body | (0,2) (1 row)
- Search in a subset of tables obtained from a select:
- 在从选择获得的表的子集中搜索:
select * from search_columns('foobar', array(select table_name::name from information_schema.tables where table_name like 's%'), array['public']); schemaname | tablename | columnname | rowctid ------------+-----------+------------+--------- public | s2 | relname | (7,29) public | s3 | usename | (0,11) (2 rows)
- Get a result row with the corresponding base table and and ctid:
- 获取带有相应基表和 ctid 的结果行:
select * from public.w where ctid='(0,2)'; title | body | tsv -------+--------+--------------------- toto | foobar | 'foobar':2 'toto':1
Variants
变体
To test against a regular expression instead of strict equality, like grep, this part of the query:
SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L
may be changed to:
SELECT ctid FROM %I.%I WHERE cast(%I as text) ~ %L
For case insensitive comparisons, you could write:
SELECT ctid FROM %I.%I WHERE lower(cast(%I as text)) = lower(%L)
要测试正则表达式而不是严格相等,如 grep,查询的这一部分:
SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L
可以改为:
SELECT ctid FROM %I.%I WHERE cast(%I as text) ~ %L
对于不区分大小写的比较,您可以编写:
SELECT ctid FROM %I.%I WHERE lower(cast(%I as text)) = lower(%L)
回答by Erwin Brandstetter
to search every column of every table for a particular value
在每个表的每一列中搜索特定值
This does not define how to match exactly.
Nor does it define what to return exactly.
这并没有定义如何精确匹配。
它也没有定义准确返回什么。
Assuming:
假设:
- Find any row with any column containingthe given value in its text representation - as opposed to equalingthe given value.
- Return the table name (
regclass
) and the tuple ID (ctid
), because that's simplest.
- 查找任何行的任何列在其文本表示中包含给定值 - 而不是等于给定值。
- 返回表名 (
regclass
) 和元组 ID (ctid
),因为这是最简单的。
Here is a dead simple, fast and slightly dirty way:
这是一种非常简单、快速且略显脏乱的方法:
CREATE OR REPLACE FUNCTION search_whole_db(_like_pattern text)
RETURNS TABLE(_tbl regclass, _ctid tid) AS
$func$
BEGIN
FOR _tbl IN
SELECT c.oid::regclass
FROM pg_class c
JOIN pg_namespace n ON n.oid = relnamespace
WHERE c.relkind = 'r' -- only tables
AND n.nspname !~ '^(pg_|information_schema)' -- exclude system schemas
ORDER BY n.nspname, c.relname
LOOP
RETURN QUERY EXECUTE format(
'SELECT , ctid FROM %s t WHERE t::text ~~ %L'
, _tbl, '%' || _like_pattern || '%')
USING _tbl;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
称呼:
SELECT * FROM search_whole_db('mypattern');
Provide the search pattern without enclosing %
.
提供不包含%
.
Why slightly dirty?
为什么有点脏?
If separators and decorators for the row in text
representation can be part of the search pattern, there can be false positives:
如果text
表示中行的分隔符和装饰器可以是搜索模式的一部分,则可能会出现误报:
- column separator:
,
by default - whole row is enclosed in parentheses:
()
- some values are enclosed in double quotes
"
\
may be added as escape char
- 列分隔符:
,
默认 - 整行括在括号中:
()
- 一些值用双引号括起来
"
\
可以添加为转义字符
And the text representation of some columns may depend on local settings - but that ambiguity is inherent to the question, not to my solution.
并且某些列的文本表示可能取决于本地设置 - 但这种歧义是问题所固有的,而不是我的解决方案。
Each qualifying row is returned onceonly, even when it matches multiple times (as opposed to other answers here).
每个符合条件的行仅返回一次,即使它匹配多次(与此处的其他答案相反)。
This searches the whole DB except for system catalogs. Will typically take a long time to finish. You might want to restrict to certain schemas / tables (or even columns) like demonstrated in other answers. Or add notices and a progress indicator, also demonstrated in another answer.
这将搜索除系统目录之外的整个数据库。通常需要很长时间才能完成。您可能希望限制某些模式/表(甚至列),如其他答案中所示。或者添加通知和进度指示器,也在另一个答案中演示。
The regclass
object identifier type is represented as table name, schema-qualified where necessary to disambiguate according to the current search_path
:
的regclass
对象标识符类型被表示为表名,模式限定在必要时根据当前的歧义search_path
:
What is the ctid
?
是什么ctid
?
You might want to escape characters with special meaning in the search pattern. See:
您可能希望在搜索模式中转义具有特殊含义的字符。看:
回答by Daniel A. Martinhao
And if someone think it could help. Here is @Daniel Vérité's function, with another param that accept names of columns that can be used in search. This way it decrease the time of processing. At least in my test it reduced a lot.
如果有人认为它可以提供帮助。这是@Daniel Vérité 的函数,另一个参数接受可用于搜索的列名称。这样就减少了处理时间。至少在我的测试中它减少了很多。
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_columns name[] default '{}',
haystack_tables name[] default '{}',
haystack_schema name[] default '{public}'
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
begin
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND (c.column_name=ANY(haystack_columns) OR haystack_columns='{}')
AND t.table_type='BASE TABLE'
LOOP
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
schemaname,
tablename,
columnname,
needle
) INTO rowctid;
IF rowctid is not null THEN
RETURN NEXT;
END IF;
END LOOP;
END;
$$ language plpgsql;
Bellow is an example of usage of the search_function created above.
Bellow 是上面创建的 search_function 的使用示例。
SELECT * FROM search_columns('86192700'
, array(SELECT DISTINCT a.column_name::name FROM information_schema.columns AS a
INNER JOIN information_schema.tables as b ON (b.table_catalog = a.table_catalog AND b.table_schema = a.table_schema AND b.table_name = a.table_name)
WHERE
a.column_name iLIKE '%cep%'
AND b.table_type = 'BASE TABLE'
AND b.table_schema = 'public'
)
, array(SELECT b.table_name::name FROM information_schema.columns AS a
INNER JOIN information_schema.tables as b ON (b.table_catalog = a.table_catalog AND b.table_schema = a.table_schema AND b.table_name = a.table_name)
WHERE
a.column_name iLIKE '%cep%'
AND b.table_type = 'BASE TABLE'
AND b.table_schema = 'public')
);
回答by profimedica
Without storing a new procedure you can use a code block and execute to obtain a table of occurences. You can filter results by schema, table or column name.
在不存储新过程的情况下,您可以使用代码块并执行以获取出现表。您可以按架构、表或列名称过滤结果。
DO $$
DECLARE
value int := 0;
sql text := 'The constructed select statement';
rec1 record;
rec2 record;
BEGIN
DROP TABLE IF EXISTS _x;
CREATE TEMPORARY TABLE _x (
schema_name text,
table_name text,
column_name text,
found text
);
FOR rec1 IN
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE table_name <> '_x'
AND UPPER(column_name) LIKE UPPER('%%')
AND table_schema <> 'pg_catalog'
AND table_schema <> 'information_schema'
AND data_type IN ('character varying', 'text', 'character', 'char', 'varchar')
LOOP
sql := concat('SELECT ', rec1."column_name", ' AS "found" FROM ',rec1."table_schema" , '.',rec1."table_name" , ' WHERE UPPER(',rec1."column_name" , ') LIKE UPPER(''','%my_substring_to_find_goes_here%' , ''')');
RAISE NOTICE '%', sql;
BEGIN
FOR rec2 IN EXECUTE sql LOOP
RAISE NOTICE '%', sql;
INSERT INTO _x VALUES (rec1."table_schema", rec1."table_name", rec1."column_name", rec2."found");
END LOOP;
EXCEPTION WHEN OTHERS THEN
END;
END LOOP;
END; $$;
SELECT * FROM _x;
回答by a_horse_with_no_name
There is a way to achieve this without creating a function or using an external tool. By using Postgres' query_to_xml()
function that can dynamically run a query inside another query, it's possible to search a text across many tables. This is based on my answer to retrieve the rowcount for all tables:
有一种方法可以在不创建函数或使用外部工具的情况下实现这一点。通过使用query_to_xml()
可以在另一个查询中动态运行查询的Postgres函数,可以在多个表中搜索文本。这是基于我检索所有表的行数的答案:
To search for the string foo
across all tables in a schema, the following can be used:
要foo
在模式中的所有表中搜索字符串,可以使用以下命令:
with found_rows as (
select format('%I.%I', table_schema, table_name) as table_name,
query_to_xml(format('select to_jsonb(t) as table_row
from %I.%I as t
where t::text like ''%%foo%%'' ', table_schema, table_name),
true, false, '') as table_rows
from information_schema.tables
where table_schema = 'public'
)
select table_name, x.table_row
from found_rows f
left join xmltable('//table/row'
passing table_rows
columns
table_row text path 'table_row') as x on true
Note that the use of xmltable
requires Postgres 10 or newer. For older Postgres version, this can be also done using xpath().
请注意,使用xmltable
需要 Postgres 10 或更新版本。对于较旧的 Postgres 版本,这也可以使用 xpath() 来完成。
with found_rows as (
select format('%I.%I', table_schema, table_name) as table_name,
query_to_xml(format('select to_jsonb(t) as table_row
from %I.%I as t
where t::text like ''%%foo%%'' ', table_schema, table_name),
true, false, '') as table_rows
from information_schema.tables
where table_schema = 'public'
)
select table_name, x.table_row
from found_rows f
cross join unnest(xpath('/table/row/table_row/text()', table_rows)) as r(data)
The common table expression (WITH ...
) is only used for convenience. It loops through all tables in the public
schema. For each table the following query is run through the query_to_xml()
function:
公用表表达式 ( WITH ...
) 只是为了方便而使用。它遍历public
模式中的所有表。对于每个表,通过query_to_xml()
函数运行以下查询:
select to_jsonb(t)
from some_table t
where t::text like '%foo%';
The where clause is used to make sure the expensive generation of XML content is only done for rows that contain the search string. This might return something like this:
where 子句用于确保仅对包含搜索字符串的行执行昂贵的 XML 内容生成。这可能会返回如下内容:
<table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<table_row>{"id": 42, "some_column": "foobar"}</table_row>
</row>
</table>
The conversion of the complete row to jsonb
is done, so that in the result one could see which value belongs to which column.
完成了整行到的转换jsonb
,以便在结果中可以看到哪个值属于哪个列。
The above might return something like this:
以上可能会返回如下内容:
table_name | table_row
-------------+----------------------------------------
public.foo | {"id": 1, "some_column": "foobar"}
public.bar | {"id": 42, "another_column": "barfoo"}
Online example for Postgres 10+
回答by Ganesh
-- Below function will list all the tables which contain a specific string in the database
-- 下面的函数将列出数据库中包含特定字符串的所有表
select TablesCount(‘StringToSearch');
--Iterates through all the tables in the database
--遍历数据库中的所有表
CREATE OR REPLACE FUNCTION **TablesCount**(_searchText TEXT)
RETURNS text AS
$$ -- here start procedural part
DECLARE _tname text;
DECLARE cnt int;
BEGIN
FOR _tname IN SELECT table_name FROM information_schema.tables where table_schema='public' and table_type='BASE TABLE' LOOP
cnt= getMatchingCount(_tname,Columnames(_tname,_searchText));
RAISE NOTICE 'Count% ', CONCAT(' ',cnt,' Table name: ', _tname);
END LOOP;
RETURN _tname;
END;
$$ -- here finish procedural part
LANGUAGE plpgsql; -- language specification
-- Returns the count of tables for which the condition is met. -- For example, if the intended text exists in any of the fields of the table, -- then the count will be greater than 0. We can find the notifications -- in the Messages section of the result viewer in postgres database.
-- 返回满足条件的表的数量。-- 例如,如果想要的文本存在于表的任何字段中, -- 那么计数将大于 0。我们可以在 postgres 数据库结果查看器的 Messages 部分找到通知。
CREATE OR REPLACE FUNCTION **getMatchingCount**(_tname TEXT, _clause TEXT)
RETURNS int AS
$$
Declare outpt text;
BEGIN
EXECUTE 'Select Count(*) from '||_tname||' where '|| _clause
INTO outpt;
RETURN outpt;
END;
$$ LANGUAGE plpgsql;
--Get the fields of each table. Builds the where clause with all columns of a table.
--获取每个表的字段。使用表的所有列构建 where 子句。
CREATE OR REPLACE FUNCTION **Columnames**(_tname text,st text)
RETURNS text AS
$$ -- here start procedural part
DECLARE
_name text;
_helper text;
BEGIN
FOR _name IN SELECT column_name FROM information_schema.Columns WHERE table_name =_tname LOOP
_name=CONCAT('CAST(',_name,' as VarChar)',' like ','''%',st,'%''', ' OR ');
_helper= CONCAT(_helper,_name,' ');
END LOOP;
RETURN CONCAT(_helper, ' 1=2');
END;
$$ -- here finish procedural part
LANGUAGE plpgsql; -- language specification
回答by alexkovelsky
Here's @Daniel Vérité's function with progress reporting functionality. It reports progress in three ways:
这是带有进度报告功能的@Daniel Vérité 函数。它以三种方式报告进度:
- by RAISE NOTICE;
- by decreasing value of supplied {progress_seq} sequence from {total number of colums to search in} down to 0;
- by writing the progress along with found tables into text file, located in c:\windows\temp\{progress_seq}.txt.
- 通过 RAISE 通知;
- 通过将提供的 {progress_seq} 序列的值从 {total number of colums to search} 减少到 0;
- 通过将进度与找到的表一起写入位于 c:\windows\temp\{progress_seq}.txt 的文本文件中。
_
_
CREATE OR REPLACE FUNCTION search_columns(
needle text,
haystack_tables name[] default '{}',
haystack_schema name[] default '{public}',
progress_seq text default NULL
)
RETURNS table(schemaname text, tablename text, columnname text, rowctid text)
AS $$
DECLARE
currenttable text;
columnscount integer;
foundintables text[];
foundincolumns text[];
begin
currenttable='';
columnscount = (SELECT count(1)
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND t.table_type='BASE TABLE')::integer;
PERFORM setval(progress_seq::regclass, columnscount);
FOR schemaname,tablename,columnname IN
SELECT c.table_schema,c.table_name,c.column_name
FROM information_schema.columns c
JOIN information_schema.tables t ON
(t.table_name=c.table_name AND t.table_schema=c.table_schema)
WHERE (c.table_name=ANY(haystack_tables) OR haystack_tables='{}')
AND c.table_schema=ANY(haystack_schema)
AND t.table_type='BASE TABLE'
LOOP
EXECUTE format('SELECT ctid FROM %I.%I WHERE cast(%I as text)=%L',
schemaname,
tablename,
columnname,
needle
) INTO rowctid;
IF rowctid is not null THEN
RETURN NEXT;
foundintables = foundintables || tablename;
foundincolumns = foundincolumns || columnname;
RAISE NOTICE 'FOUND! %, %, %, %', schemaname,tablename,columnname, rowctid;
END IF;
IF (progress_seq IS NOT NULL) THEN
PERFORM nextval(progress_seq::regclass);
END IF;
IF(currenttable<>tablename) THEN
currenttable=tablename;
IF (progress_seq IS NOT NULL) THEN
RAISE NOTICE 'Columns left to look in: %; looking in table: %', currval(progress_seq::regclass), tablename;
EXECUTE 'COPY (SELECT unnest(string_to_array(''Current table (column ' || columnscount-currval(progress_seq::regclass) || ' of ' || columnscount || '): ' || tablename || '\n\nFound in tables/columns:\n' || COALESCE(
(SELECT string_agg(c1 || '/' || c2, '\n') FROM (SELECT unnest(foundintables) AS c1,unnest(foundincolumns) AS c2) AS t1)
, '') || ''',''\n''))) TO ''c:\WINDOWS\temp\' || progress_seq || '.txt''';
END IF;
END IF;
END LOOP;
END;
$$ language plpgsql;