PostgreSQL where all in array
Original source: http://stackoverflow.com/questions/11468572/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by pex
What is the easiest and fastest way to write a clause where all elements in an array must be matched, not just one of them as with IN? In the end it should behave like MongoDB's $all.
Thinking about group conversations, where conversations_users is a join table between conversation_id and user_id, I have something like this in mind:
WHERE (conversations_users.user_id ALL IN (1,2))
UPDATE 16.07.12
Adding more info about schema and case:
The join-table is rather simple:
         Table "public.conversations_users"
     Column      |  Type   | Modifiers | Storage | Description
-----------------+---------+-----------+---------+-------------
 conversation_id | integer |           | plain   |
 user_id         | integer |           | plain   |
A conversation has many users and a user belongs to many conversations. In order to find all users in a conversation I am using this join table.
In the end I am trying to figure out a Ruby on Rails scope that finds me a conversation depending on its participants, e.g.:
scope :between, ->(*users) { joins(:users).where('conversations_users.user_id all in (?)', users.map(&:id)) }
UPDATE 23.07.12
My question is about finding an exact match of people. Therefore:
A conversation between (1,2,3) won't match if querying for (1,2).
Answer by Alex Blakemore
Assuming the join table follows good practice and has a unique compound key defined, i.e. a constraint to prevent duplicate rows, then something like the following simple query should do.
select conversation_id from conversations_users where user_id in (1, 2)
group by conversation_id having count(*) = 2
It's important to note that the number 2 at the end is the length of the list of user_ids. That obviously needs to change if the user_id list changes length. If you can't assume your join table doesn't contain duplicates, change "count(*)" to "count(distinct user_id)" at some possible cost in performance.
This query finds all conversations that include all the specified users, even if the conversation also includes additional users.
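Since the count in the HAVING clause has to track the length of the id list, one option is to generate both from the same array. The following is only a minimal Ruby sketch of that idea: the helper name conversations_including_sql is hypothetical, and it assumes the join table has no duplicate rows, as stated above.

```ruby
# Hypothetical helper: builds the "at least these users" query for any id list.
def conversations_including_sql(user_ids)
  # Integer() rejects non-numeric input, so only integers are interpolated.
  # uniq keeps the expected count equal to the number of distinct ids.
  ids = user_ids.map { |id| Integer(id) }.uniq
  raise ArgumentError, "user_ids must not be empty" if ids.empty?

  "select conversation_id from conversations_users " \
    "where user_id in (#{ids.join(', ')}) " \
    "group by conversation_id having count(*) = #{ids.length}"
end

puts conversations_including_sql([1, 2])
```

In a Rails application you would more likely pass the ids and the count as bind parameters; plain string building is used here only to keep the sketch self-contained.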
If you want only conversations with exactly the specified set of users, one approach is to use a nested subquery in the where clause as below. Note that the first and last lines are the same as in the original query; only the middle two lines are new.
select conversation_id from conversations_users where user_id in (1, 2)
and conversation_id not in
(select conversation_id from conversations_users where user_id not in (1,2))
group by conversation_id having count(*) = 2
Equivalently, you can use a set difference operator if your database supports it. Here is an example in Oracle syntax. (For Postgres or DB2, change the keyword "minus" to "except".)
select conversation_id from conversations_users where user_id in (1, 2)
group by conversation_id having count(*) = 2
minus
select conversation_id from conversations_users where user_id not in (1,2)
A good query optimizer should treat the last two variations identically, but check with your particular database to be sure. For example, the Oracle 11GR2 query plan sorts the two sets of conversation ids before applying the minus operator, but skips the sort step for the last query. So either query plan could be faster depending on multiple factors such as the number of rows, cores, cache, indices etc.
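To make the difference between the "at least these users" query and the "exactly these users" variants concrete, here is a small in-memory Ruby model of the two semantics. It is only an illustration: the sample rows and the method names are hypothetical, and a real application would run one of the SQL queries above instead.

```ruby
require "set"

# Hypothetical sample rows of conversations_users: [conversation_id, user_id].
ROWS = [[1, 1], [1, 2], [2, 1], [2, 3], [3, 1], [3, 2], [3, 3]].freeze

# Collects the set of participants for each conversation.
def participants_by_conversation(rows)
  rows.each_with_object(Hash.new { |hash, key| hash[key] = Set.new }) do |(conversation_id, user_id), acc|
    acc[conversation_id] << user_id
  end
end

# Conversations that contain all requested users, possibly with others.
def conversations_including(rows, user_ids)
  wanted = Set.new(user_ids)
  participants_by_conversation(rows).select { |_, users| wanted.subset?(users) }.keys.sort
end

# Conversations whose participants are exactly the requested users.
def conversations_exactly(rows, user_ids)
  wanted = Set.new(user_ids)
  participants_by_conversation(rows).select { |_, users| users == wanted }.keys.sort
end

p conversations_including(ROWS, [1, 2]) # => [1, 3]
p conversations_exactly(ROWS, [1, 2])   # => [1]
```

Conversation 3 has participants (1,2,3), so it matches the first query but not the exact one, which is the behaviour asked for in the question's second update.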
Answer by hgmnz
I'm collapsing those users into an array. I'm also using a CTE (the thing in the WITH clause) to make this more readable.
=> select * from conversations_users ;
 conversation_id | user_id
-----------------+---------
               1 |       1
               1 |       2
               2 |       1
               2 |       3
               3 |       1
               3 |       2
(6 rows)
=> WITH users_on_conversation AS (
     SELECT conversation_id, array_agg(user_id) as users
     FROM conversations_users
     WHERE user_id in (1, 2) --filter here for performance
     GROUP BY conversation_id
   )
   SELECT * FROM users_on_conversation
   WHERE users @> array[1, 2];
 conversation_id | users
-----------------+-------
               1 | {1,2}
               3 | {1,2}
(2 rows)
Answer by Erwin Brandstetter
While @Alex' answer with IN and count() is probably the simplest solution, I expect this PL/pgSQL function to be faster:
CREATE OR REPLACE FUNCTION f_conversations_among_users(_user_arr int[])
  RETURNS SETOF conversations AS
$BODY$
DECLARE
    _sql text := '
    SELECT c.*
    FROM   conversations c';
    i int;
BEGIN
    -- one self-join of conversations_users per requested user
    FOREACH i IN ARRAY _user_arr LOOP
        _sql := _sql || '
    JOIN   conversations_users x' || i || ' USING (conversation_id)';
    END LOOP;
    _sql := _sql || '
    WHERE  TRUE';
    -- pin each joined alias to one user id
    FOREACH i IN ARRAY _user_arr LOOP
        _sql := _sql || '
    AND    x' || i || '.user_id = ' || i;
    END LOOP;
    /* uncomment for conversations with exact list of users and no more
       (the array is inlined as a literal because the PL/pgSQL variable
       is not visible inside the dynamically executed statement)
    _sql := _sql || '
    AND    NOT EXISTS (
        SELECT 1
        FROM   conversations_users u
        WHERE  u.conversation_id = c.conversation_id
        AND    u.user_id <> ALL (' || quote_literal(_user_arr::text) || '::int[])
        )';
    */
    -- RAISE NOTICE '%', _sql;
    RETURN QUERY EXECUTE _sql;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;
Call:
SELECT * FROM f_conversations_among_users('{1,2}')
The function dynamically builds and executes a query of the form:
SELECT c.*
FROM conversations c
JOIN conversations_users x1 USING (conversation_id)
JOIN conversations_users x2 USING (conversation_id)
...
WHERE TRUE
AND x1.user_id = 1
AND x2.user_id = 2
...
This form performed best in an extensive test of queries for relational division.
You could also build the query in your app, but I went by the assumption that you want to use one array parameter. Also, this is probably fastest anyway.
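If you do prefer to build the query in the application, the same shape can be generated from an array of ids. This is only a rough Ruby sketch under stated assumptions: the helper name conversations_among_users_sql is hypothetical, the aliases are numbered by position rather than by user id, and the conditions are joined with AND directly instead of starting from WHERE TRUE.

```ruby
# Hypothetical helper: one self-join of conversations_users per requested user.
def conversations_among_users_sql(user_ids)
  # Integer() rejects non-numeric input, so only integers are interpolated.
  ids = user_ids.map { |id| Integer(id) }.uniq
  raise ArgumentError, "user_ids must not be empty" if ids.empty?

  # Aliases are numbered by position (x1, x2, ...) rather than by user id.
  joins = ids.each_index.map { |n| "JOIN conversations_users x#{n + 1} USING (conversation_id)" }
  conditions = ids.each_with_index.map { |id, n| "x#{n + 1}.user_id = #{id}" }

  ["SELECT c.*", "FROM conversations c", *joins, "WHERE #{conditions.join(' AND ')}"].join("\n")
end

puts conversations_among_users_sql([1, 2])
```

Like the function, this returns conversations in which at least the given users take part.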
Either query requires an index like the following to be fast:
CREATE INDEX conversations_users_user_id_idx ON conversations_users (user_id);
A multi-column primary (or unique) key on (user_id, conversation_id) is just as good, but one on (conversation_id, user_id) (like you may very well have!) would be inferior. You find a short rationale at the link above, or a comprehensive assessment under this related question on dba.SE.
I also assume you have a primary key on conversations.conversation_id.
Can you run a performance test with EXPLAIN ANALYZE on @Alex' query and this function and report your findings?
Note that both solutions find conversations where at least the users in the array take part, including conversations with additional users.
If you want to exclude those, un-comment the additional clause in my function (or add it to any other query).
Tell me if you need more explanation on the features of the function.
Answer by Don Pflaster
This preserves ActiveRecord objects.
In the below example, I want to know the time sheets which are associated with all codes in the array.
codes = [8,9]
Timesheet.joins(:codes).select('count(*) as count, timesheets.*').
where('codes.id': codes).
group('timesheets.id').
having('count(*) = ?', codes.length)
You should have the full ActiveRecord objects to work with. If you want it to be a true scope, you can just use your above example and pass in the results with .pluck(:id).
Answer by Bruno Peres
Based on @Alex Blakemore's answer, the equivalent Rails 4 scope on your Conversation class would be:
# Conversations exactly with users array
scope :by_users, -> (users) {
self.by_any_of_users(users)
.group("conversations.id")
.having("COUNT(*) = ?", users.length) -
joins(:conversations_users)
.where("conversations_users.user_id NOT IN (?)", users)
}
# generates an IN clause
scope :by_any_of_users, -> (users) { joins(:conversations_users).where(conversations_users: { user_id: users }).distinct }
Note that you can optimize it: instead of doing a Rails - (minus), you could use a .where("NOT IN"), but that would be really complex to read.
Answer by maniek
select id from conversations where not exists(
select * from conversations_users cu
where cu.conversation_id=conversations.id
and cu.user_id not in(1,2,3)
)
This can easily be made into a Rails scope.
Answer by Gordon Linoff
I am guessing that you don't really want to start messing with temporary tables.
Your question was unclear as to whether you want conversations with exactly the set of users, or conversations with a superset. The following is for the superset:
with users as (select user_id from users where user_id in (<list>)
),
conv as (select conversation_id, user_id
from conversations_users
where user_id in (<list>)
)
select distinct conversation_id
from users u left outer join
conv c
on u.user_id = c.user_id
where c.conversation_id is not null
For this query to work well, it assumes that you have indexes on user_id in both users and conversations_users.
For the exact set . . .
with users as (select user_id from users where user_id in (<list>)
),
conv as (select conversation_id, user_id
from conversations_users
where user_id in (<list>)
)
select distinct conversation_id
from users u full outer join
conv c
on u.user_id = c.user_id
where c.conversation_id is not null and u.user_id is not null
Answer by Madhivanan
Create a mapping table with all possible values and use this:
select
t1.col from conversations_users as t1
inner join mapping_table as map on t1.user_id=map.user_id
group by
t1.col
having
count(distinct t1.user_id)=
(select count(distinct user_id) from mapping_table)
Answer by Victor
Based on Alex Blakemore's answer:
select conversation_id
from conversations_users cu
where user_id in (1, 2)
group by conversation_id
having count(distinct user_id) = 2
I have found an alternative query with the same goal: finding the conversation_id of a conversation that contains user_1 and user_2 (ignoring additional users).
select *
from conversations_users cu1
where 2 = (
select count(distinct user_id)
from conversations_users cu2
where user_id in (1, 2) and cu1.conversation_id = cu2.conversation_id
)
It is slower according to the analysis that Postgres performs via the EXPLAIN statement, and I guess that is true because there are more conditions being evaluated: at the very least, the subquery gets executed for each row of conversations_users, as it is a correlated subquery. The positive point of this query is that you aren't grouping, so you can select additional fields of the conversations_users table. In some situations (like mine) that could be handy.