SQL 将表内部连接到自身
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/13688497/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Inner Join a Table to Itself
提问by Dave Long
I have a table that uses two identifying columns, let's call them id and userid. ID is unique in every record, and userid is unique to the user but is in many records.
我有一个使用两个标识列的表,我们称它们为 id 和 userid。ID 在每条记录中都是唯一的,而 userid 对用户来说是唯一的但在许多记录中都是唯一的。
What I need to do is get a record for the User by userid and then join that record to the first record we have for the user. The logic of the query is as follows:
我需要做的是通过 userid 获取用户的记录,然后将该记录加入我们为用户拥有的第一条记录。查询逻辑如下:
SELECT v1.id, MIN(v2.id) AS entryid, v1.userid
FROM views v1
INNER JOIN views v2
ON v1.userid = v2.userid
I'm hoping that I don't have to join the table to a subquery that handles the min() piece of the code as that seems to be quite slow.
我希望我不必将表加入处理 min() 代码段的子查询,因为这似乎很慢。
回答by ypercube??
I guess (it's not entirely clear) you want to find for every user, the rows of the table that have minimum id
, so one row per user.
我猜(不完全清楚)您想为每个用户查找表中具有 minimum 的id
行,因此每个用户只有一行。
In that case, you an use a subquery (a derived table) and join it to the table:
在这种情况下,您可以使用子查询(派生表)并将其连接到表中:
SELECT v.*
FROM views AS v
JOIN
( SELECT userid, MIN(id) AS entryid
FROM views
GROUP BY userid
) AS vm
ON vm.userid = v.userid
AND vm.entryid = v.id ;
The above can also be written using a Common Table Expression (CTE), if you like them:
如果您喜欢,也可以使用通用表表达式 (CTE)编写上述内容:
; WITH vm AS
( SELECT userid, MIN(id) AS entryid
FROM views
GROUP BY userid
)
SELECT v.*
FROM views AS v
JOIN vm
ON vm.userid = v.userid
AND vm.entryid = v.id ;
Both would be quite efficient with an index on (userid, id)
.
使用 上的索引两者都会非常有效(userid, id)
。
With SQL-Server, you could write this using the ROW_NUMBER()
window function:
使用 SQL-Server,您可以使用ROW_NUMBER()
窗口函数编写此代码:
; WITH viewsRN AS
( SELECT *
, ROW_NUMBER() OVER (PARTITION BY userid ORDER BY id) AS rn
FROM views
)
SELECT * --- skipping the "rn" column
FROM viewsRN
WHERE rn = 1 ;
回答by KeithS
Well, to use the MIN
function along with non-aggregate columns, you'd have to group the statement. That's possible with the query you have... (EDIT based on additional info)
好吧,要将MIN
函数与非聚合列一起使用,您必须对语句进行分组。这可能与您的查询有关...(根据附加信息进行编辑)
SELECT MIN(v2.id) AS entryid, v1.id, v1.userid
FROM views v1
INNER JOIN views v2
ON v1.userid = v2.userid
GROUP BY v1.id, v1.userid
... however if this is just a simple example and you're looking to pull more data with this query, it quickly becomes an unfeasible solution.
...但是,如果这只是一个简单的示例,并且您希望通过此查询获取更多数据,那么它很快就会成为一个不可行的解决方案。
What you seem to want is a list of all the user data in this view, with a link on each row leading back to the "first" record that exists for the same user. The above query will get you what you want, but there are much easier ways to determine the first record for each user:
您似乎想要的是此视图中所有用户数据的列表,每行都有一个链接,指向同一用户存在的“第一条”记录。上面的查询会得到你想要的东西,但是有更简单的方法来确定每个用户的第一条记录:
SELECT v1.id, v1.userid
FROM views v1
ORDER BY v1.userid, v1.id
The first record for each unique user is your "entry point". I think I understand why you want to do it the way you specified, and the first query I gave will be reasonably performant, but you'll have to consider whether not having to use the order by clause to get the correct answer is worth it.
每个唯一用户的第一条记录是您的“入口点”。我想我明白你为什么想按照你指定的方式来做,我给出的第一个查询的性能相当不错,但你必须考虑是否不必使用 order by 子句来获得正确的答案是值得的.
回答by van
edit-1:as pointed out in the comments, this solution also uses a sub-query. However, it does not use aggregate functions, which (depending on the database) might have a huge impact on the performance.
编辑 1:正如评论中所指出的,此解决方案还使用子查询。但是,它不使用聚合函数,聚合函数(取决于数据库)可能对性能产生巨大影响。
Can achieve without sub-query (see below).
Obviously, an index on views.userid
is of tremedous value for the performance.
无需子查询即可实现(见下文)。显然,索引对views.userid
性能具有巨大的价值。
SELECT v1.*
FROM views v1
WHERE v1.id = (
SELECT TOP 1 v2.id
FROM views v2
WHERE v2.userid = v1.userid
ORDER BY v2.id ASC
)