SQL 将表内部连接到自身

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/13688497/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 12:25:31  来源:igfitidea点击:

Inner Join a Table to Itself

sqlsql-serversql-server-2008

提问by Dave Long

I have a table that uses two identifying columns, let's call them id and userid. ID is unique in every record, and userid is unique to the user but is in many records.

我有一个使用两个标识列的表,我们称它们为 id 和 userid。ID 在每条记录中都是唯一的,而 userid 对用户来说是唯一的但在许多记录中都是唯一的。

What I need to do is get a record for the User by userid and then join that record to the first record we have for the user. The logic of the query is as follows:

我需要做的是通过 userid 获取用户的记录,然后将该记录加入我们为用户拥有的第一条记录。查询逻辑如下:

SELECT v1.id, MIN(v2.id) AS entryid, v1.userid
FROM views v1
INNER JOIN views v2
  ON v1.userid = v2.userid

I'm hoping that I don't have to join the table to a subquery that handles the min() piece of the code as that seems to be quite slow.

我希望我不必将表加入处理 min() 代码段的子查询,因为这似乎很慢。

回答by ypercube??

I guess (it's not entirely clear) you want to find for every user, the rows of the table that have minimum id, so one row per user.

我猜(不完全清楚)您想为每个用户查找表中具有 minimum 的id行,因此每个用户只有一行。

In that case, you an use a subquery (a derived table) and join it to the table:

在这种情况下,您可以使用子查询(派生表)并将其连接到表中:

SELECT v.*
FROM views AS v
  JOIN
    ( SELECT userid, MIN(id) AS entryid
      FROM views
      GROUP BY userid
    ) AS vm
    ON  vm.userid = v.userid 
    AND vm.entryid = v.id ;

The above can also be written using a Common Table Expression (CTE), if you like them:

如果您喜欢,也可以使用通用表表达式 (CTE)编写上述内容:

; WITH vm AS
    ( SELECT userid, MIN(id) AS entryid
      FROM views
      GROUP BY userid
    )
  SELECT v.*
  FROM views AS v
    JOIN vm
      ON  vm.userid = v.userid 
      AND vm.entryid = v.id ;

Both would be quite efficient with an index on (userid, id).

使用 上的索引两者都会非常有效(userid, id)

With SQL-Server, you could write this using the ROW_NUMBER()window function:

使用 SQL-Server,您可以使用ROW_NUMBER()窗口函数编写此代码:

; WITH viewsRN AS
    ( SELECT *
           , ROW_NUMBER() OVER (PARTITION BY userid ORDER BY id) AS rn
      FROM views
    ) 
  SELECT *                      --- skipping the "rn" column
  FROM viewsRN
  WHERE rn = 1 ;

回答by KeithS

Well, to use the MINfunction along with non-aggregate columns, you'd have to group the statement. That's possible with the query you have... (EDIT based on additional info)

好吧,要将MIN函数与非聚合列一起使用,您必须对语句进行分组。这可能与您的查询有关...(根据附加信息进行编辑)

SELECT MIN(v2.id) AS entryid, v1.id, v1.userid
FROM views v1
INNER JOIN views v2
  ON v1.userid = v2.userid      
GROUP BY v1.id, v1.userid

... however if this is just a simple example and you're looking to pull more data with this query, it quickly becomes an unfeasible solution.

...但是,如果这只是一个简单的示例,并且您希望通过此查询获取更多数据,那么它很快就会成为一个不可行的解决方案。

What you seem to want is a list of all the user data in this view, with a link on each row leading back to the "first" record that exists for the same user. The above query will get you what you want, but there are much easier ways to determine the first record for each user:

您似乎想要的是此视图中所有用户数据的列表,每行都有一个链接,指向同一用户存在的“第一条”记录。上面的查询会得到你想要的东西,但是有更简单的方法来确定每个用户的第一条记录:

SELECT v1.id, v1.userid
FROM views v1
ORDER BY v1.userid, v1.id

The first record for each unique user is your "entry point". I think I understand why you want to do it the way you specified, and the first query I gave will be reasonably performant, but you'll have to consider whether not having to use the order by clause to get the correct answer is worth it.

每个唯一用户的第一条记录是您的“入口点”。我想我明白你为什么想按照你指定的方式来做,我给出的第一个查询的性能相当不错,但你必须考虑是否不必使用 order by 子句来获得正确的答案是值得的.

回答by van

edit-1:as pointed out in the comments, this solution also uses a sub-query. However, it does not use aggregate functions, which (depending on the database) might have a huge impact on the performance.

编辑 1:正如评论中所指出的,此解决方案还使用子查询。但是,它不使用聚合函数,聚合函数(取决于数据库)可能对性能产生巨大影响。



Can achieve without sub-query (see below). Obviously, an index on views.useridis of tremedous value for the performance.

无需子查询即可实现(见下文)。显然,索引对views.userid性能具有巨大的价值。

SELECT      v1.*
FROM        views v1
WHERE       v1.id = (
        SELECT  TOP 1 v2.id
        FROM    views v2
        WHERE   v2.userid = v1.userid
        ORDER BY v2.id ASC
    )