Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2043259/

Time: 2020-09-01 05:02:04  Source: igfitidea

How to Join to first row

Tags: sql, sql-server, tsql, sql-server-2000

Asked by Ian Boyd

I'll use a concrete, but hypothetical, example.

Each Order normally has only one line item:

Orders:

OrderGUID   OrderNumber
=========   ============
{FFB2...}   STL-7442-1      
{3EC6...}   MPT-9931-8A

LineItems:

LineItemGUID   OrderID  Quantity   Description
============   =======  ========   =================================
{098FBE3...}   1        7          prefabulated amulite
{1609B09...}   2        32         spurving bearing

But occasionally there will be an order with two line items:

LineItemGUID   OrderID     Quantity   Description
============   =========   ========   =================================
{A58A1...}     6,784,329   5          pentametric fan
{0E9BC...}     6,784,329   5          differential girdlespring

Normally when showing the orders to the user:

SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
    INNER JOIN LineItems 
    ON Orders.OrderID = LineItems.OrderID

I want to show the single item on the order. But with this occasional order containing two (or more) items, the orders would appear to be duplicated:

OrderNumber   Quantity   Description
===========   ========   ====================
STL-7442-1    7          prefabulated amulite
MPT-9931-8A   32         spurving bearing
KSG-0619-81   5          pentametric fan
KSG-0619-81   5          differential girdlespring
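To reproduce the duplication, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; the question targets SQL Server 2000). Integer IDs replace the GUIDs, and the rows are hypothetical sample data modeled on the tables above.

```python
import sqlite3

# In-memory SQLite stand-in for the question's tables.
# Hypothetical sample data; integer IDs replace the GUIDs.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES (10, 1, 7, 'prefabulated amulite'),
                                 (20, 2, 32, 'spurving bearing'),
                                 (30, 3, 5, 'pentametric fan'),
                                 (31, 3, 5, 'differential girdlespring');
""")

# A plain INNER JOIN yields one row per line item, so KSG-0619-81 appears twice.
rows = con.execute("""
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
        INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    ORDER BY Orders.OrderID, LineItems.LineItemID
""").fetchall()

for row in rows:
    print(row)
```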

What I really want is to have SQL Server just pick one, as it will be good enough:

OrderNumber   Quantity   Description
===========   ========   ====================
STL-7442-1    7          prefabulated amulite
MPT-9931-8A   32         spurving bearing
KSG-0619-81   5          pentametric fan

If I get adventurous, I might show the user an ellipsis to indicate that there's more than one:

OrderNumber   Quantity   Description
===========   ========   ====================
STL-7442-1    7          prefabulated amulite
MPT-9931-8A   32         spurving bearing
KSG-0619-81   5          pentametric fan, ...

So the question is how to either

  • eliminate "duplicate" rows, or
  • only join to one of the rows, to avoid duplication

First attempt

My first naive attempt was to only join to the "TOP 1" line items:

SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
    INNER JOIN (
       SELECT TOP 1 LineItems.Quantity, LineItems.Description
       FROM LineItems
       WHERE LineItems.OrderID = Orders.OrderID) LineItems2
    ON 1=1

But that gives the error:

The column or prefix 'Orders' does not match with a table name or alias name used in the query.

Presumably because the inner select doesn't see the outer table.
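SQLite shows the same limitation, which is consistent with that explanation: a subquery in the FROM clause cannot reference a table from the enclosing query. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in and LIMIT 1 in place of TOP 1; the exact error text differs from SQL Server's.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
""")

try:
    # The derived table refers to Orders.OrderID, which is not in scope inside it.
    con.execute("""
        SELECT Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
        FROM Orders
            INNER JOIN (
                SELECT LineItems.Quantity, LineItems.Description
                FROM LineItems
                WHERE LineItems.OrderID = Orders.OrderID
                LIMIT 1) LineItems2
            ON 1=1
    """)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(error)
```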

Answer by Quassnoi

SELECT   Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM     Orders
JOIN     LineItems
ON       LineItems.LineItemGUID =
         (
         SELECT  TOP 1 LineItemGUID 
         FROM    LineItems
         WHERE   OrderID = Orders.OrderID
         )

In SQL Server 2005 and above, you could just replace INNER JOIN with CROSS APPLY:

SELECT  Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
FROM    Orders
CROSS APPLY
        (
        SELECT  TOP 1 LineItems.Quantity, LineItems.Description
        FROM    LineItems
        WHERE   LineItems.OrderID = Orders.OrderID
        ) LineItems2

Please note that TOP 1 without ORDER BY is not deterministic: this query will get you one line item per order, but which one it will be is not defined.

Multiple invocations of the query can give you different line items for the same order, even if the underlying data did not change.

If you want a deterministic order, you should add an ORDER BY clause to the innermost query.
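A minimal sketch of the correlated-subquery-in-ON approach with that ORDER BY added, assuming SQLite via Python's sqlite3 as a stand-in (SQLite spells TOP 1 as LIMIT 1 and has no CROSS APPLY, so only the first form is shown). Ordering by the hypothetical integer LineItemID is an arbitrary but stable choice: every run returns the lowest-numbered line item of each order.

```python
import sqlite3

# Hypothetical sample data; integer IDs replace the GUIDs.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES (10, 1, 7, 'prefabulated amulite'),
                                 (20, 2, 32, 'spurving bearing'),
                                 (30, 3, 5, 'pentametric fan'),
                                 (31, 3, 5, 'differential girdlespring');
""")

# The correlated subquery lives in the ON clause, where Orders is in scope.
rows = con.execute("""
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
    JOIN LineItems
      ON LineItems.LineItemID = (
             SELECT li.LineItemID
             FROM LineItems AS li
             WHERE li.OrderID = Orders.OrderID
             ORDER BY li.LineItemID  -- deterministic: lowest ID wins
             LIMIT 1)
    ORDER BY Orders.OrderID
""").fetchall()

for row in rows:
    print(row)
```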

Answer by Justin Fisher

I know this question was answered a while ago, but when dealing with large data sets, nested queries can be costly. Here is a different solution where the nested query will only be run once, instead of once for each row returned.

SELECT 
  Orders.OrderNumber,
  LineItems.Quantity, 
  LineItems.Description
FROM 
  Orders
  INNER JOIN (
    SELECT
      Orders.OrderNumber,
      MAX(LineItems.LineItemID) AS LineItemID
    FROM
      Orders INNER JOIN LineItems
      ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber
  ) AS Items ON Orders.OrderNumber = Items.OrderNumber
  INNER JOIN LineItems 
  ON Items.LineItemID = LineItems.LineItemID
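The same idea, sketched in SQLite via Python's sqlite3 (a stand-in, with hypothetical data and integer IDs). Here the derived table is grouped on LineItems.OrderID directly, a small simplification of the query above: MAX(LineItemID) picks the highest-numbered line item per order, and the second join fetches that item's other columns.

```python
import sqlite3

# Hypothetical sample data; integer IDs replace the GUIDs.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES (10, 1, 7, 'prefabulated amulite'),
                                 (20, 2, 32, 'spurving bearing'),
                                 (30, 3, 5, 'pentametric fan'),
                                 (31, 3, 5, 'differential girdlespring');
""")

rows = con.execute("""
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders
    INNER JOIN (
        SELECT OrderID, MAX(LineItemID) AS LineItemID  -- one row per order
        FROM LineItems
        GROUP BY OrderID
    ) AS Items ON Orders.OrderID = Items.OrderID
    INNER JOIN LineItems ON Items.LineItemID = LineItems.LineItemID
    ORDER BY Orders.OrderID
""").fetchall()

for row in rows:
    print(row)
```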

Answer by Tomalak

You could do:

SELECT 
  Orders.OrderNumber, 
  LineItems.Quantity, 
  LineItems.Description
FROM 
  Orders INNER JOIN LineItems 
  ON Orders.OrderID = LineItems.OrderID
WHERE
  LineItems.LineItemID = (
    SELECT MIN(LineItemID) 
    FROM   LineItems
    WHERE  OrderID = Orders.OrderID
  )

This requires an index (or primary key) on LineItems.LineItemID and an index on LineItems.OrderID, or it will be slow.
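A sketch of that query together with the supporting index, assuming SQLite via Python's sqlite3 as a stand-in and hypothetical data. The index name IX_LineItems_OrderID is made up for illustration; LineItemID is declared INTEGER PRIMARY KEY here, so it is already indexed. SQLite's EXPLAIN QUERY PLAN can show whether the index is used, but its output varies between versions, so it is not checked here.

```python
import sqlite3

# Hypothetical sample data; LineItemID is the (already indexed) primary key.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES (10, 1, 7, 'prefabulated amulite'),
                                 (20, 2, 32, 'spurving bearing'),
                                 (30, 3, 5, 'pentametric fan'),
                                 (31, 3, 5, 'differential girdlespring');
""")

# Index on the foreign key used by the correlated subquery (hypothetical name).
con.execute("CREATE INDEX IX_LineItems_OrderID ON LineItems (OrderID)")

rows = con.execute("""
    SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
    FROM Orders INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    WHERE LineItems.LineItemID = (
        SELECT MIN(LineItemID)
        FROM LineItems
        WHERE OrderID = Orders.OrderID)
    ORDER BY Orders.OrderID
""").fetchall()

for row in rows:
    print(row)
```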

Answer by BornToCode

@Quassnoi's answer is good, but in some cases (especially if the outer table is big) a more efficient query might use windowed functions, like this:

SELECT  Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
FROM    Orders
LEFT JOIN 
        (
        SELECT  LineItems.Quantity, LineItems.Description, OrderId, ROW_NUMBER()
                OVER (PARTITION BY OrderId ORDER BY (SELECT NULL)) AS RowNum
        FROM    LineItems

        ) LineItems2 ON LineItems2.OrderId = Orders.OrderID And RowNum = 1

Sometimes you just need to test which query gives better performance.
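A sketch of the windowed version, assuming SQLite 3.25 or later (the first release with window functions) via Python's sqlite3 as a stand-in. Two deliberate changes from the query above: the window is ordered by LineItemID instead of (SELECT NULL), so the chosen row is repeatable, and a hypothetical order with no line items is added to show that the LEFT JOIN keeps it, with NULLs in the line-item columns.

```python
import sqlite3

# Hypothetical sample data; order 4 has no line items.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'),
                              (3, 'KSG-0619-81'), (4, 'XYZ-0000-00');
    INSERT INTO LineItems VALUES (10, 1, 7, 'prefabulated amulite'),
                                 (20, 2, 32, 'spurving bearing'),
                                 (30, 3, 5, 'pentametric fan'),
                                 (31, 3, 5, 'differential girdlespring');
""")

# Number the items within each order, then keep only item number 1.
rows = con.execute("""
    SELECT Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
    FROM Orders
    LEFT JOIN (
        SELECT Quantity, Description, OrderID,
               ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY LineItemID) AS RowNum
        FROM LineItems
    ) LineItems2 ON LineItems2.OrderID = Orders.OrderID AND LineItems2.RowNum = 1
    ORDER BY Orders.OrderID
""").fetchall()

for row in rows:
    print(row)
```

Keeping the RowNum = 1 test in the ON clause rather than in a WHERE clause is what lets the order without items survive the LEFT JOIN.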

Answer by avb

Another approach, using a common table expression:

with firstOnly as (
    select Orders.OrderNumber, LineItems.Quantity, LineItems.Description, ROW_NUMBER() over (partition by Orders.OrderID order by Orders.OrderID) lp
    FROM Orders
        join LineItems on Orders.OrderID = LineItems.OrderID
) select *
  from firstOnly
  where lp = 1

Or, in the end, maybe you would like to show all joined rows?

A comma-separated version:

  select *
  from Orders o
    cross apply (
        select CAST((select l.Description + ','
        from LineItems l
        where l.OrderID = o.OrderID
        for xml path('')) as nvarchar(max)) l
    ) lines
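SQLite has no FOR XML PATH, so this sketch (Python's sqlite3 as a stand-in, hypothetical data) swaps in SQLite's GROUP_CONCAT aggregate for the comma-separated list; on SQL Server 2017 and later, STRING_AGG plays the same role. It also builds the question's "first item, ..." display with a CASE on COUNT(*). SQLite does not guarantee the concatenation order here, and MIN(Description) is just one arbitrary but stable way to pick the item shown before the ellipsis.

```python
import sqlite3

# Hypothetical sample data; integer IDs replace the GUIDs.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES (10, 1, 7, 'prefabulated amulite'),
                                 (20, 2, 32, 'spurving bearing'),
                                 (30, 3, 5, 'pentametric fan'),
                                 (31, 3, 5, 'differential girdlespring');
""")

rows = con.execute("""
    SELECT Orders.OrderNumber,
           GROUP_CONCAT(LineItems.Description, ', ') AS AllItems,
           MIN(LineItems.Description)
             || CASE WHEN COUNT(*) > 1 THEN ', ...' ELSE '' END AS Summary
    FROM Orders
    JOIN LineItems ON LineItems.OrderID = Orders.OrderID
    GROUP BY Orders.OrderID, Orders.OrderNumber
    ORDER BY Orders.OrderID
""").fetchall()

# Map each order number to (all items, ellipsis summary).
by_order = {number: (all_items, summary) for number, all_items, summary in rows}
for number, (all_items, summary) in by_order.items():
    print(number, "|", all_items, "|", summary)
```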

Answer by P. Olesen

From SQL Server 2012 and onwards I think this will do the trick:

SELECT DISTINCT
    o.OrderNumber ,
    FIRST_VALUE(li.Quantity) OVER ( PARTITION BY o.OrderNumber ORDER BY li.Description ) AS Quantity ,
    FIRST_VALUE(li.Description) OVER ( PARTITION BY o.OrderNumber ORDER BY li.Description ) AS Description
FROM    Orders AS o
    INNER JOIN LineItems AS li ON o.OrderID = li.OrderID
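A sketch of the FIRST_VALUE version, assuming SQLite 3.25 or later via Python's sqlite3 as a stand-in, with hypothetical data. Because both FIRST_VALUE calls share the same PARTITION BY and ORDER BY, Quantity and Description come from the same line item; DISTINCT then collapses the per-line-item rows to one per order. Ordering by Description picks the alphabetically first item, which is an arbitrary choice.

```python
import sqlite3

# Hypothetical sample data; integer IDs replace the GUIDs.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES (10, 1, 7, 'prefabulated amulite'),
                                 (20, 2, 32, 'spurving bearing'),
                                 (30, 3, 5, 'pentametric fan'),
                                 (31, 3, 5, 'differential girdlespring');
""")

rows = con.execute("""
    SELECT DISTINCT
        o.OrderNumber,
        FIRST_VALUE(li.Quantity) OVER (PARTITION BY o.OrderNumber ORDER BY li.Description) AS Quantity,
        FIRST_VALUE(li.Description) OVER (PARTITION BY o.OrderNumber ORDER BY li.Description) AS Description
    FROM Orders AS o
        INNER JOIN LineItems AS li ON o.OrderID = li.OrderID
    ORDER BY o.OrderNumber
""").fetchall()

for row in rows:
    print(row)
```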

Answer by Abdullah Yousuf

Correlated subqueries are subqueries that depend on the outer query. They work like a for loop in SQL: the subquery runs once for each row in the outer query:

select * from users join widgets on widgets.id = (
    select id from widgets
    where widgets.user_id = users.id
    order by created_at desc
    limit 1
)

Answer by Peter Radocchia

EDIT: never mind, Quassnoi has a better answer.

For SQL Server 2000 (SQL2K), something like this:

SELECT 
  Orders.OrderNumber
, LineItems.Quantity
, LineItems.Description
FROM (  
  SELECT 
    Orders.OrderID
  , Orders.OrderNumber
  , FirstLineItemID = (
      SELECT TOP 1 LineItemID
      FROM LineItems
      WHERE LineItems.OrderID = Orders.OrderID
      ORDER BY LineItemID -- or whatever else
      )
  FROM Orders
  ) Orders
JOIN LineItems 
  ON LineItems.OrderID = Orders.OrderID 
 AND LineItems.LineItemID = Orders.FirstLineItemID

Answer by Anand

My favorite way to run this query is with a NOT EXISTS clause. I believe this is the most efficient way to run this sort of query:

select o.OrderNumber,
       li.Quantity,
       li.Description
from Orders as o
inner join LineItems as li
on li.OrderID = o.OrderID
where not exists (
    select 1
    from LineItems as li_later
    where li_later.OrderID = o.OrderID
    and li_later.LineItemGUID > li.LineItemGUID
    )

But I have not tested this method against other methods suggested here.
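A sketch of the NOT EXISTS version, assuming SQLite via Python's sqlite3 as a stand-in, hypothetical data, and an integer LineItemID in place of LineItemGUID. The query keeps a line item only when no other item of the same order has a greater ID, so it returns the highest-numbered item per order. It checks the result only; no timing is measured.

```python
import sqlite3

# Hypothetical sample data; integer IDs replace the GUIDs.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES (10, 1, 7, 'prefabulated amulite'),
                                 (20, 2, 32, 'spurving bearing'),
                                 (30, 3, 5, 'pentametric fan'),
                                 (31, 3, 5, 'differential girdlespring');
""")

rows = con.execute("""
    SELECT o.OrderNumber, li.Quantity, li.Description
    FROM Orders AS o
    INNER JOIN LineItems AS li ON li.OrderID = o.OrderID
    WHERE NOT EXISTS (
        SELECT 1
        FROM LineItems AS li_later
        WHERE li_later.OrderID = o.OrderID
          AND li_later.LineItemID > li.LineItemID  -- a later item disqualifies li
    )
    ORDER BY o.OrderID
""").fetchall()

for row in rows:
    print(row)
```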

Answer by ernst

I tried the CROSS APPLY approach; it works nicely but takes slightly longer. Instead, I wrapped the line-item columns in MAX() and added a GROUP BY, which kept the speed and dropped the extra record.

Here's the adjusted query:

SELECT Orders.OrderNumber, max(LineItems.Quantity), max(LineItems.Description)
FROM Orders
    INNER JOIN LineItems 
    ON Orders.OrderID = LineItems.OrderID
Group by Orders.OrderNumber
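One caveat worth checking: each MAX() is evaluated independently, so when an order's line items differ in more than one column, the returned Quantity and Description can come from different line items. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in and hypothetical data in which the second item of KSG-0619-81 has a different quantity (9).

```python
import sqlite3

# Hypothetical data: the girdlespring line now has Quantity 9.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderNumber TEXT);
    CREATE TABLE LineItems (LineItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                            Quantity INTEGER, Description TEXT);
    INSERT INTO Orders VALUES (1, 'STL-7442-1'), (2, 'MPT-9931-8A'), (3, 'KSG-0619-81');
    INSERT INTO LineItems VALUES (10, 1, 7, 'prefabulated amulite'),
                                 (20, 2, 32, 'spurving bearing'),
                                 (30, 3, 5, 'pentametric fan'),
                                 (31, 3, 9, 'differential girdlespring');
""")

rows = con.execute("""
    SELECT Orders.OrderNumber, MAX(LineItems.Quantity), MAX(LineItems.Description)
    FROM Orders
        INNER JOIN LineItems ON Orders.OrderID = LineItems.OrderID
    GROUP BY Orders.OrderNumber
    ORDER BY Orders.OrderNumber
""").fetchall()

ksg = rows[0]  # 'KSG-0619-81' sorts first
# Does the returned (Quantity, Description) pair match any real line item?
exists = con.execute(
    "SELECT COUNT(*) FROM LineItems WHERE Quantity = ? AND Description = ?", ksg[1:]
).fetchone()[0]
print(ksg, exists)
```

Here the result pairs Quantity 9 (from the girdlespring) with "pentametric fan", a combination that matches no actual line item, so this approach is only safe when the non-key columns agree or the mix does not matter.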