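How do I join the most recent row in a table?
I frequently run into problems of this form and have not found a good solution yet:
Suppose we have two database tables representing an e-commerce system.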
userData (userId, name, ...)
orderData (orderId, userId, orderType, createDate, ...)
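For all users in the system, select their user information, their most recent order information with type = '1', and their most recent order information with type = '2'. I want to do this in one query. Here is an example result: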
(userId, name, ..., orderId1, orderType1, createDate1, ..., orderId2, orderType2, createDate2, ...)
(101, 'Bob', ..., 472, '1', '4/25/2008', ..., 382, '2', '3/2/2008', ...)
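Solution
By "latest", do you mean everything new from the current day? If so, you can always filter on createDate and fetch all user and order data where createDate >= the current date.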
SELECT *
FROM "orderData", "userData"
WHERE "userData"."userId" = "orderData"."userId"
  AND "orderData"."createDate" >= current_date;
Update
Following the comments here, this is what you are after:
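SELECT *
FROM "orderData", "userData"
WHERE "userData"."userId" = "orderData"."userId"
  AND "orderData"."orderType" = '1'
  AND "orderData"."orderId" = (
    -- correlated on userId so that each user gets their own latest order
    SELECT o."orderId"
    FROM "orderData" o
    WHERE o."orderType" = '1'
      AND o."userId" = "userData"."userId"
    ORDER BY o."orderId" DESC
    LIMIT 1
  );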
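You might be able to do a union query for this. The exact syntax needs some work, especially the GROUP BY, but a union should be able to do it.
For example: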
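-- MAX() cannot appear directly in WHERE, so compare against a correlated subquery
SELECT o.orderId, o.orderType, o.createDate
FROM orderData o
WHERE o.orderType = 1
  AND o.createDate = (SELECT MAX(createDate) FROM orderData
                      WHERE userId = o.userId AND orderType = 1)
UNION
SELECT o.orderId, o.orderType, o.createDate
FROM orderData o
WHERE o.orderType = 2
  AND o.createDate = (SELECT MAX(createDate) FROM orderData
                      WHERE userId = o.userId AND orderType = 2)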
Sorry, I don't have Oracle in front of me, but this is the basic structure of what I would do in Oracle:
SELECT b.userid, b.orderid, b.orderType, b.createDate, <etc>, a.name
FROM orderData b, userData a
WHERE a.userid = b.userid
  AND (b.userid, b.orderType, b.createDate) IN (
    SELECT userid, orderType, MAX(createDate)
    FROM orderData
    WHERE orderType IN (1, 2)
    GROUP BY userid, orderType)
Example T-SQL solution (MS SQL):
SELECT u.*, o1.*, o2.*
FROM (
    SELECT userData.*,
           (SELECT TOP 1 orderId FROM orderData
            WHERE orderData.userId = userData.userId AND orderType = 1
            ORDER BY createDate DESC) AS order1Id,
           (SELECT TOP 1 orderId FROM orderData
            WHERE orderData.userId = userData.userId AND orderType = 2
            ORDER BY createDate DESC) AS order2Id
    FROM userData
) AS u
LEFT JOIN orderData o1 ON (u.order1Id = o1.orderId)
LEFT JOIN orderData o2 ON (u.order2Id = o2.orderId)
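In SQL Server 2005 and later you could also use the RANK() OVER window function. (It is not specific to MSSQL, though: window functions are part of the SQL:2003 standard, and Oracle and PostgreSQL support them as well.)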
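The window-function idea can be sketched as follows. This is only an illustration under assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module (not SQL Server), the sample rows are made up, and it uses ROW_NUMBER() rather than RANK() so that ties on createDate still yield a single row per user and type. It needs an SQLite build with window-function support (3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userData (userId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orderData (orderId INTEGER PRIMARY KEY, userId INTEGER,
                        orderType INTEGER, createDate TEXT);
-- Made-up sample data; Ann has no type 2 order.
INSERT INTO userData VALUES (101, 'Bob'), (102, 'Ann');
INSERT INTO orderData VALUES
    (382, 101, 2, '2008-03-02'),
    (401, 101, 1, '2008-04-01'),
    (472, 101, 1, '2008-04-25'),
    (480, 102, 1, '2008-04-26');
""")

rows = conn.execute("""
WITH ranked AS (
    -- rn = 1 marks the newest order per (user, type); orderId breaks date ties
    SELECT o.*,
           ROW_NUMBER() OVER (PARTITION BY userId, orderType
                              ORDER BY createDate DESC, orderId DESC) AS rn
    FROM orderData o
)
SELECT u.userId, u.name, o1.orderId, o1.createDate, o2.orderId, o2.createDate
FROM userData u
LEFT JOIN ranked o1 ON o1.userId = u.userId AND o1.orderType = 1 AND o1.rn = 1
LEFT JOIN ranked o2 ON o2.userId = u.userId AND o2.orderType = 2 AND o2.rn = 1
ORDER BY u.userId
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOINs keep users who lack one of the two order types; their columns for that type come back as NULL.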
I use a similar approach in MySQL:
SELECT u.*,
       SUBSTRING_INDEX(MAX(CONCAT(o1.createDate, '##', o1.otherfield)), '##', -1) AS o1_orderfield,
       SUBSTRING_INDEX(MAX(CONCAT(o2.createDate, '##', o2.otherfield)), '##', -1) AS o2_orderfield
FROM userData AS u
LEFT JOIN orderData AS o1 ON (o1.userId = u.userId AND o1.orderType = 1)
LEFT JOIN orderData AS o2 ON (o2.userId = u.userId AND o2.orderType = 2)
GROUP BY u.userId
In short, use MAX() to get the newest value by prepending the criterion field (createDate) to the field of interest (otherfield). SUBSTRING_INDEX() then strips off the date.
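To see why prepending the date works, here is a small sketch of the same trick. It is an assumption-laden illustration: it uses SQLite through Python's sqlite3 module instead of MySQL, groups a single table rather than joining twice, stores dates as ISO strings so that string order matches date order, and does the stripping in Python because SQLite has no SUBSTRING_INDEX(). The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orderData (orderId INTEGER PRIMARY KEY, userId INTEGER,
                        orderType INTEGER, createDate TEXT, otherfield TEXT);
-- 'zebra' sorts after 'apple', but its date prefix is older.
INSERT INTO orderData VALUES
    (1, 101, 1, '2008-04-01', 'zebra'),
    (2, 101, 1, '2008-04-25', 'apple'),
    (3, 101, 2, '2008-03-02', 'mango');
""")

rows = conn.execute("""
SELECT userId, orderType,
       -- MAX() picks the string with the latest date prefix
       MAX(createDate || '##' || otherfield) AS packed
FROM orderData
GROUP BY userId, orderType
ORDER BY orderType
""").fetchall()

# Strip the date prefix: the role SUBSTRING_INDEX(..., '##', -1) plays in MySQL.
latest = {order_type: packed.split("##", 1)[1] for _, order_type, packed in rows}
print(latest)
```

The trick depends on the date format sorting correctly as text and on the '##' separator never appearing in the date itself.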
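OTOH, if you need an arbitrary number of order types (if orderType can be any number rather than a limited ENUM), it is better handled with a separate query, like this: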
select o.*
from orderData o
where o.userId = XXX
  and o.createDate = (select max(createDate) from orderData
                      where userId = o.userId and orderType = o.orderType)
order by o.orderType
for each user.
Assuming orderId increases monotonically over time:
SELECT *
FROM userData u
INNER JOIN orderData o ON o.userId = u.userId
INNER JOIN (
    -- This subquery gives the last order of each type for each customer
    SELECT MAX(o2.orderId) AS orderId
        --, o2.userId    -- optional - include if joining for a particular customer
        --, o2.orderType -- optional - include if joining for a particular type
    FROM orderData o2
    GROUP BY o2.userId, o2.orderType
) AS LastOrders
    ON LastOrders.orderId = o.orderId -- expand join to include customer or type if desired
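Then pivot on the client side or, if you are using SQL Server, use the PIVOT feature.
This should work; you will have to adjust the table and column names: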
select ud.name,
       order1.order_id, order1.order_type, order1.create_date,
       order2.order_id, order2.order_type, order2.create_date
from user_data ud, order_data order1, order_data order2
where ud.user_id = order1.user_id
  and ud.user_id = order2.user_id
  and order1.order_id = (select max(order_id) from order_data od1
                         where od1.user_id = ud.user_id and od1.order_type = 'Type1')
  and order2.order_id = (select max(order_id) from order_data od2
                         where od2.user_id = ud.user_id and od2.order_type = 'Type2')
Denormalizing the data is also a good idea, because this kind of query is expensive to run. You could, for example, add a "last_order_date" column to the user data.
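One way to keep such a denormalized column up to date is a trigger on the orders table. The sketch below is only an illustration: the trigger name trg_last_order is hypothetical, the schema is reduced to the columns needed, and it uses SQLite through Python's sqlite3 module, so the trigger syntax would need adapting for other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userData (userId INTEGER PRIMARY KEY, name TEXT,
                       last_order_date TEXT);
CREATE TABLE orderData (orderId INTEGER PRIMARY KEY, userId INTEGER,
                        orderType INTEGER, createDate TEXT);

-- Hypothetical trigger: only moves last_order_date forward, never backward.
CREATE TRIGGER trg_last_order AFTER INSERT ON orderData
BEGIN
    UPDATE userData
    SET last_order_date = NEW.createDate
    WHERE userId = NEW.userId
      AND (last_order_date IS NULL OR last_order_date < NEW.createDate);
END;

INSERT INTO userData (userId, name) VALUES (101, 'Bob');
INSERT INTO orderData VALUES (1, 101, 1, '2008-04-25');
INSERT INTO orderData VALUES (2, 101, 2, '2008-03-02');  -- older, ignored
""")

last_order_date = conn.execute(
    "SELECT last_order_date FROM userData WHERE userId = 101"
).fetchone()[0]
print(last_order_date)
```

Reading the latest date then costs a single-row lookup on userData, at the price of extra work on every insert. A per-type variant would need one column per order type.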
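Here is one way to move the type 1 and type 2 data onto the same row:
(by placing the type 1 and type 2 information into their own selects, which are then used in the FROM clause.)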
SELECT a.name, ud1.*, ud2.*
FROM userData a,
     (SELECT userid, orderid, orderType, createDate, <etc>
      FROM orderData
      WHERE (userid, orderType, createDate) IN (
          SELECT userid, orderType, MAX(createDate)
          FROM orderData
          WHERE orderType = 1
          GROUP BY userid, orderType)) ud1,
     (SELECT userid, orderid, orderType, createDate, <etc>
      FROM orderData
      WHERE (userid, orderType, createDate) IN (
          SELECT userid, orderType, MAX(createDate)
          FROM orderData
          WHERE orderType = 2
          GROUP BY userid, orderType)) ud2
-- join each derived table back to the user; without this the result is a cross product
WHERE a.userid = ud1.userid
  AND a.userid = ud2.userid
Here is my approach. It is standard SQL and should work in any brand of database.
SELECT u.userId, u.name,
       o1.orderId, o1.orderType, o1.createDate,
       o2.orderId, o2.orderType, o2.createDate
FROM userData AS u
LEFT OUTER JOIN (
    SELECT o1a.orderId, o1a.userId, o1a.orderType, o1a.createDate
    FROM orderData AS o1a
    LEFT OUTER JOIN orderData AS o1b
      ON (o1a.userId = o1b.userId
          AND o1a.orderType = o1b.orderType
          AND o1a.createDate < o1b.createDate)
    WHERE o1a.orderType = 1 AND o1b.orderId IS NULL) AS o1
  ON (u.userId = o1.userId)
LEFT OUTER JOIN (
    SELECT o2a.orderId, o2a.userId, o2a.orderType, o2a.createDate
    FROM orderData AS o2a
    LEFT OUTER JOIN orderData AS o2b
      ON (o2a.userId = o2b.userId
          AND o2a.orderType = o2b.orderType
          AND o2a.createDate < o2b.createDate)
    WHERE o2a.orderType = 2 AND o2b.orderId IS NULL) o2
  ON (u.userId = o2.userId);
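Note that if you have multiple orders of either type whose date equals the latest date, you will get multiple rows in the result set. If you have multiple such orders of both types, you will get N x M rows in the result set. So I would recommend fetching the rows of each type in separate queries.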
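The N x M effect is easy to reproduce. The following sketch makes several assumptions: it uses SQLite through Python's sqlite3 module, made-up sample rows, and a trimmed column list. It runs the same self-join pattern on a user who has two type 1 orders and two type 2 orders tied on their latest dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userData (userId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orderData (orderId INTEGER PRIMARY KEY, userId INTEGER,
                        orderType INTEGER, createDate TEXT);
INSERT INTO userData VALUES (101, 'Bob');
-- Orders 1 and 2 tie for latest type 1; orders 3 and 4 tie for latest type 2.
INSERT INTO orderData VALUES
    (1, 101, 1, '2008-04-25'), (2, 101, 1, '2008-04-25'),
    (3, 101, 2, '2008-03-02'), (4, 101, 2, '2008-03-02'),
    (5, 101, 1, '2008-01-01');
""")

# Rows of type {t} for which no newer row of the same user and type exists.
LATEST = """
SELECT a.orderId, a.userId FROM orderData a
LEFT OUTER JOIN orderData b
  ON a.userId = b.userId AND a.orderType = b.orderType
 AND a.createDate < b.createDate
WHERE a.orderType = {t} AND b.orderId IS NULL
"""

rows = conn.execute(f"""
SELECT u.userId, o1.orderId, o2.orderId
FROM userData u
LEFT OUTER JOIN ({LATEST.format(t=1)}) o1 ON u.userId = o1.userId
LEFT OUTER JOIN ({LATEST.format(t=2)}) o2 ON u.userId = o2.userId
ORDER BY o1.orderId, o2.orderId
""").fetchall()

print(len(rows), rows)
```

With N = 2 and M = 2 tied orders, the single user comes back as four rows, one for each combination.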
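Steve K is absolutely right, thanks! I did rewrite his answer slightly to account for the fact that there might be no order of a particular type (which I failed to mention, so I can't fault Steve K).
Here is what I ended up using: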
select ud.name,
       order1.orderId, order1.orderType, order1.createDate,
       order2.orderId, order2.orderType, order2.createDate
from userData ud
left join orderData order1
  on order1.orderId = (select max(orderId) from orderData od1
                       where od1.userId = ud.userId and od1.orderType = '1')
left join orderData order2
  on order2.orderId = (select max(orderId) from orderData od2
                       where od2.userId = ud.userId and od2.orderType = '2')
where ...[some limiting factors on the selection of users]...;
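To check how this left-join version behaves when a user lacks orders of some type, here is a small sketch. It is an assumption-based illustration: it runs on SQLite through Python's sqlite3 module, uses integer order types and made-up rows, and drops the elided user filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userData (userId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orderData (orderId INTEGER PRIMARY KEY, userId INTEGER,
                        orderType INTEGER, createDate TEXT);
-- Ann has no type 2 order; Cy has no orders at all.
INSERT INTO userData VALUES (101, 'Bob'), (102, 'Ann'), (103, 'Cy');
INSERT INTO orderData VALUES
    (382, 101, 2, '2008-03-02'),
    (401, 101, 1, '2008-04-01'),
    (472, 101, 1, '2008-04-25'),
    (480, 102, 1, '2008-04-26');
""")

rows = conn.execute("""
SELECT ud.name, order1.orderId, order2.orderId
FROM userData ud
LEFT JOIN orderData order1 ON order1.orderId = (
    SELECT MAX(orderId) FROM orderData od1
    WHERE od1.userId = ud.userId AND od1.orderType = 1)
LEFT JOIN orderData order2 ON order2.orderId = (
    SELECT MAX(orderId) FROM orderData od2
    WHERE od2.userId = ud.userId AND od2.orderType = 2)
ORDER BY ud.userId
""").fetchall()

for row in rows:
    print(row)
```

When the subquery finds no order, MAX() yields NULL, the join condition is never true, and the left join fills that side with NULLs instead of dropping the user.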
I have provided three different approaches to this problem:
- Using PIVOT
- Using CASE statements
- Using inline queries in the WHERE clause
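All of the solutions assume that the "most recent" order is determined by the orderId column. Using the createDate column would add complexity because of timestamp collisions and would seriously hurt performance, since createDate is probably not part of an indexed key. I have only tested these queries on MS SQL Server 2005, so I don't know whether they will run on your server.
Solutions (1) and (2) perform almost identically. In fact, they both read the same amount of data from the database.
Solution (3) is not the preferred approach when working with large data sets. It consistently performs hundreds of logical reads more than (1) and (2). When filtering for one specific user, approach (3) is comparable to the other methods. In the single-user case, a drop in CPU time helps offset the significantly higher number of reads. However, as the disk drive gets busier and cache misses occur, this slight advantage will disappear.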
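Conclusion
For the scenario presented, use the PIVOT approach if your DBMS supports it. It requires less code than the CASE statement and makes it simpler to add order types in the future.
Note that in some cases PIVOT is not flexible enough, and characteristic value functions using CASE statements are the way to go.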
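For databases without PIVOT, the CASE-based characteristic function can be sketched like this. The example is an illustration under assumptions: it uses SQLite through Python's sqlite3 module, made-up sample rows, and only the aggregation step (joining the resulting keys back to orderData is left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orderData (orderId INTEGER PRIMARY KEY, userId INTEGER,
                        orderType INTEGER, createDate TEXT);
INSERT INTO orderData VALUES
    (382, 101, 2, '2008-03-02'),
    (401, 101, 1, '2008-04-01'),
    (472, 101, 1, '2008-04-25'),
    (480, 102, 1, '2008-04-26');
""")

rows = conn.execute("""
SELECT userId,
       -- each CASE yields orderId for one type and NULL otherwise;
       -- MAX() ignores NULLs, so every column is the newest id of its type
       MAX(CASE WHEN orderType = 1 THEN orderId END) AS maxTypeOneOrderId,
       MAX(CASE WHEN orderType = 2 THEN orderId END) AS maxTypeTwoOrderId
FROM orderData
GROUP BY userId
ORDER BY userId
""").fetchall()

for row in rows:
    print(row)
```

Adding another order type means adding one more MAX(CASE ...) column, which is the extra code the PIVOT version avoids.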
Code
Approach (1) using PIVOT:
select ud.userId, ud.fullname,
       od1.orderId as orderId1, od1.createDate as createDate1, od1.orderType as orderType1,
       od2.orderId as orderId2, od2.createDate as createDate2, od2.orderType as orderType2
from userData ud
inner join (
    select userId, [1] as typeOne, [2] as typeTwo
    from (select userId, orderType, orderId from orderData) as orders
    PIVOT (
        max(orderId)
        FOR orderType in ([1], [2])
    ) as LatestOrders
) as LatestOrders on LatestOrders.userId = ud.userId
inner join orderData od1 on od1.orderId = LatestOrders.typeOne
inner join orderData od2 on od2.orderId = LatestOrders.typeTwo
Approach (2) using CASE statements:
select ud.userId, ud.fullname,
       od1.orderId as orderId1, od1.createDate as createDate1, od1.orderType as orderType1,
       od2.orderId as orderId2, od2.createDate as createDate2, od2.orderType as orderType2
from userData ud
-- assuming not all users will have orders use outer join
inner join (
    select od.userId,
           -- can be null if no orders for type
           max(case when orderType = 1 then ORDERID else null end) as maxTypeOneOrderId,
           -- can be null if no orders for type
           max(case when orderType = 2 then ORDERID else null end) as maxTypeTwoOrderId
    from orderData od
    group by userId
) as maxOrderKeys on maxOrderKeys.userId = ud.userId
inner join orderData od1 on od1.ORDERID = maxTypeOneOrderId
inner join orderData od2 on od2.ORDERID = maxTypeTwoOrderId
Approach (3) using inline queries in the WHERE clause (based on Steve K.'s response):
select ud.userId, ud.fullname,
       order1.orderId, order1.orderType, order1.createDate,
       order2.orderId, order2.orderType, order2.createDate
from userData ud, orderData order1, orderData order2
where ud.userId = order1.userId
  and ud.userId = order2.userId
  and order1.orderId = (select max(orderId) from orderData od1
                        where od1.userId = ud.userId and od1.orderType = 1)
  and order2.orderId = (select max(orderId) from orderData od2
                        where od2.userId = ud.userId and od2.orderType = 2)
Script to generate the tables and 1000 users with 100 orders each:
CREATE TABLE [dbo].[orderData](
    [orderId] [int] IDENTITY(1,1) NOT NULL,
    [createDate] [datetime] NOT NULL,
    [orderType] [tinyint] NOT NULL,
    [userId] [int] NOT NULL
)

CREATE TABLE [dbo].[userData](
    [userId] [int] IDENTITY(1,1) NOT NULL,
    [fullname] [nvarchar](50) NOT NULL
)

-- Create 1000 users with 100 orders each
declare @userId int
declare @usersAdded int
set @usersAdded = 0

while @usersAdded < 1000
begin
    insert into userData (fullname) values ('Mario' + ltrim(str(@usersAdded)))
    set @userId = @@identity

    declare @orderSetsAdded int
    set @orderSetsAdded = 0

    while @orderSetsAdded < 10
    begin
        insert into orderData (userId, createDate, orderType) values (@userId, '01-06-08', 1)
        insert into orderData (userId, createDate, orderType) values (@userId, '01-02-08', 1)
        insert into orderData (userId, createDate, orderType) values (@userId, '01-08-08', 1)
        insert into orderData (userId, createDate, orderType) values (@userId, '01-09-08', 1)
        insert into orderData (userId, createDate, orderType) values (@userId, '01-01-08', 1)
        insert into orderData (userId, createDate, orderType) values (@userId, '01-06-06', 2)
        insert into orderData (userId, createDate, orderType) values (@userId, '01-02-02', 2)
        insert into orderData (userId, createDate, orderType) values (@userId, '01-08-09', 2)
        insert into orderData (userId, createDate, orderType) values (@userId, '01-09-01', 2)
        insert into orderData (userId, createDate, orderType) values (@userId, '01-01-04', 2)
        set @orderSetsAdded = @orderSetsAdded + 1
    end

    set @usersAdded = @usersAdded + 1
end
A small snippet for testing query performance on MS SQL Server, in addition to SQL Profiler:
-- Uncomment these to clear some caches
--DBCC DROPCLEANBUFFERS
--DBCC FREEPROCCACHE

set statistics io on
set statistics time on

-- INSERT TEST QUERY HERE

set statistics time off
set statistics io off