仅在 Oracle 中的一列上不同
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16350452/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Distinct on one column only in Oracle
提问by Mark Tickner
I would like to use distinct on the following table, but only on the 'PlayerID' column. This is what I have at the moment:
我想在下表中使用 distinct,但仅限于“PlayerID”列。这就是我目前所拥有的:
MATCHID PLAYERID TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
20 5 2 14-JAN-12 01-JUN-11
20 5 4 14-JAN-12 01-JUN-10
20 7 4 14-JAN-12 01-JUN-11
20 7 2 14-JAN-12 01-JUN-10
20 10 4 14-JAN-12 01-JUN-11
20 11 2 14-JAN-12 01-JUN-10
20 13 2 14-JAN-12 01-JUN-11
20 16 4 14-JAN-12 01-JUN-10
20 17 4 14-JAN-12 01-JUN-10
20 18 4 14-JAN-12 01-JUN-10
20 19 2 14-JAN-12 01-JUN-11
And this is what I want, so that the highest 'StartDate' for each 'PlayerID' is shown and the next row ignored:
这就是我想要的,以便显示每个“PlayerID”的最高“StartDate”,并忽略下一行:
MATCHID PLAYERID TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
20 5 2 14-JAN-12 01-JUN-11
20 7 4 14-JAN-12 01-JUN-11
20 10 4 14-JAN-12 01-JUN-11
20 11 2 14-JAN-12 01-JUN-10
20 13 2 14-JAN-12 01-JUN-11
20 16 4 14-JAN-12 01-JUN-10
20 17 4 14-JAN-12 01-JUN-10
20 18 4 14-JAN-12 01-JUN-10
20 19 2 14-JAN-12 01-JUN-11
Current SQL:
当前 SQL:
SELECT pi.MatchID, pi.PlayerID, t.TeamID, m.MatchDate, pf.StartDate
FROM Plays_In pi, Match m, Plays_A pa, Team t, Plays_For pf, Made_Up_Of muo, Season s
WHERE pi.MatchID = m.MatchID
AND m.MatchID = pa.MatchID
AND pa.TeamID = t.TeamID
AND pf.PlayerID = pi.PlayerID
AND pf.TeamID = t.TeamID
AND muo.MatchID = pi.MatchID
AND muo.SeasonID = s.SeasonID
AND pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
ORDER BY pi.MatchID ASC, pi.PlayerID ASC, pf.StartDate DESC;
It's an Oracle database.
这是一个 Oracle 数据库。
Thanks in advance.
提前致谢。
回答by Ed Gibbs
A few points...
几点...
Unless you're using the joins to
Made_Up_Of
andSeason
to filter out rows, you don't need these tables. I've left them out here; you can add them back in if you need them.Mark Tickner is correct that you should use the ANSI JOIN syntax. The nice thing about it (other than being standard) is that it puts the join logic right with the table being joined. Once you get used to it I think you'll find it preferable.
What you're really after is the maximum
pf.StartDate
for eachPlayerID
, which is a nice fit for the analyticalROW_NUMBER()
function. ThePARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC
will basically assign the value1
to the row with each player's most recent sort date. The outer filters out all rows except those with the1
ranking.You can also assign rankings with the
RANK()
andDENSE_RANK()
analytical functions, but if a player has a tie for the most recent date then all the tied dates will be ranked #1 and you'll get multiple rows for that player. In situations like this where you only want one row per player, useROW_NUMBER()
instead.
除非您使用连接
Made_Up_Of
和Season
过滤行,否则您不需要这些表。我已经把它们排除在外了;如果您需要它们,您可以重新添加它们。Mark Tickner 是正确的,您应该使用 ANSI JOIN 语法。关于它的好处(除了标准之外)是它使连接逻辑与被连接的表正确。一旦你习惯了它,我想你会发现它更可取。
您真正想要的是
pf.StartDate
each的最大值PlayerID
,这非常适合分析ROW_NUMBER()
函数。在PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC
将基本值分配1
给每个球员的最新数据排序的行。外部过滤掉除具有1
排名的行之外的所有行。您还可以使用
RANK()
和DENSE_RANK()
分析功能分配排名,但如果玩家在最近的日期有平局,那么所有平局的日期都将排名第一,并且您将获得该玩家的多行。在这种情况下,您只需要每个玩家一行,请ROW_NUMBER()
改用。
Put it all together and you get this:
把它们放在一起,你会得到这个:
SELECT MatchID, PlayerID, TeamID, MatchDte, StartDate FROM (
SELECT
pi.MatchID,
pi.PlayerID,
t.TeamID,
m.MatchDate,
pf.StartDate,
ROW_NUMBER() OVER (PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC) AS StartDateRank
FROM Plays_In pi
INNER JOIN Match m ON pi.MatchID = m.MatchID
INNER JOIN Plays_A pa ON m.MatchID = pa.MatchID
INNER JOIN Team t ON pa.TeamID = t.TeamID
INNER JOIN Plays_For pf ON pf.PlayerID = pi.PlayerID AND pf.TeamID = t.TeamID
WHERE pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
)
WHERE StartDateRank = 1
ORDER BY MatchID, PlayerID
One final point: based on the WHERE pi.MatchID = '&match_id'
it looks like you may be using PHP as your front end and the mysql
functions to do the query. If so, please look into mysqli
or PDO
instead, as they'll protect you from SQL Injection. The mysql
functions (which are officially deprecated) will not.
最后一点:基于WHERE pi.MatchID = '&match_id'
看起来您可能正在使用 PHP 作为前端和mysql
执行查询的函数。如果是这样,请查看mysqli
或PDO
改为,因为它们会保护您免受 SQL 注入。这些mysql
功能(已正式弃用)不会。
Addendum: More information about ROW_NUMBER
, with many thanks to @AndriyM.
附录:关于 的更多信息ROW_NUMBER
,非常感谢@AndriyM。
With ROW_NUMBER
, if a player has more than one row with the most recent date, only one of the rows will be assigned as ROW_NUMBER = 1
, and that row will be picked more or less randomly. Here's an example, where a player's most recent date is 5/1/2013 and the player has three rows with this date:
使用ROW_NUMBER
,如果玩家有多于一行的最近日期,则只有其中一行会被指定为ROW_NUMBER = 1
,并且该行将或多或少地随机选择。这是一个示例,其中玩家的最近日期是 2013 年 5 月 1 日,并且该玩家有三行与此日期相关的内容:
pi.MatchID pi.PlayerID pf.StartDate
---------- ----------- ------------
100 1000 05/01/2013 <-- could be ROW_NUMBER = 1
101 1000 04/29/2013
105 1000 05/01/2013 <-- could be ROW_NUMBER = 1
102 1000 05/01/2013 <-- could be ROW_NUMBER = 1
107 1000 04/18/2013
Note that only oneof the rows above will be assigned ROW_NUMBER = 1
, and it can be any of them. Oracle will decide, not you.
请注意,只有一个以上的行会被分配ROW_NUMBER = 1
,并且它可以是任何人。Oracle 将做出决定,而不是您。
If this uncertainty is a problem, order by additional columns to get a clear winner. For this example, the highest pi.MatchID
will be used to determine the "true" ROW_NUMBER = 1
:
如果这种不确定性是一个问题,请按额外的列排序以获得明显的赢家。对于此示例,pi.MatchID
将使用最高值来确定 "true" ROW_NUMBER = 1
:
-- replace `ROW_NUMBER...` in the query above with this:
ROW_NUMBER() OVER (
PARTITION BY pi.PlayerID
ORDER BY pf.StartDate DESC, pi.MatchID DESC) AS StartDateRank
Now if there's a tie for the highest pf.StartDate
, Oracle looks for the highest pi.MatchID
within the subset of rows with the highest pf.StartDate
. As it turns out, only one row satisfies this condition:
现在,如果最高 存在并列pf.StartDate
,Oracle 将在具有最高pi.MatchID
的行子集中pf.StartDate
查找最高。事实证明,只有一行满足这个条件:
pi.MatchID pi.PlayerID pf.StartDate
---------- ----------- ------------
100 1000 05/01/2013
101 1000 04/29/2013
105 1000 05/01/2013 <-- is ROW_NUMBER = 1: highest MatchID for
-- most recent StartDate (5/1/2013)
102 1000 05/01/2013
107 1000 04/18/2013 <-- not considered: has the highest MatchID but isn't
-- in the subset with the most recent StartDate
回答by tylert
You could use the rank() function.
您可以使用 rank() 函数。
SELECT * FROM (
SELECT pi.MatchID, pi.PlayerID, t.TeamID, m.MatchDate, pf.StartDate,
rank() over (partition by pi.PlayerID order by m.MatchDate desc, rowid) as RNK
FROM Plays_In pi, Match m, Plays_A pa, Team t, Plays_For pf, Made_Up_Of muo, Season s
WHERE pi.MatchID = m.MatchID
AND m.MatchID = pa.MatchID
AND pa.TeamID = t.TeamID
AND pf.PlayerID = pi.PlayerID
AND pf.TeamID = t.TeamID
AND muo.MatchID = pi.MatchID
AND muo.SeasonID = s.SeasonID
AND pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
) WHERE RNK = 1
ORDER BY MatchID ASC, PlayerID ASC, StartDate DESC;
回答by Tse Ka Leong
Maybe using INTERSECT
and then find out the MAX(StartDate)
using GROUP BY
?
也许使用INTERSECT
然后找出MAX(StartDate)
使用GROUP BY
?