仅在 Oracle 中的一列上不同

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/16350452/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-19 01:37:54  来源:igfitidea点击:

Distinct on one column only in Oracle

sqldatabaseoracle

提问by Mark Tickner

I would like to use distinct on the following table, but only on the 'PlayerID' column. This is what I have at the moment:

我想在下表中使用 distinct,但仅限于“PlayerID”列。这就是我目前所拥有的:

   MATCHID   PLAYERID     TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
        20          5          2 14-JAN-12 01-JUN-11
        20          5          4 14-JAN-12 01-JUN-10
        20          7          4 14-JAN-12 01-JUN-11
        20          7          2 14-JAN-12 01-JUN-10
        20         10          4 14-JAN-12 01-JUN-11
        20         11          2 14-JAN-12 01-JUN-10
        20         13          2 14-JAN-12 01-JUN-11
        20         16          4 14-JAN-12 01-JUN-10
        20         17          4 14-JAN-12 01-JUN-10
        20         18          4 14-JAN-12 01-JUN-10
        20         19          2 14-JAN-12 01-JUN-11

And this is what I want, so that the highest 'StartDate' for each 'PlayerID' is shown and the next row ignored:

这就是我想要的,以便显示每个“PlayerID”的最高“StartDate”,并忽略下一行:

   MATCHID   PLAYERID     TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
        20          5          2 14-JAN-12 01-JUN-11
        20          7          4 14-JAN-12 01-JUN-11
        20         10          4 14-JAN-12 01-JUN-11
        20         11          2 14-JAN-12 01-JUN-10
        20         13          2 14-JAN-12 01-JUN-11
        20         16          4 14-JAN-12 01-JUN-10
        20         17          4 14-JAN-12 01-JUN-10
        20         18          4 14-JAN-12 01-JUN-10
        20         19          2 14-JAN-12 01-JUN-11

Current SQL:

当前 SQL:

SELECT pi.MatchID, pi.PlayerID, t.TeamID, m.MatchDate, pf.StartDate
FROM Plays_In pi, Match m, Plays_A pa, Team t, Plays_For pf, Made_Up_Of muo, Season s
WHERE pi.MatchID = m.MatchID
AND m.MatchID = pa.MatchID
AND pa.TeamID = t.TeamID
AND pf.PlayerID = pi.PlayerID
AND pf.TeamID = t.TeamID
AND muo.MatchID = pi.MatchID
AND muo.SeasonID = s.SeasonID
AND pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
ORDER BY pi.MatchID ASC, pi.PlayerID ASC, pf.StartDate DESC;

It's an Oracle database.

这是一个 Oracle 数据库。

Thanks in advance.

提前致谢。

回答by Ed Gibbs

A few points...

几点...

  • Unless you're using the joins to Made_Up_Ofand Seasonto filter out rows, you don't need these tables. I've left them out here; you can add them back in if you need them.

  • Mark Tickner is correct that you should use the ANSI JOIN syntax. The nice thing about it (other than being standard) is that it puts the join logic right with the table being joined. Once you get used to it I think you'll find it preferable.

  • What you're really after is the maximum pf.StartDatefor each PlayerID, which is a nice fit for the analytical ROW_NUMBER()function. The PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESCwill basically assign the value 1to the row with each player's most recent sort date. The outer filters out all rows except those with the 1ranking.

  • You can also assign rankings with the RANK()and DENSE_RANK()analytical functions, but if a player has a tie for the most recent date then all the tied dates will be ranked #1 and you'll get multiple rows for that player. In situations like this where you only want one row per player, use ROW_NUMBER()instead.

  • 除非您使用连接Made_Up_OfSeason过滤行,否则您不需要这些表。我已经把它们排除在外了;如果您需要它们,您可以重新添加它们。

  • Mark Tickner 是正确的,您应该使用 ANSI JOIN 语法。关于它的好处(除了标准之外)是它使连接逻辑与被连接的表正确。一旦你习惯了它,我想你会发现它更可取。

  • 您真正想要的是pf.StartDateeach的最大值PlayerID,这非常适合分析ROW_NUMBER()函数。在PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC将基本值分配1给每个球员的最新数据排序的行。外部过滤掉除具有1排名的行之外的所有行。

  • 您还可以使用RANK()DENSE_RANK()分析功能分配排名,但如果玩家在最近的日期有平局,那么所有平局的日期都将排名第一,并且您将获得该玩家的多行。在这种情况下,您只需要每个玩家一行,请ROW_NUMBER()改用。

Put it all together and you get this:

把它们放在一起,你会得到这个:

SELECT MatchID, PlayerID, TeamID, MatchDte, StartDate FROM (
  SELECT
    pi.MatchID,
    pi.PlayerID,
    t.TeamID,
    m.MatchDate,
    pf.StartDate,
    ROW_NUMBER() OVER (PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC) AS StartDateRank
  FROM Plays_In pi
  INNER JOIN Match m ON pi.MatchID = m.MatchID
  INNER JOIN Plays_A pa ON m.MatchID = pa.MatchID
  INNER JOIN Team t ON pa.TeamID = t.TeamID
  INNER JOIN Plays_For pf ON pf.PlayerID = pi.PlayerID AND pf.TeamID = t.TeamID
  WHERE pi.MatchID = '&match_id'
  AND m.MatchDate >= pf.StartDate
)
WHERE StartDateRank = 1
ORDER BY MatchID, PlayerID

One final point: based on the WHERE pi.MatchID = '&match_id'it looks like you may be using PHP as your front end and the mysqlfunctions to do the query. If so, please look into mysqlior PDOinstead, as they'll protect you from SQL Injection. The mysqlfunctions (which are officially deprecated) will not.

最后一点:基于WHERE pi.MatchID = '&match_id'看起来您可能正在使用 PHP 作为前端和mysql执行查询的函数。如果是这样,请查看mysqliPDO改为,因为它们会保护您免受 SQL 注入。这些mysql功能(已正式弃用)不会。



Addendum: More information about ROW_NUMBER, with many thanks to @AndriyM.

附录:关于 的更多信息ROW_NUMBER,非常感谢@AndriyM。

With ROW_NUMBER, if a player has more than one row with the most recent date, only one of the rows will be assigned as ROW_NUMBER = 1, and that row will be picked more or less randomly. Here's an example, where a player's most recent date is 5/1/2013 and the player has three rows with this date:

使用ROW_NUMBER,如果玩家有多于一行的最近日期,则只有其中一行会被指定为ROW_NUMBER = 1,并且该行将或多或少地随机选择。这是一个示例,其中玩家的最近日期是 2013 年 5 月 1 日,并且该玩家有三行与此日期相关的内容:

pi.MatchID  pi.PlayerID  pf.StartDate
----------  -----------  ------------
      100         1000   05/01/2013 <-- could be ROW_NUMBER = 1
      101         1000   04/29/2013
      105         1000   05/01/2013 <-- could be ROW_NUMBER = 1
      102         1000   05/01/2013 <-- could be ROW_NUMBER = 1 
      107         1000   04/18/2013

Note that only oneof the rows above will be assigned ROW_NUMBER = 1, and it can be any of them. Oracle will decide, not you.

请注意,只有一个以上的行会被分配ROW_NUMBER = 1,并且它可以是任何人。Oracle 将做出决定,而不是您。

If this uncertainty is a problem, order by additional columns to get a clear winner. For this example, the highest pi.MatchIDwill be used to determine the "true" ROW_NUMBER = 1:

如果这种不确定性是一个问题,请按额外的列排序以获得明显的赢家。对于此示例,pi.MatchID将使用最高值来确定 "true" ROW_NUMBER = 1

-- replace `ROW_NUMBER...` in the query above with this:
    ROW_NUMBER() OVER (
      PARTITION BY pi.PlayerID
      ORDER BY pf.StartDate DESC, pi.MatchID DESC) AS StartDateRank

Now if there's a tie for the highest pf.StartDate, Oracle looks for the highest pi.MatchIDwithin the subset of rows with the highest pf.StartDate. As it turns out, only one row satisfies this condition:

现在,如果最高 存在并列pf.StartDate,Oracle 将在具有最高pi.MatchID的行子集中pf.StartDate查找最高。事实证明,只有一行满足这个条件:

pi.MatchID  pi.PlayerID  pf.StartDate
----------  -----------  ------------
      100         1000   05/01/2013
      101         1000   04/29/2013
      105         1000   05/01/2013 <-- is ROW_NUMBER = 1: highest MatchID for
                                     -- most recent StartDate (5/1/2013)
      102         1000   05/01/2013
      107         1000   04/18/2013 <-- not considered: has the highest MatchID but isn't
                                     -- in the subset with the most recent StartDate

回答by tylert

You could use the rank() function.

您可以使用 rank() 函数。

SELECT * FROM (
    SELECT pi.MatchID, pi.PlayerID, t.TeamID, m.MatchDate, pf.StartDate,
     rank() over (partition by pi.PlayerID order by m.MatchDate desc, rowid) as RNK
    FROM Plays_In pi, Match m, Plays_A pa, Team t, Plays_For pf, Made_Up_Of muo, Season s
    WHERE pi.MatchID = m.MatchID
    AND m.MatchID = pa.MatchID
    AND pa.TeamID = t.TeamID
    AND pf.PlayerID = pi.PlayerID
    AND pf.TeamID = t.TeamID
    AND muo.MatchID = pi.MatchID
    AND muo.SeasonID = s.SeasonID
    AND pi.MatchID = '&match_id'
    AND m.MatchDate >= pf.StartDate
) WHERE RNK = 1
ORDER BY MatchID ASC, PlayerID ASC, StartDate DESC;

回答by Tse Ka Leong

Maybe using INTERSECTand then find out the MAX(StartDate)using GROUP BY?

也许使用INTERSECT然后找出MAX(StartDate)使用GROUP BY