java 在 JPA 上选择 DISTINCT

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/42830497/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-11-03 06:53:41  来源:igfitidea点击:

Select DISTINCT on JPA

javasqljpajpql

提问by Neuromante

I have a table with the ISO 4217 valuesfor currencies (With 6 rows, ID, Country, Currency_Name, Alphabetic_code, Numeric_Code, Minor_Unit).

我有一个表,其中包含货币的ISO 4217 值(有 6 行、ID、国家、Currency_Name、Alphabetic_code、Numeric_Code、Minor_Unit)。

I need to get some of the data for the 4 most used currencies, and my "pure" SQL query goes like this:

我需要获取4 种最常用货币的一些数据,我的“纯”SQL 查询如下所示:

select distinct currency_name, alphabetic_code, numeric_code 
from currency 
where ALPHABETIC_CODE IN ('USD','EUR','JPY','GBP') 
order by currency_name;

Which returns a 4-row table with the data I need. So far, so good.

它返回一个包含我需要的数据的 4 行表。到现在为止还挺好。

Now I have to translate this to our JPA xml file, and the problems begin. The query I'm trying to get is like this:

现在我必须将其转换为我们的 JPA xml 文件,问题就开始了。我试图得到的查询是这样的:

SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode
FROM Currency c 
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY') 
ORDER BY c.currencyName

This returns a list with one row for each country that has some of those currencies (As if there were no "DISTINCT" on the query). And I'm scratching my head on why. So the questions would be:

这将返回一个列表,其中每个国家/地区都拥有其中一些货币(好像查询中没有“DISTINCT”一样)。我正在摸索为什么。所以问题是:

1) How can I make this query to return what the "pure" SQL is giving me?

1)我怎样才能让这个查询返回“纯”SQL给我的东西?

2) Why is this query seemingly ignoring my "DISTINCT" clause? There's something I'm missing here, and I don't get what. What's going on, what I'm not getting?

2)为什么这个查询似乎忽略了我的“DISTINCT”子句?我在这里遗漏了一些东西,我不明白。发生了什么,我没有得到什么?

EDIT: Well, this is getting weirder. Somehow, that JPA query works as intended (Returning 4 rows). I've tried this (As I needed some more info):

编辑:嗯,这越来越奇怪了。不知何故,该 JPA 查询按预期工作(返回 4 行)。我试过这个(因为我需要更多信息):

SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit, c.id
FROM Currency c 
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY') 
ORDER BY c.currencyName

And it seems the ID is messing everything, as removing it from the query goes back to return the 4-row table. And putting parenthesis is useless.

并且似乎 ID 弄乱了一切,因为从查询中删除它会返回 4 行表。加括号是没用的。

btw, we are using eclipse link.

顺便说一句,我们正在使用 eclipse 链接。

回答by mhasan

The issue you have is when you are trying to retrieve the columns list (c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit, c.id)the

你的问题是,当你试图检索的列清单(c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit, c.id)

  • distinct is run on the entire columns mentioned in the select clause
  • distinct 在 select 子句中提到的整个列上运行

and I believe "id" column is unique for every record in your db table and hence you have the possibility of getting duplicates in your other columns (c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit).

我相信“id”列对于数据库表中的每条记录都是唯一的,因此您有可能在其他列中获得重复项(c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit)

So here in your case DISTINCT is running on the entire row, not a specific column. If you want to get the unique names, select only that column.

因此,在您的情况下, DISTINCT 是在整行上运行,而不是在特定列上运行。如果要获取唯一名称,请仅选择该列。

IN case you want to run the distinct on more than one column you can do something like this, using the GROUP BY for example to intent to find using c.currencyName, c.alphabeticCode

如果你想在多个列上运行不同的,你可以做这样的事情,例如使用 GROUP BY 来寻找使用 c.currencyName, c.alphabeticCode

SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode,c.id
FROM Currency c 
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY') GROUP BY c.currencyName, c.alphabeticCode
ORDER BY c.currencyName

回答by Vlad Mihalcea

Since this is a very common question, I wrote this article, on which this answer is based on.

由于这是一个非常常见的问题,我写了这篇文章,这个答案是基于它的。

To answer your question, the JPQL query you wrote is just fine:

为了回答你的问题,你写的 JPQL 查询就好了:

SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode
FROM Currency c 
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY') 
ORDER BY c.currencyName

And it should translate to the SQL statement you are expecting:

它应该转换为您期望的 SQL 语句:

select distinct currency_name, alphabetic_code, numeric_code 
from currency 
where ALPHABETIC_CODE IN ('USD','EUR','JPY','GBP') 
order by currency_name;

Now, as I explained in this article, depending on the underlying JPQL or Criteria API query type, DISTINCThas two meanings in JPA.

现在,正如我在本文中所解释的,根据底层的 JPQL 或 Criteria API 查询类型,DISTINCT在 JPA 中有两个含义。

Scalar queries

标量查询

For scalar queries, which return a scalar projection, like the following query:

对于返回标量投影的标量查询,如以下查询:

List<Integer> publicationYears = entityManager
.createQuery(
    "select distinct year(p.createdOn) " +
    "from Post p " +
    "order by year(p.createdOn)", Integer.class)
.getResultList();

LOGGER.info("Publication years: {}", publicationYears);

The DISTINCTkeyword should be passed to the underlying SQL statement because we want the DB engine to filter duplicates prior to returning the result set:

DISTINCT关键字应传递给底层的SQL语句,因为我们希望之前,返回结果集数据库引擎过滤重复:

SELECT DISTINCT
    extract(YEAR FROM p.created_on) AS col_0_0_
FROM
    post p
ORDER BY
    extract(YEAR FROM p.created_on)

-- Publication years: [2016, 2018]

Entity queries

实体查询

For entity queries, DISTINCThas a different meaning.

对于实体查询,DISTINCT有着不同的含义。

Without using DISTINCT, a query like the following one:

不使用DISTINCT,查询如下所示:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

is going to JOIN the postand the post_commenttables like this:

将像这样JOINpostpost_comment表:

SELECT p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1, 1]

But the parent postrecords are duplicated in the result set for each associated post_commentrow. For this reason, the Listof Postentities will contain duplicate Postentity references.

但是父post记录在每个关联post_comment行的结果集中重复。出于这个原因,ListPost实体将包含重复的Post实体引用。

To eliminate the Postentity references, we need to use DISTINCT:

为了消除Post实体引用,我们需要使用DISTINCT

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

But then DISTINCTis also passed to the SQL query, and that's not desirable at all:

但 thenDISTINCT也被传递给 SQL 查询,这根本不是可取的:

SELECT DISTINCT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

By passing DISTINCTto the SQL query, the EXECUTION PLAN is going to execute an extra Sortphase which adds an overhead without bringing any value since the parent-child combinations always return unique records because of the child PK column:

通过传递DISTINCT给 SQL 查询,EXECUTION PLAN 将执行一个额外的Sort阶段,这会增加开销而不带来任何值,因为由于子 PK 列,父子组合总是返回唯一记录:

Unique  (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
  ->  Sort  (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
        Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
              Hash Cond: (pc.post_id = p.id)
              ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
              ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
                          Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
                          Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms

Entity queries with HINT_PASS_DISTINCT_THROUGH

带有 HINT_PASS_DISTINCT_THROUGH 的实体查询

To eliminate the Sort phase from the execution plan, we need to use the HINT_PASS_DISTINCT_THROUGHJPA query hint:

为了从执行计划中消除排序阶段,我们需要使用HINT_PASS_DISTINCT_THROUGHJPA 查询提示:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

And now, the SQL query will not contain DISTINCTbut Postentity reference duplicates are going to be removed:

现在,SQL 查询将不包含DISTINCTPost实体引用重复项将被删除:

SELECT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

And the Execution Plan is going to confirm that we no longer have an extra Sort phase this time:

执行计划将确认这次我们不再有额外的排序阶段:

Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
  Hash Cond: (pc.post_id = p.id)
  ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
  ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
              Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
              Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms