How does DISTINCT work when using JPA and Hibernate

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1346181/

Date: 2020-08-12 08:28:23  Source: igfitidea

Tags: java, jpa, distinct

Asked by Steve Claridge

What column does DISTINCT work with in JPA and is it possible to change it?

Here's an example JPA query using DISTINCT:

select DISTINCT c from Customer c

Which doesn't make a lot of sense - what column is the distinct based on? Is it specified on the Entity as an annotation because I couldn't find one?

I would like to specify the column to make the distinction on, something like:

select DISTINCT(c.name) c from Customer c

I'm using MySQL and Hibernate.

Accepted answer by kazanaki

Update: See the top-voted answer please.

My own answer is now obsolete; it is kept here only for historical reasons.

DISTINCT in HQL is usually needed in joins, not in simple queries like yours.

See also: "How do you create a Distinct query in HQL"

Answer by Tomasz

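import java.util.List;
import javax.persistence.Entity;
import javax.persistence.NamedQuery;

@Entity
@NamedQuery(name = "Customer.listUniqueNames",
            query = "SELECT DISTINCT c.name FROM Customer c")
public class Customer {
        ...

        private String name;

        // getEntityManager() is assumed to be defined elsewhere in the application.
        public static List<String> listUniqueNames() {
             return getEntityManager().createNamedQuery(
                   "Customer.listUniqueNames", String.class)
                   .getResultList();
        }
}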

Answer by Αλ?κο?

You are close.

select DISTINCT(c.name) from Customer c
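
In SQL, DISTINCT is not a function applied to a single column; it removes duplicates across the whole select list, so the way to choose "the column" is to choose what gets selected. The following plain-Java sketch imitates that behavior with in-memory data. The class name, the (name, city) rows, and the method names are all hypothetical and only serve the illustration; no JPA is involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class DistinctProjectionSketch {
    // Hypothetical (name, city) rows standing in for a Customer table.
    static final List<List<String>> ROWS = Arrays.asList(
        Arrays.asList("Alice", "Leeds"),
        Arrays.asList("Alice", "York"),
        Arrays.asList("Bob", "Leeds"),
        Arrays.asList("Alice", "Leeds"));

    // Comparable to "select distinct c.name, c.city": whole rows are compared.
    static List<List<String>> distinctRows() {
        return ROWS.stream().distinct().collect(Collectors.toList());
    }

    // Comparable to "select distinct c.name": only the projected value is compared.
    static List<String> distinctNames() {
        return ROWS.stream()
            .map(row -> row.get(0))
            .distinct()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctRows());  // [[Alice, Leeds], [Alice, York], [Bob, Leeds]]
        System.out.println(distinctNames()); // [Alice, Bob]
    }
}
```

Selecting only the name yields one entry per name, while selecting more columns keeps every row whose combination of values differs.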

Answer by Yan Khonski

I agree with kazanaki's answer, and it helped me. I wanted to select the whole entity, so I used:

 select DISTINCT(c) from Customer c

In my case I have a many-to-many relationship, and I want to load the entities together with their collections in one query.

I used LEFT JOIN FETCH, and at the end I had to make the result distinct.

Answer by finrod

I would use JPA's constructor expression feature. See also the following answer:

JPQL Constructor Expression - org.hibernate.hql.ast.QuerySyntaxException: Table is not mapped

Following the example in the question, it would be something like this:

SELECT DISTINCT new com.mypackage.MyNameType(c.name) from Customer c
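
The query above needs a class with a constructor that accepts the selected value. A minimal sketch of such a class is shown here; only the name MyNameType comes from the query, while the field, getter, and equality methods are assumptions made for illustration. In a real project the class would presumably be public and live in the com.mypackage package so the fully qualified name in the query resolves.

```java
import java.util.Objects;

// Hypothetical value class targeted by "new com.mypackage.MyNameType(c.name)".
final class MyNameType {
    private final String name;

    // A constructor expression passes the selected expressions as
    // constructor arguments, so the parameter list must match them.
    MyNameType(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Value-based equality, so two instances built from the same name compare equal.
    @Override
    public boolean equals(Object other) {
        return other instanceof MyNameType
            && Objects.equals(name, ((MyNameType) other).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }

    @Override
    public String toString() {
        return "MyNameType(" + name + ")";
    }
}
```

Because only c.name is selected, the DISTINCT here would apply to the name values, just as in a plain scalar query.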

Answer by Vlad Mihalcea

As I explained in this article, depending on the underlying JPQL or Criteria API query type, DISTINCT has two meanings in JPA.

Scalar queries

For scalar queries, which return a scalar projection, like the following query:

List<Integer> publicationYears = entityManager
.createQuery(
    "select distinct year(p.createdOn) " +
    "from Post p " +
    "order by year(p.createdOn)", Integer.class)
.getResultList();

LOGGER.info("Publication years: {}", publicationYears);

The DISTINCT keyword should be passed to the underlying SQL statement because we want the DB engine to filter duplicates prior to returning the result set:

SELECT DISTINCT
    extract(YEAR FROM p.created_on) AS col_0_0_
FROM
    post p
ORDER BY
    extract(YEAR FROM p.created_on)

-- Publication years: [2016, 2018]

Entity queries

For entity queries, DISTINCT has a different meaning.

Without using DISTINCT, a query like the following one:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

is going to JOIN the post and the post_comment tables like this:

SELECT p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1, 1]

But the parent post records are duplicated in the result set for each associated post_comment row. For this reason, the List of Post entities will contain duplicate Post entity references.
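
The duplication described above can be reproduced in plain Java without a database. This is only a simulation under stated assumptions: the Post class, the comment strings, and the helper methods are hypothetical, and the de-duplication shown is an illustration of filtering by reference, not Hibernate's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class EntityDistinctSketch {
    // Hypothetical stand-in for the Post entity.
    static final class Post {
        final long id;
        final List<String> comments = new ArrayList<>();
        Post(long id) { this.id = id; }
    }

    // One list element per joined row: two comments produce two rows,
    // and both rows resolve to the same Post instance.
    static List<Post> joinFetchResult() {
        Post post = new Post(1L);
        post.comments.add("first review");
        post.comments.add("second review");
        List<Post> result = new ArrayList<>();
        for (int row = 0; row < post.comments.size(); row++) {
            result.add(post);
        }
        return result;
    }

    // Keeps the first occurrence of each reference, comparing by identity.
    static List<Post> distinctByReference(List<Post> posts) {
        Set<Post> seen = Collections.newSetFromMap(new IdentityHashMap<Post, Boolean>());
        List<Post> unique = new ArrayList<>();
        for (Post post : posts) {
            if (seen.add(post)) {
                unique.add(post);
            }
        }
        return unique;
    }

    // Collects the identifiers for logging.
    static List<Long> ids(List<Post> posts) {
        List<Long> ids = new ArrayList<>();
        for (Post post : posts) {
            ids.add(post.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Post> posts = joinFetchResult();
        System.out.println(ids(posts));                      // [1, 1]
        System.out.println(ids(distinctByReference(posts))); // [1]
    }
}
```

The point of the sketch is that removing duplicate entity references is an in-memory concern on the Java side and does not depend on the database comparing rows.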

To eliminate the duplicate Post entity references, we need to use DISTINCT:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

But then DISTINCT is also passed to the SQL query, and that's not desirable at all:

SELECT DISTINCT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

By passing DISTINCT to the SQL query, the execution plan gains an extra Sort phase, which adds overhead without bringing any value, since the parent-child combinations are always unique because of the child PK column:

Unique  (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
  ->  Sort  (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
        Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
              Hash Cond: (pc.post_id = p.id)
              ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
              ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
                          Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
                          Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms

Entity queries with HINT_PASS_DISTINCT_THROUGH

To eliminate the Sort phase from the execution plan, we need to use the HINT_PASS_DISTINCT_THROUGH JPA query hint:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

And now, the SQL query will not contain DISTINCT, but the duplicate Post entity references are still going to be removed:

SELECT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

And the Execution Plan is going to confirm that we no longer have an extra Sort phase this time:

Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
  Hash Cond: (pc.post_id = p.id)
  ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
  ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
              Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
              Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms