Original URL: http://stackoverflow.com/questions/1346181/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
How does DISTINCT work when using JPA and Hibernate
Question by Steve Claridge
What column does DISTINCT work with in JPA and is it possible to change it?
Here's an example JPA query using DISTINCT:
select DISTINCT c from Customer c
This doesn't make a lot of sense to me: which column is the DISTINCT based on? Is it specified on the entity as an annotation? I couldn't find one.
I would like to specify the column to make the distinction on, something like:
select DISTINCT(c.name) c from Customer c
I'm using MySQL and Hibernate.
Accepted answer by kazanaki
Update: please see the top-voted answer.
My own answer is obsolete and is kept here only for historical reasons.
DISTINCT in HQL is usually needed with joins, not in simple queries like yours.
See also: How do you create a Distinct query in HQL
Answer by Tomasz
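import java.util.List;
import javax.persistence.Entity;
import javax.persistence.NamedQuery;

@Entity
@NamedQuery(name = "Customer.listUniqueNames",
        query = "SELECT DISTINCT c.name FROM Customer c")
public class Customer {
    ...
    private String name;

    // DISTINCT applies to the selected c.name values, so each name is returned once.
    // getEntityManager() is assumed to be a static helper defined elsewhere (not shown in the answer).
    public static List<String> listUniqueNames() {
        return getEntityManager().createNamedQuery(
                "Customer.listUniqueNames", String.class)
            .getResultList();
    }
}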
Answer by Αλ?κο?
You are close:
select DISTINCT(c.name) from Customer c
Answer by Yan Khonski
I agree with kazanaki's answer, and it helped me. I wanted to select the whole entity, so I used:
select DISTINCT(c) from Customer c
In my case, I have a many-to-many relationship, and I want to load the entities together with their collections in one query.
I used LEFT JOIN FETCH, and in the end I had to make the result distinct.
Answer by finrod
I would use JPA's constructor expression feature. See also the following answer:
JPQL Constructor Expression - org.hibernate.hql.ast.QuerySyntaxException:Table is not mapped
Following the example in the question, it would be something like this:
SELECT DISTINCT new com.mypackage.MyNameType(c.name) from Customer c
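A constructor expression needs a class with a constructor whose parameter types match the selected expressions. The answer does not show `MyNameType`, so the following is a hypothetical minimal version, assuming `c.name` is a `String`. The package declaration (`com.mypackage`) is left out and the class is package-private only so the sketch compiles on its own:

```java
// Hypothetical DTO for:
//   SELECT DISTINCT new com.mypackage.MyNameType(c.name) FROM Customer c
// It is not an entity; the persistence provider instantiates it through the constructor.
// In a real project it would typically be a public class in com.mypackage.
final class MyNameType {
    private final String name;

    // The parameter type must match the type of c.name (assumed to be String here)
    MyNameType(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}
```

Because DISTINCT is part of the query, duplicate names are filtered before the DTOs are created, so the class does not need any special equality logic for this query.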
Answer by Vlad Mihalcea
As I explained in this article, depending on the underlying JPQL or Criteria API query type, DISTINCT has two meanings in JPA.
Scalar queries
For scalar queries, which return a scalar projection, like the following query:
List<Integer> publicationYears = entityManager
.createQuery(
"select distinct year(p.createdOn) " +
"from Post p " +
"order by year(p.createdOn)", Integer.class)
.getResultList();
LOGGER.info("Publication years: {}", publicationYears);
The DISTINCT keyword should be passed to the underlying SQL statement because we want the DB engine to filter duplicates before returning the result set:
SELECT DISTINCT
extract(YEAR FROM p.created_on) AS col_0_0_
FROM
post p
ORDER BY
extract(YEAR FROM p.created_on)
-- Publication years: [2016, 2018]
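For a scalar query, DISTINCT applies to the whole select list, which here is the single `year(p.createdOn)` value. The following in-memory Java analogy illustrates those semantics only; the filtering itself happens in the database. `ScalarDistinctDemo` and the sample dates are hypothetical:

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

// In-memory analogy of:
//   select distinct year(p.createdOn) from Post p order by year(p.createdOn)
// DISTINCT compares the projected values (the years), not the Post rows.
final class ScalarDistinctDemo {
    private ScalarDistinctDemo() {}

    static List<Integer> publicationYears(List<LocalDate> createdOnDates) {
        return createdOnDates.stream()
                .map(LocalDate::getYear) // project each row to year(p.createdOn)
                .distinct()              // drop repeated years, like SELECT DISTINCT
                .sorted()                // order by year(p.createdOn)
                .collect(Collectors.toList());
    }
}
```

With three posts created in 2016, 2018 and 2016, the result is `[2016, 2018]`, the same shape as the logged output above.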
Entity queries
For entity queries, DISTINCT has a different meaning.
Without using DISTINCT, a query like the following one:
List<Post> posts = entityManager
.createQuery(
"select p " +
"from Post p " +
"left join fetch p.comments " +
"where p.title = :title", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
is going to JOIN the post and the post_comment tables like this:
SELECT p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1, 1]
But the parent post records are duplicated in the result set for each associated post_comment row. For this reason, the List of Post entities will contain duplicate Post entity references.
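A simplified in-memory model of why this happens: each joined row is mapped back to the same managed `Post` instance (the persistence context keeps at most one instance per identifier), and one list element is produced per row. The `Row` and `Post` types are hypothetical stand-ins, not Hibernate classes, and the `HashMap` only approximates the persistence context:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of how joined rows turn into a List<Post> with repeated references.
final class JoinDuplicationDemo {
    private JoinDuplicationDemo() {}

    // One row of: post p LEFT OUTER JOIN post_comment pc
    // (commentId is null when a post has no comments)
    record Row(long postId, Long commentId) {}

    static final class Post {
        final long id;
        final List<Long> commentIds = new ArrayList<>();

        Post(long id) {
            this.id = id;
        }
    }

    static List<Post> toPosts(List<Row> rows) {
        // Stand-in for the persistence context: one Post instance per identifier
        Map<Long, Post> loaded = new HashMap<>();
        List<Post> result = new ArrayList<>();
        for (Row row : rows) {
            Post post = loaded.computeIfAbsent(row.postId(), id -> new Post(id));
            if (row.commentId() != null) {
                post.commentIds.add(row.commentId()); // fill the fetched collection
            }
            result.add(post); // one list entry per SQL row, so the parent repeats
        }
        return result;
    }
}
```

Two rows for post 1 (comments 1 and 2) yield a two-element list in which both elements are the same `Post` object, which matches the `[1, 1]` identifiers logged above.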
To eliminate the duplicate Post entity references, we need to use DISTINCT:
List<Post> posts = entityManager
.createQuery(
"select distinct p " +
"from Post p " +
"left join fetch p.comments " +
"where p.title = :title", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
But then DISTINCT is also passed to the SQL query, and that's not desirable at all:
SELECT DISTINCT
p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1]
By passing DISTINCT to the SQL query, the execution plan is going to include an extra Sort phase, which adds overhead without bringing any value: the parent-child combinations are always unique because each row includes the child PK column.
Unique (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
-> Sort (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
Sort Method: quicksort Memory: 25kB
-> Hash Right Join (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
Hash Cond: (pc.post_id = p.id)
-> Seq Scan on post_comment pc (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
-> Hash (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on post p (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms
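Why the Sort step is wasted work can be checked with a small in-memory model: SQL DISTINCT compares all selected columns, and because every joined row carries the child primary key (`pc.id`), no two rows are equal. Below, Java record equality (component-wise `equals`) stands in for that row comparison; `JoinedRow` and its column subset are hypothetical:

```java
import java.util.HashSet;
import java.util.List;

// Each joined row contains pc.id (the post_comment primary key), so rows never repeat
// and a DISTINCT over the full select list has nothing to remove.
final class JoinedRowUniquenessDemo {
    private JoinedRowUniquenessDemo() {}

    // Hypothetical subset of the selected columns: p.id, pc.id, p.title, pc.review
    record JoinedRow(long postId, Long commentId, String title, String review) {}

    static int duplicatesRemovedByDistinct(List<JoinedRow> rows) {
        // Record equality compares every component, like DISTINCT compares every selected column
        return rows.size() - new HashSet<>(rows).size();
    }
}
```

Two rows for the same post, even with identical title and review text, still differ in `commentId`, so the count of removed duplicates is 0.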
Entity queries with HINT_PASS_DISTINCT_THROUGH
To eliminate the Sort phase from the execution plan, we need to use the Hibernate-specific HINT_PASS_DISTINCT_THROUGH JPA query hint:
List<Post> posts = entityManager
.createQuery(
"select distinct p " +
"from Post p " +
"left join fetch p.comments " +
"where p.title = :title", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
And now, the SQL query will not contain DISTINCT, but the duplicate Post entity references are still going to be removed:
SELECT
p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1]
And the execution plan confirms that there is no extra Sort phase this time:
Hash Right Join (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
Hash Cond: (pc.post_id = p.id)
-> Seq Scan on post_comment pc (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
-> Hash (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on post p (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms
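With the hint set to false, the duplicate `Post` references are filtered in memory rather than by the database. A rough sketch of such a filter, assuming comparison by object identity (the duplicates are references to the same managed instance) and preserving the original order; `InMemoryDistinct` is a hypothetical helper for illustration, not Hibernate's internal implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Keeps the first occurrence of each object reference, preserving order.
final class InMemoryDistinct {
    private InMemoryDistinct() {}

    static <T> List<T> distinctByIdentity(List<T> results) {
        // Identity-based set: two elements match only if they are the same reference (==)
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<T> distinct = new ArrayList<>();
        for (T result : results) {
            if (seen.add(result)) { // add() returns false for a reference already seen
                distinct.add(result);
            }
        }
        return distinct;
    }
}
```

Comparing by identity rather than `equals` is enough here because the persistence context already guarantees one instance per entity identifier; two distinct objects that merely compare equal would both be kept.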