java 在 JPA 上选择 DISTINCT
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/42830497/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Select DISTINCT on JPA
提问by Neuromante
I have a table with the ISO 4217 valuesfor currencies (With 6 rows, ID, Country, Currency_Name, Alphabetic_code, Numeric_Code, Minor_Unit).
我有一个表,其中包含货币的ISO 4217 值(有 6 行、ID、国家、Currency_Name、Alphabetic_code、Numeric_Code、Minor_Unit)。
I need to get some of the data for the 4 most used currencies, and my "pure" SQL query goes like this:
我需要获取4 种最常用货币的一些数据,我的“纯”SQL 查询如下所示:
select distinct currency_name, alphabetic_code, numeric_code
from currency
where ALPHABETIC_CODE IN ('USD','EUR','JPY','GBP')
order by currency_name;
Which returns a 4-row table with the data I need. So far, so good.
它返回一个包含我需要的数据的 4 行表。到现在为止还挺好。
Now I have to translate this to our JPA xml file, and the problems begin. The query I'm trying to get is like this:
现在我必须将其转换为我们的 JPA xml 文件,问题就开始了。我试图得到的查询是这样的:
SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode
FROM Currency c
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY')
ORDER BY c.currencyName
This returns a list with one row for each country that has some of those currencies (As if there were no "DISTINCT" on the query). And I'm scratching my head on why. So the questions would be:
这将返回一个列表,其中每个国家/地区都拥有其中一些货币(好像查询中没有“DISTINCT”一样)。我正在摸索为什么。所以问题是:
1) How can I make this query to return what the "pure" SQL is giving me?
1)我怎样才能让这个查询返回“纯”SQL给我的东西?
2) Why is this query seemingly ignoring my "DISTINCT" clause? There's something I'm missing here, and I don't get what. What's going on, what I'm not getting?
2)为什么这个查询似乎忽略了我的“DISTINCT”子句?我在这里遗漏了一些东西,我不明白。发生了什么,我没有得到什么?
EDIT: Well, this is getting weirder. Somehow, that JPA query works as intended (Returning 4 rows). I've tried this (As I needed some more info):
编辑:嗯,这越来越奇怪了。不知何故,该 JPA 查询按预期工作(返回 4 行)。我试过这个(因为我需要更多信息):
SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit, c.id
FROM Currency c
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY')
ORDER BY c.currencyName
And it seems the ID is messing everything, as removing it from the query goes back to return the 4-row table. And putting parenthesis is useless.
并且似乎 ID 弄乱了一切,因为从查询中删除它会返回 4 行表。加括号是没用的。
btw, we are using eclipse link.
顺便说一句,我们正在使用 eclipse 链接。
回答by mhasan
The issue you have is when you are trying to retrieve the columns list (c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit, c.id)
the
你的问题是,当你试图检索的列清单(c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit, c.id)
在
- distinct is run on the entire columns mentioned in the select clause
- distinct 在 select 子句中提到的整个列上运行
and I believe "id" column is unique for every record in your db table and hence you have the possibility of getting duplicates in your other columns (c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit)
.
我相信“id”列对于数据库表中的每条记录都是唯一的,因此您有可能在其他列中获得重复项(c.currencyName, c.alphabeticCode, c.numericCode, c.minorUnit)
。
So here in your case DISTINCT is running on the entire row, not a specific column. If you want to get the unique names, select only that column.
因此,在您的情况下, DISTINCT 是在整行上运行,而不是在特定列上运行。如果要获取唯一名称,请仅选择该列。
IN case you want to run the distinct on more than one column you can do something like this, using the GROUP BY for example to intent to find using c.currencyName, c.alphabeticCode
如果你想在多个列上运行不同的,你可以做这样的事情,例如使用 GROUP BY 来寻找使用 c.currencyName, c.alphabeticCode
SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode,c.id
FROM Currency c
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY') GROUP BY c.currencyName, c.alphabeticCode
ORDER BY c.currencyName
回答by Vlad Mihalcea
Since this is a very common question, I wrote this article, on which this answer is based on.
由于这是一个非常常见的问题,我写了这篇文章,这个答案是基于它的。
To answer your question, the JPQL query you wrote is just fine:
为了回答你的问题,你写的 JPQL 查询就好了:
SELECT DISTINCT c.currencyName, c.alphabeticCode, c.numericCode
FROM Currency c
WHERE c.alphabeticCode IN ('EUR','GBP','USD','JPY')
ORDER BY c.currencyName
And it should translate to the SQL statement you are expecting:
它应该转换为您期望的 SQL 语句:
select distinct currency_name, alphabetic_code, numeric_code
from currency
where ALPHABETIC_CODE IN ('USD','EUR','JPY','GBP')
order by currency_name;
Now, as I explained in this article, depending on the underlying JPQL or Criteria API query type, DISTINCT
has two meanings in JPA.
现在,正如我在本文中所解释的,根据底层的 JPQL 或 Criteria API 查询类型,DISTINCT
在 JPA 中有两个含义。
Scalar queries
标量查询
For scalar queries, which return a scalar projection, like the following query:
对于返回标量投影的标量查询,如以下查询:
List<Integer> publicationYears = entityManager
.createQuery(
"select distinct year(p.createdOn) " +
"from Post p " +
"order by year(p.createdOn)", Integer.class)
.getResultList();
LOGGER.info("Publication years: {}", publicationYears);
The DISTINCT
keyword should be passed to the underlying SQL statement because we want the DB engine to filter duplicates prior to returning the result set:
该DISTINCT
关键字应传递给底层的SQL语句,因为我们希望之前,返回结果集数据库引擎过滤重复:
SELECT DISTINCT
extract(YEAR FROM p.created_on) AS col_0_0_
FROM
post p
ORDER BY
extract(YEAR FROM p.created_on)
-- Publication years: [2016, 2018]
Entity queries
实体查询
For entity queries, DISTINCT
has a different meaning.
对于实体查询,DISTINCT
有着不同的含义。
Without using DISTINCT
, a query like the following one:
不使用DISTINCT
,查询如下所示:
List<Post> posts = entityManager
.createQuery(
"select p " +
"from Post p " +
"left join fetch p.comments " +
"where p.title = :title", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
is going to JOIN the post
and the post_comment
tables like this:
将像这样JOINpost
和post_comment
表:
SELECT p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1, 1]
But the parent post
records are duplicated in the result set for each associated post_comment
row. For this reason, the List
of Post
entities will contain duplicate Post
entity references.
但是父post
记录在每个关联post_comment
行的结果集中重复。出于这个原因,List
的Post
实体将包含重复的Post
实体引用。
To eliminate the Post
entity references, we need to use DISTINCT
:
为了消除Post
实体引用,我们需要使用DISTINCT
:
List<Post> posts = entityManager
.createQuery(
"select distinct p " +
"from Post p " +
"left join fetch p.comments " +
"where p.title = :title", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
But then DISTINCT
is also passed to the SQL query, and that's not desirable at all:
但 thenDISTINCT
也被传递给 SQL 查询,这根本不是可取的:
SELECT DISTINCT
p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1]
By passing DISTINCT
to the SQL query, the EXECUTION PLAN is going to execute an extra Sortphase which adds an overhead without bringing any value since the parent-child combinations always return unique records because of the child PK column:
通过传递DISTINCT
给 SQL 查询,EXECUTION PLAN 将执行一个额外的Sort阶段,这会增加开销而不带来任何值,因为由于子 PK 列,父子组合总是返回唯一记录:
Unique (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
-> Sort (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
Sort Method: quicksort Memory: 25kB
-> Hash Right Join (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
Hash Cond: (pc.post_id = p.id)
-> Seq Scan on post_comment pc (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
-> Hash (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on post p (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms
Entity queries with HINT_PASS_DISTINCT_THROUGH
带有 HINT_PASS_DISTINCT_THROUGH 的实体查询
To eliminate the Sort phase from the execution plan, we need to use the HINT_PASS_DISTINCT_THROUGH
JPA query hint:
为了从执行计划中消除排序阶段,我们需要使用HINT_PASS_DISTINCT_THROUGH
JPA 查询提示:
List<Post> posts = entityManager
.createQuery(
"select distinct p " +
"from Post p " +
"left join fetch p.comments " +
"where p.title = :title", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
And now, the SQL query will not contain DISTINCT
but Post
entity reference duplicates are going to be removed:
现在,SQL 查询将不包含DISTINCT
但Post
实体引用重复项将被删除:
SELECT
p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1]
And the Execution Plan is going to confirm that we no longer have an extra Sort phase this time:
执行计划将确认这次我们不再有额外的排序阶段:
Hash Right Join (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
Hash Cond: (pc.post_id = p.id)
-> Seq Scan on post_comment pc (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
-> Hash (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on post p (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms