How to do a case-insensitive and accent-insensitive LIKE in PostgreSQL and JPA 2?
Source: http://stackoverflow.com/questions/13026564/ (StackOverflow, licensed under CC BY-SA 4.0; attribute it to the original authors)
Asked by user1180339
I have a Java EE project using PostgreSQL 9.X and JPA2 (Hibernate implementation). How can I force a like query to be case insensitive and accent insensitive?
I'm able to change the charset of the DB because it's the first project using it.
Answer by Craig Ringer
In general there is no standard way to write "accent-insensitive" code, or to compare words for equality while ignoring accents. The whole idea makes very little sense, as different accented characters mean different things in different languages/dialects, and their "plain ASCII" substitutions/expansions vary by language. Please don't do this; resume and résumé are different words, and the situation gets even worse when considering any language other than English.
For case-insensitivity you can use lower(the_col) like lower('%match_expression') in JPQL. As far as I know ilike isn't supported in JPQL, but I have not checked the standard to verify this. The JPA2 spec is fairly readable, so consider just downloading and reading it. Hibernate's Criteria API (not the JPA2 Criteria API) offers Restrictions.ilike for the purpose. Neither will normalize/strip/ignore accented characters.
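As a rough sketch of the lower(...) like ... approach with a bound parameter instead of a string literal: the helper below lowercases the search text in Java and escapes the LIKE wildcards. The entity Person, its field firstName, the class LikePatterns, and the choice of '!' as escape character are all assumptions made for illustration; none of them come from the question.

```java
import java.util.Locale;

final class LikePatterns {
    // Hypothetical JPQL: Person and firstName are placeholder names.
    static final String JPQL =
        "SELECT p FROM Person p WHERE LOWER(p.firstName) LIKE :pattern ESCAPE '!'";

    /** Builds a lowercase "contains" pattern, escaping LIKE wildcards with '!'. */
    static String containsPattern(String userInput) {
        StringBuilder sb = new StringBuilder("%");
        for (char c : userInput.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '!' || c == '%' || c == '_') {
                sb.append('!'); // escape the escape character and both wildcards
            }
            sb.append(c);
        }
        return sb.append('%').toString();
    }
}
```

With an EntityManager em, this would be bound roughly as em.createQuery(LikePatterns.JPQL, Person.class).setParameter("pattern", LikePatterns.containsPattern(input)). Java and PostgreSQL may lowercase some characters differently depending on locale, and accents are still compared as-is.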
For stripping accents, etc., you will probably need to use database-engine-specific stored functions or native queries. See, e.g., this prior answer, or, if you intended to substitute accented characters with an unaccented alternative, this PostgreSQL wiki entry. But again, please don't do this except for very limited purposes, like finding places where words may have been "unaccented" by misguided software or users.
Answer by Clodoaldo Neto
If the unaccent extension is installed:
select unaccent(lower('ãóê'));
unaccent
----------
aoe
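For comparison, here is a rough Java-side analogue of unaccent(lower(...)), which may help when normalizing a search term before it reaches the database. It is a sketch built on java.text.Normalizer, not the extension's own rule table: letters with no Unicode decomposition (ł or ø, for example) pass through unchanged, whereas unaccent may map them. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

final class SearchTerms {
    /** Decomposes accented letters (NFD) and drops the combining marks. */
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Approximates unaccent(lower(input)) for letters that decompose. */
    static String normalize(String input) {
        return stripAccents(input).toLowerCase(Locale.ROOT);
    }
}
```

SearchTerms.normalize("Résumé") gives "resume". The column side of the comparison still has to be handled in the query, for instance with unaccent(lower(the_col)) in a native query.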
Answer by motus
I had this issue, and I couldn't use database functions. So instead I used a regex restriction in my Criteria code:
searchText = unaccent(searchText);
String expression = "firstName ~* '.*" + searchText + ".*'";
Criterion searchCriteria = Restrictions.sqlRestriction(expression);
Then I wrote a function called unaccent that turns each character into an alternation of its accented variants; for example, any letter e becomes (e|é|è). A query for "hello" becomes "h(e|é|è)llo".
Here is the function, inspired by the thread "Postgres accent insensitive LIKE search in Rails 3.1 on Heroku":
private String unaccent(String text) {
    // NOTE: several accented variants were unreadable in the source; the lists
    // below are a best-effort, non-exhaustive reconstruction. Extend as needed.
    String charactersProcessed = ""; // To avoid doing a replace multiple times.
    String newText = text.toLowerCase();
    text = newText; // The switch statement expects lowercase.
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        if (charactersProcessed.indexOf(c) >= 0) {
            continue; // We have already processed this character.
        }
        String replacement = "";
        switch (c) {
            case '1': replacement = "¹"; break;
            case '2': replacement = "²"; break;
            case '3': replacement = "³"; break;
            case 'a': replacement = "á|à|â|ã|ä|å|ā|ă|ą|À|Á|Â|Ã|Ä|Å|Ā|Ă|Ą|Æ"; break;
            case 'c': replacement = "ç|ć|č|Ç|Ć|Č"; break;
            case 'd': replacement = "đ|Đ"; break;
            // No trailing '|' here: an empty alternative would match anything.
            case 'e': replacement = "è|é|ê|ё|ë|ē|ĕ|ė|ę|ě|È|Ê|Ë|Ё|Ē|Ĕ|Ė|Ę|Ě"; break;
            case 'g': replacement = "ğ|Ğ"; break;
            case 'i': replacement = "ı|ì|í|î|ï|ĩ|ī|ĭ|Ì|Í|Î|Ï|Ї|Ĩ|Ī|Ĭ"; break;
            case 'l': replacement = "ł|Ł"; break;
            case 'n': replacement = "ń|ň|ñ|Ń|Ň|Ñ"; break;
            case 'o': replacement = "ò|ó|ô|õ|ö|ō|ŏ|ő|ø|Ò|Ó|Ô|Õ|Ö|Ō|Ŏ|Ő|Ø|Œ"; break;
            case 'r': replacement = "ř|Ř"; break;
            case 's': replacement = "š|ş|ș|ß|Š|Ş|Ș"; break;
            case 'u': replacement = "ù|ú|û|ü|ũ|ū|ŭ|ů|Ù|Ú|Û|Ü|Ũ|Ū|Ŭ|Ů"; break;
            case 'y': replacement = "ý|ÿ|Ý|Ÿ"; break;
            case 'z': replacement = "ž|ż|ź|Ž|Ż|Ź"; break;
            default: break; // Characters without variants are left as-is.
        }
        if (!replacement.isEmpty()) {
            charactersProcessed = charactersProcessed + c;
            // Replace every occurrence of c with a group of its variants.
            newText = newText.replace(c + "", "(" + c + "|" + replacement + ")");
        }
    }
    return newText;
}
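The function above does not escape regex metacharacters in the search text, so input such as "a.b" or "(" changes the meaning of the pattern. Below is a minimal single-pass sketch of the same idea that adds escaping. It uses character classes such as [eéèêë] rather than (e|é|è|...) alternations, which match the same single characters. The class name AccentRegex and its deliberately short variant table are hypothetical, and the pattern is checked here with java.util.regex only as a stand-in for PostgreSQL's ~* operator.

```java
import java.util.Locale;
import java.util.Map;

final class AccentRegex {
    // Hypothetical, deliberately short variant table; extend it for real data.
    private static final Map<Character, String> VARIANTS = Map.of(
        'a', "áàâãä",
        'c', "ç",
        'e', "éèêë",
        'i', "íìîï",
        'n', "ñ",
        'o', "óòôõö",
        'u', "úùûü");

    // Characters with a special meaning in regular expressions.
    private static final String META = "\\.[]{}()*+?^$|";

    /** Expands each base letter into a character class and escapes metacharacters. */
    static String toRegex(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            String variants = VARIANTS.get(c);
            if (variants != null) {
                out.append('[').append(c).append(variants).append(']');
            } else if (META.indexOf(c) >= 0) {
                out.append('\\').append(c); // match the character literally
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

AccentRegex.toRegex("hello") produces h[eéèêë]ll[oóòôõö]. Binding the resulting pattern as a query parameter, rather than concatenating it into the SQL string as in the Criteria snippet above, would also avoid SQL injection through the search text.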