Advanced String Comparisons in Oracle SQL
Warning: this is a StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Source: http://stackoverflow.com/questions/9398794/
Question by Phillip
I would like to know if there is a way, in Oracle SQL, to compare two strings for similarity other than = or like. For instance, SQL Server has a function difference(str1, str2) that compares two strings and gives a similarity rating (0 to 4). It is not exactly what I want, but it would still be extremely helpful.
I was hoping specifically for functions or methods that would:
- Compare two strings character by character (return the number of exact positional matches)
- Tell how many characters of one string also appear in the other, but in the wrong position
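If the strings can be fetched and compared outside the database, both metrics in the list above are short to compute. A minimal Python sketch, assuming that "in the wrong position" means characters that occur in both strings (counting repeats) but not at the same index; the function names are hypothetical, not from any library:

```python
from collections import Counter


def positional_matches(a: str, b: str) -> int:
    """Count indexes where both strings have the same character.

    Positions beyond the end of the shorter string are not counted.
    """
    return sum(1 for x, y in zip(a, b) if x == y)


def misplaced_matches(a: str, b: str) -> int:
    """Count characters shared by both strings but at different positions.

    Treats each string as a multiset of characters: the number of shared
    characters (respecting repeats) minus the exact positional matches.
    """
    shared = sum((Counter(a) & Counter(b)).values())
    return shared - positional_matches(a, b)


if __name__ == "__main__":
    print(positional_matches("5551234", "5551243"))  # 5
    print(misplaced_matches("5551234", "5551243"))   # 2
```

For a phone number with its last two digits swapped, the sketch reports 5 exact matches and 2 misplaced characters.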
The primary use would be for strings of the same length that contain numbers (IDs, phone numbers, etc.). For my purposes, I would use it to find possible matches in which letters/numbers may have been transposed. Soundex(string) works well for alphabetic strings but seems to ignore numbers (for good reason).
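For the transposition use case specifically, a direct check may be simpler than a similarity score. A minimal Python sketch, assuming both IDs have the same length and that "transposed" means exactly one pair of adjacent characters swapped (is_adjacent_transposition is a hypothetical helper):

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """Return True if b equals a with exactly one pair of adjacent
    characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    # Indexes where the two strings disagree
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


if __name__ == "__main__":
    print(is_adjacent_transposition("5551234", "5551243"))  # True
    print(is_adjacent_transposition("1234", "4231"))        # False
```

A plain edit distance counts such a swap as two substitutions ('1234' versus '1324' has an edit distance of 2), which is why a dedicated check can be useful for this case.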
I do not have privileges to create functions on my own, but if someone knows a method of doing that, I would be interested in hearing it. One solution to compare char-to-char (which only works if you know the MAX number of characters) is:
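--For char-to-char counts (N = maximum number of characters)
--Note: DECODE treats two NULLs as equal, so positions beyond the end
--of both strings are also counted as matches.
Decode(substr(ID1,1,1), substr(ID2,1,1), 1, 0) +
Decode(substr(ID1,2,1), substr(ID2,2,1), 1, 0) +
[...]
Decode(substr(ID1,N,1), substr(ID2,N,1), 1, 0)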
But that is about as inelegant as you can get.
Any help would be greatly appreciated.
Answer by Justin Cave
It sounds like you're looking for the UTL_MATCH package.
SELECT utl_match.edit_distance( string1, string2 )
FROM dual
This tells you the number of edits (single-character insertions, deletions, or substitutions) required to transform string1 into string2.
SQL> select utl_match.edit_distance( 'Bear', 'berry' ) from dual;
UTL_MATCH.EDIT_DISTANCE('BEAR','BERRY')
---------------------------------------
3
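UTL_MATCH.EDIT_DISTANCE computes the Levenshtein distance. To show what the number counts, or to compute it in the application when UTL_MATCH is not an option, here is a minimal Python sketch of the standard dynamic-programming algorithm. Like the SQL*Plus example, it is case-sensitive, so 'B' and 'b' count as different characters; it does not attempt to model Oracle's handling of NULL or empty strings.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn s into t."""
    # prev[j] = distance between the prefix of s processed so far and t[:j]
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("Bear", "berry"))  # 3
```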
There are also a couple of similarity functions, EDIT_DISTANCE_SIMILARITY and JARO_WINKLER_SIMILARITY, that return similarity scores between 0 and 100 (where 100 means a perfect match), which give you an idea of how similar the strings are.
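Jaro-Winkler is relevant to the transposition use case because its Jaro component explicitly counts transposed characters. The sketch below is a plain Python implementation of the common textbook Jaro-Winkler definition (prefix scale p = 0.1, prefix capped at 4 characters), returning a value between 0 and 1. The UTL_MATCH functions report on a 0 to 100 scale, and this sketch is not guaranteed to reproduce Oracle's exact scores.

```python
def jaro(s: str, t: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters only count as matching if they are at most this far apart
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Take the matched characters in order from each string; positions
    # where they differ are out of order, and half that count is the
    # number of transpositions.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    out_of_order = sum(1 for a, b in zip(s_seq, t_seq) if a != b)
    transpositions = out_of_order / 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity in [0, 1]: boosts the Jaro score for a
    common prefix of up to max_prefix characters, scaled by p."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
    print(round(jaro_winkler("DWAYNE", "DUANE"), 4))   # 0.84
```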
Answer by Aaron Digulla
There are several ways to solve this in Oracle:
- You can create a stored procedure.
- You can upload Java code to an Oracle DB and use the functions defined in the code.
- You can fetch the strings from the database and compare them in your application.
Pros/cons:
Stored procedures are somewhat hard to write and maintain, and they might be slow. But they are a standard, often-used tool, so unless your company has a strict "no-go" policy, they are usually the "solution" (much like regular expressions, which solve one problem and create two new ones).
Java code is an extremely powerful tool, but although I have seen many Oracle installations, so far none of them used Java code. I'm not sure what the reasons are; many people just seem to be wary, but nothing solid ever materialized. Also note that DB servers are optimized for IO and less so for CPU, so complex Java code might ruin your performance in more ways than you might expect (along the lines of "many are wary").
The last solution always works, but depending on what you need, it might just not be an option. On the other hand, I've seen code that performed much better by downloading a lot of the data and performing the complex processing in the app. In one example, the query took 15 seconds, while downloading the data and running grep(1) on the command line took 0.3 s.