Note: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/15398730/

Time: 2020-09-17 12:47:10  Source: igfitidea

How can I get a percent accuracy match when comparing two strings of an address?

c# vb.net string match

Asked by netchicken

I am trying to compare two lists of names and addresses to find unique data. I can easily extract all the entries that are exactly the same string in both lists, but I am then left with names and addresses that differ yet may belong to the same people. For example:

Entry in list 1: Smith J Ph234567 34 Smith st

Entry in list 2: Smith John Ph234567 34 Smith st

or

Entry in list 1: Smith J Ph234567 34 Smith Rd

Entry in list 2: Smith J Ph234567 34 Smith Road

I want to tag entries that appear similar to each other, for example those with an 80% match.

Nested For Each loops don't work, as they match every word or letter (depending on how you write it) in the string with every other word or letter.

For loops don't work either, as a single change (J vs. John) creates errors for every entry after the change.

I am writing it in VB.NET but can also translate from C#.

Answered by Konrad Rudolph

This kind of problem is generally solved by calculating the edit distance between the strings. Start with the Levenshtein distance, for instance.

This will give you a score (the number of “edit operations” necessary to transform one string into the other). To convert this into a percent identity, you need to normalise it by the length of the larger string (something along the lines of percent = (largerString.Length - editDistance) / largerString.Length).
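
As a rough illustration of the idea, here is a minimal sketch in Python (chosen for brevity; the logic should port directly to VB.NET or C#). The function names `levenshtein` and `percent_match` and the 80% threshold check are assumptions made for this example, not part of the original answer, and the comparison is case-sensitive.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def percent_match(a: str, b: str) -> float:
    """Normalise the edit distance by the length of the longer string
    and express the result as a percentage (hypothetical helper)."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (longer - levenshtein(a, b)) / longer


entry1 = "Smith J Ph234567 34 Smith st"
entry2 = "Smith John Ph234567 34 Smith st"
score = percent_match(entry1, entry2)
# The 80% cut-off is the example threshold from the question.
print(f"{score:.1f}% match, similar: {score >= 80}")  # 90.3% match, similar: True
```

When porting to C#, convert the operands to a floating-point type before dividing, since dividing two integers there truncates the result. In VB.NET the `/` operator already performs floating-point division.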
