
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must cite the original URL and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/18344721/

Time: 2020-08-12 00:44:34  Source: igfitidea

Extract the difference between two strings in Java

Tags: java, string, compare

Asked by N Deepak Prasath

Hi, I have two strings:

    String hear = "Hi My name is Deepak"
            + "\n"
            + "How are you ?"
            + "\n"
            + "\n"
            + "How is everyone";
    String dear = "Hi My name is Deepak"
            + "\n"
            + "How are you ?"
            + "\n"
            + "Hey there \n"
            + "How is everyone";

I want to get the part of dear that is not present in hear, i.e. "Hey there \n". I found a method, but it fails for this case:

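// Returns the tail of the longer string. This only works when the shorter string is a
// prefix of the longer one, and the "- 1" also starts the tail one character too early.
static String strDiffChop(String s1, String s2) {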
    if (s1.length() > s2.length()) {
        return s1.substring(s2.length() - 1);
    } else if (s2.length() > s1.length()) {
        return s2.substring(s1.length() - 1);
    } else {
        return "";
    }
}

Can anyone help?
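The method above fails because it assumes the extra text sits at the end of the longer string, while here "Hey there \n" is inserted in the middle. A minimal sketch, assuming the two strings differ in at most one contiguous region: strip the longest common prefix and the longest common suffix, and whatever remains in each string is the difference. The names MiddleDiff and changedPart are hypothetical.

```java
class MiddleDiff {
    /**
     * Returns {onlyInA, onlyInB}: what is left of a and b after removing their
     * longest common prefix and longest common suffix.
     * Assumes the strings differ in at most one contiguous region.
     */
    static String[] changedPart(String a, String b) {
        int max = Math.min(a.length(), b.length());

        // Length of the longest common prefix.
        int prefix = 0;
        while (prefix < max && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }

        // Length of the longest common suffix, never overlapping the prefix.
        int suffix = 0;
        while (suffix < max - prefix
                && a.charAt(a.length() - 1 - suffix) == b.charAt(b.length() - 1 - suffix)) {
            suffix++;
        }

        return new String[] {
            a.substring(prefix, a.length() - suffix),
            b.substring(prefix, b.length() - suffix)
        };
    }
}
```

For hear and dear this yields {"", "Hey there "}: the newline after "Hey there " is shared with the common suffix ("\nHow is everyone"), so it is not reported as part of the change.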

Answered by Fly

One can use StringUtils from Apache Commons. Here is the relevant source from the StringUtils API:

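// Returns the remainder of str2 starting at the first index where it differs from str1.
public static String difference(String str1, String str2) {
    if (str1 == null) {
        return str2;
    }
    if (str2 == null) {
        return str1;
    }
    int at = indexOfDifference(str1, str2);
    if (at == -1) {
        return EMPTY; // StringUtils.EMPTY, i.e. ""
    }
    return str2.substring(at);
}

// Returns the index of the first differing character, or -1 if the strings are equal.
public static int indexOfDifference(String str1, String str2) {
    if (str1 == str2) { // same reference (or both null): no difference
        return -1;
    }
    if (str1 == null || str2 == null) {
        return 0;
    }
    int i;
    // Advance while both strings have the same character at position i.
    for (i = 0; i < str1.length() && i < str2.length(); ++i) {
        if (str1.charAt(i) != str2.charAt(i)) {
            break;
        }
    }
    // A mismatch was found, or one string is longer than the other.
    if (i < str2.length() || i < str1.length()) {
        return i;
    }
    return -1;
}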
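Note that difference returns everything in str2 from the first mismatching index onward, so for the question's strings it yields "Hey there \nHow is everyone" rather than only the inserted line. On Java 9 or later, the same first-mismatch logic can be sketched without Commons using java.util.Arrays.mismatch; null handling is omitted, and the class name FirstMismatch is hypothetical.

```java
import java.util.Arrays;

class FirstMismatch {
    /** Remainder of str2 from the first index where it differs from str1, or "" if equal. */
    static String difference(String str1, String str2) {
        // Arrays.mismatch returns -1 if the arrays are equal; otherwise the first differing
        // index, which is the shorter length when one array is a prefix of the other.
        int at = Arrays.mismatch(str1.toCharArray(), str2.toCharArray());
        return at == -1 ? "" : str2.substring(at);
    }
}
```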

Answered by gurbieta

You should use StringUtils from Apache Commons


String diff = StringUtils.difference( "Word", "World" );
System.out.println( "Difference: " + diff );

Output:
Difference: ld

Source: https://www.oreilly.com/library/view/jakarta-commons-cookbook/059600706X/ch02s15.html


Answered by Aditya Rai

Convert the strings to lists, then use the method from the question "How to remove common values from two array list" to get the result.

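A minimal sketch of that idea for the question's line-based strings, assuming each line is compared as a whole: split both strings on "\n" into lists, then remove every line of the reference string from the other list. removeAll ignores order and position, so a line that has merely moved is not reported. The names LineListDiff and missingLines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineListDiff {
    /** Lines of text that do not occur anywhere among the lines of reference. */
    static List<String> missingLines(String text, String reference) {
        List<String> result = new ArrayList<>(Arrays.asList(text.split("\n")));
        // removeAll drops every occurrence of any line that also occurs in reference.
        result.removeAll(Arrays.asList(reference.split("\n")));
        return result;
    }
}
```

For the question, missingLines(dear, hear) leaves only "Hey there " (split removes the "\n" separators).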

Answered by Mike Samuel

google-diff-match-patch


The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.

Diff:

Compare two blocks of plain text and efficiently return a list of differences.

Match:

Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.

Patch:

Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.

Currently available in Java, JavaScript, Dart, C++, C#, Objective C, Lua and Python. Regardless of language, each library features the same API and the same functionality. All versions also have comprehensive test harnesses.


There is a "Line or Word Diffs" wiki page that describes how to do line-by-line diffs.
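The library is the robust option. Purely to illustrate what a line-by-line diff computes, here is a small standard-library sketch that is not the diff-match-patch API: it builds a longest-common-subsequence table over lines and marks each line as unchanged, removed, or added. It needs time and memory proportional to the product of the two line counts; the class name LineDiff and the "  ", "- ", "+ " prefixes are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class LineDiff {
    /**
     * Line-by-line diff of a (old) and b (new). Each entry is a line prefixed with
     * "  " (in both), "- " (only in a) or "+ " (only in b).
     */
    static List<String> diffLines(String a, String b) {
        String[] x = a.split("\n", -1);
        String[] y = b.split("\n", -1);
        int n = x.length, m = y.length;

        // lcs[i][j] = length of the longest common subsequence of x[i..] and y[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = x[i].equals(y[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table, emitting kept, removed and added lines in order.
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && x[i].equals(y[j])) {
                out.add("  " + x[i]);
                i++;
                j++;
            } else if (j == m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1])) {
                out.add("- " + x[i]);
                i++;
            } else {
                out.add("+ " + y[j]);
                j++;
            }
        }
        return out;
    }
}
```

For hear and dear this reports the empty line of hear as removed ("- ") and "Hey there " as added ("+ Hey there "), with the other three lines unchanged.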

Answered by N Deepak Prasath

What about this snippet?

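// Prints every line of dear that does not appear as a whole line in hear.
// Requires java.util.Arrays, java.util.HashSet and java.util.Set.
public static void strDiff(String hear, String dear) {
    // Compare whole lines: hear.contains(h) would also accept h when it is only part of a
    // line (an empty line, for example, is always "contained").
    Set<String> hearLines = new HashSet<>(Arrays.asList(hear.split("\n")));
    for (String h : dear.split("\n")) {
        if (!hearLines.contains(h)) {
            System.err.println(h);
        }
    }
}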

Answered by VJ THAKUR

I used StringTokenizer to find the solution. Below is the code snippet:

// Returns the space-separated tokens of sourceStr that do not occur in anotherStr.
// Requires java.util.ArrayList, java.util.List and java.util.StringTokenizer.
public static List<String> findNotMatching(String sourceStr, String anotherStr){
    StringTokenizer at = new StringTokenizer(sourceStr, " ");
    StringTokenizer bt = null;
    int i = 0, token_count = 0;
    String token = null;
    boolean flag = false;
    List<String> missingWords = new ArrayList<String>();
    while (at.hasMoreTokens()) {
        token = at.nextToken();
        // Re-tokenize anotherStr and scan it for the current token.
        bt = new StringTokenizer(anotherStr, " ");
        token_count = bt.countTokens();
        while (i < token_count) {
            String s = bt.nextToken();
            if (token.equals(s)) {
                flag = true; // found: stop scanning
                break;
            } else {
                flag = false;
            }
            i++;
        }
        i = 0;
        // Not found anywhere in anotherStr: record it as missing.
        if (flag == false)
            missingWords.add(token);
    }
    return missingWords;
}
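The loop above re-creates a StringTokenizer over anotherStr for every word, so the work grows with the product of the two word counts. Also, because it splits on spaces only, findNotMatching(dear, hear) reports tokens such as "?\nHey", "there" and "\nHow". A sketch of the same idea with a HashSet, assuming words are separated by any whitespace (spaces and newlines); for the question's strings it returns [Hey, there]. The class name WordDiff is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WordDiff {
    /** Words of sourceStr (in order) that never occur in anotherStr. */
    static List<String> findNotMatching(String sourceStr, String anotherStr) {
        // One HashSet lookup per word replaces the inner scan over anotherStr.
        Set<String> known = new HashSet<>(words(anotherStr));
        List<String> missing = new ArrayList<>();
        for (String w : words(sourceStr)) {
            if (!known.contains(w)) {
                missing.add(w);
            }
        }
        return missing;
    }

    /** Splits on any run of whitespace, skipping the empty token a leading space produces. */
    private static List<String> words(String s) {
        List<String> result = new ArrayList<>();
        for (String w : s.split("\\s+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }
}
```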

Answered by ahanook

I was looking for a solution but couldn't find the one I needed, so I created a utility class that compares two versions of a text (new and old) and returns the result text with the changes marked by [added] and [deleted] tags. The tags can easily be replaced with a highlighter of your choice, for example an HTML tag. See string-version-comparison.

Any comments will be appreciated.


*It might not work well with long text, because there is a higher probability of finding the same phrases as the deleted ones.

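The linked class is not reproduced here. As a rough illustration of the output it describes, the sketch below computes a word-level longest common subsequence and wraps removed words in [deleted]...[/deleted] and inserted words in [added]...[/added]. The closing-tag format, the class name TaggedDiff, and splitting on single spaces are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class TaggedDiff {
    /** Marks words removed from oldText and words added in newText. */
    static String compare(String oldText, String newText) {
        // Split on single spaces (limit -1 keeps empty tokens) so unchanged text
        // round-trips when rejoined with " ".
        String[] a = oldText.split(" ", -1);
        String[] b = newText.split(" ", -1);
        int n = a.length, m = b.length;

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && a[i].equals(b[j])) {
                out.add(a[i]);
                i++;
                j++;
            } else if (j == m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1])) {
                out.add("[deleted]" + a[i] + "[/deleted]");
                i++;
            } else {
                out.add("[added]" + b[j] + "[/added]");
                j++;
            }
        }
        return String.join(" ", out);
    }
}
```

For example, compare("the quick brown fox", "the slow brown fox") gives "the [deleted]quick[/deleted] [added]slow[/added] brown fox".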

Answered by jjoller

If you prefer not to use an external library, you can use the following Java snippet to efficiently compute the difference:


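// Requires java.util.HashMap and java.util.Map.

/**
 * Returns an array of size 2. The entries contain a minimal set of characters
 * that have to be removed from the corresponding input strings in order to
 * make the strings equal.
 */
public String[] difference(String a, String b) {
    return diffHelper(a, b, new HashMap<>());
}

private String[] diffHelper(String a, String b, Map<Long, String[]> lookup) {
    // a and b are always suffixes of the original inputs, so their lengths identify the subproblem.
    long key = ((long) a.length()) << 32 | b.length();
    String[] cached = lookup.get(key);
    if (cached != null) {
        return cached;
    }
    // Explicit get/put instead of computeIfAbsent: since Java 9, HashMap.computeIfAbsent
    // throws ConcurrentModificationException when the mapping function (here, the
    // recursive call) modifies the map.
    String[] result;
    if (a.isEmpty() || b.isEmpty()) {
        result = new String[]{a, b};
    } else if (a.charAt(0) == b.charAt(0)) {
        result = diffHelper(a.substring(1), b.substring(1), lookup);
    } else {
        String[] aa = diffHelper(a.substring(1), b, lookup); // drop a's first character
        String[] bb = diffHelper(a, b.substring(1), lookup); // drop b's first character
        if (aa[0].length() + aa[1].length() < bb[0].length() + bb[1].length()) {
            result = new String[]{a.charAt(0) + aa[0], aa[1]};
        } else {
            result = new String[]{bb[0], b.charAt(0) + bb[1]};
        }
    }
    lookup.put(key, result);
    return result;
}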

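This approach uses dynamic programming. It tries all combinations in a brute-force way, but memoizes the result for every pair of suffixes it has already computed, so each of the O(n*m) subproblems (n and m being the string lengths) is solved only once. Each step also copies substrings, and the recursion depth grows with n + m, so very long inputs can be slower than that bound suggests or overflow the stack.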

Examples:


String hear = "Hi My name is Deepak"
        + "\n"
        + "How are you ?"
        + "\n"
        + "\n"
        + "How is everyone";
String dear = "Hi My name is Deepak"
        + "\n"
        + "How are you ?"
        + "\n"
        + "Hey there \n"
        + "How is everyone";
difference(hear, dear); // returns {"","Hey there "}

difference("Honda", "Hyundai"); // returns {"o","yui"}

difference("Toyota", "Coyote"); // returns {"Ta","Ce"}