string 如何找到包含给定字符串中所有字符的最小子字符串?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2459653/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-09 00:41:15  来源:igfitidea点击:

How to find smallest substring which contains all characters from a given string?

stringalgorithmsubstring

提问by Rajendra Uppal

I have recently come across an interesting question on strings. Suppose you are given following:

我最近遇到了一个关于字符串的有趣问题。假设您得到以下信息:

Input string1: "this is a test string"
Input string2: "tist"
Output string: "t stri"

So, given above, how can I approach towards finding smallest substring of string1 that contains all the characters from string 2?

那么,上面给出,我怎样才能找到包含字符串 2 中所有字符的 string1 的最小子字符串?

采纳答案by Rex Kerr

You can do a histogram sweep in O(N+M)time and O(1)space where Nis the number of characters in the first string and Mis the number of characters in the second.

您可以在O(N+M)时间和O(1)空间中进行直方图扫描,其中N是第一个字符串中M的字符数,是第二个字符串中的字符数。

It works like this:

它是这样工作的:

  • Make a histogram of the second string's characters (key operation is hist2[ s2[i] ]++).
  • Make a cumulative histogram of the first string's characters until that histogram contains every character that the second string's histogram contains (which I will call "the histogram condition").
  • Then move forwards on the first string, subtracting from the histogram, until it fails to meet the histogram condition. Mark that bit of the first string (before the final move) as your tentative substring.
  • Move the front of the substring forwards again until you meet the histogram condition again. Move the end forwards until it fails again. If this is a shorter substring than the first, mark that as your tentative substring.
  • Repeat until you've passed through the entire first string.
  • The marked substring is your answer.
  • 制作第二个字符串字符的直方图(关键操作是hist2[ s2[i] ]++)。
  • 制作第一个字符串的字符的累积直方图,直到该直方图包含第二个字符串的直方图包含的每个字符(我将其称为“直方图条件”)。
  • 然后在第一个字符串上向前移动,从直方图中减去,直到它不满足直方图条件。将第一个字符串的那一点(在最后一步之前)标记为您的暂定子字符串。
  • 再次向前移动子串的前面,直到再次满足直方图条件。将末端向前移动,直到它再次失败。如果这是比第一个更短的子字符串,请将其标记为您的暂定子字符串。
  • 重复直到您通过了整个第一个字符串。
  • 标记的子字符串是您的答案。

Note that by varying the check you use on the histogram condition, you can choose either to have the same set of charactersas the second string, or at least as many characters of each type. (Its just the difference between a[i]>0 && b[i]>0and a[i]>=b[i].)

请注意,通过改变对直方图条件使用的检查,您可以选择与第二个字符串具有相同的字符集,或者至少每种类型的字符数相同。(它只是之间的差异a[i]>0 && b[i]>0a[i]>=b[i]。)

You can speed up the histogram checks if you keep a track of which condition is not satisfied when you're trying to satisfy it, and checking only the thing that you decrement when you're trying to break it. (On the initial buildup, you count how many items you've satisfied, and increment that count every time you add a new character that takes the condition from false to true.)

如果您在尝试满足条件时跟踪哪个条件不满足,则可以加快直方图检查,并仅检查在尝试打破它时减少的内容。(在最初的构建中,您计算​​满足了多少项,并在每次添加将条件从 false 变为 true 的新字符时增加该计数。)

回答by 1337c0d3r

To see more details including working code, check my blog post at:

要查看包括工作代码在内的更多详细信息,请查看我的博客文章:

http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html

http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html

To help illustrate this approach, I use an example: string1 = "acbbaca"and string2 = "aba". Here, we also use the term "window", which means a contiguous block of characters from string1 (could be interchanged with the term substring).

为了帮助说明这种方法,我使用了一个示例: string1 ="acbbaca"和 string2 = "aba"。在这里,我们还使用术语“窗口”,它表示来自 string1 的连续字符块(可以与术语 substring 互换)。

alt text

替代文字

i) string1 = "acbbaca" and string2 = "aba".

i) string1 = "acbbaca" 和 string2 = "aba"。

alt text

替代文字

ii) The first minimum window is found. Notice that we cannot advance begin pointer as hasFound['a'] == needToFind['a'] == 2. Advancing would mean breaking the constraint.

ii) 找到第一个最小窗口。请注意,我们不能像 hasFound['a'] == needToFind['a'] == 2 那样推进开始指针。推进意味着打破约束。

alt text

替代文字

iii) The second window is found. begin pointer still points to the first element 'a'. hasFound['a'] (3) is greater than needToFind['a'] (2). We decrement hasFound['a'] by one and advance begin pointer to the right.

iii) 找到第二个窗口。开始指针仍然指向第一个元素'a'。hasFound['a'] (3) 大于 needToFind['a'] (2)。我们将 hasFound['a'] 减一并将开始指针向右推进。

alt text

替代文字

iv) We skip 'c' since it is not found in string2. Begin pointer now points to 'b'. hasFound['b'] (2) is greater than needToFind['b'] (1). We decrement hasFound['b'] by one and advance begin pointer to the right.

iv) 我们跳过 'c',因为它在 string2 中找不到。开始指针现在指向'b'。hasFound['b'] (2) 大于 needToFind['b'] (1)。我们将 hasFound['b'] 减一并将开始指针向右推进。

alt text

替代文字

v) Begin pointer now points to the next 'b'. hasFound['b'] (1) is equal to needToFind['b'] (1). We stop immediately and this is our newly found minimum window.

v) 开始指针现在指向下一个“b”。hasFound['b'] (1) 等于 needToFind['b'] (1)。我们立即停止,这是我们新发现的最小窗口。

The idea is mainly based on the help of two pointers (begin and end position of the window) and two tables (needToFind and hasFound) while traversing string1. needToFind stores the total count of a character in string2 and hasFound stores the total count of a character met so far. We also use a count variable to store the total characters in string2 that's met so far (not counting characters where hasFound[x] exceeds needToFind[x]). When count equals string2's length, we know a valid window is found.

思路主要是在遍历string1时借助两个指针(窗口的开始和结束位置)和两个表(needToFind和hasFound)的帮助。needToFind 将字符的总数存储在 string2 中,hasFound 存储到目前为止遇到的字符的总数。我们还使用计数变量来存储 string2 中到目前为止遇到的总字符数(不计算 hasFound[x] 超过 needToFind[x] 的字符数)。当 count 等于 string2 的长度时,我们知道找到了一个有效的窗口。

Each time we advance the end pointer (pointing to an element x), we increment hasFound[x] by one. We also increment count by one if hasFound[x] is less than or equal to needToFind[x]. Why? When the constraint is met (that is, count equals to string2's size), we immediately advance begin pointer as far right as possible while maintaining the constraint.

每次我们推进结束指针(指向元素 x)时,我们都会将 hasFound[x] 增加 1。如果 hasFound[x] 小于或等于needToFind[x],我们还将计数加一。为什么?当满足约束(即计数等于 string2 的大小)时,我们立即将开始指针尽可能向右移动,同时保持约束。

How do we check if it is maintaining the constraint? Assume that begin points to an element x, we check if hasFound[x] is greater than needToFind[x]. If it is, we can decrement hasFound[x] by one and advancing begin pointer without breaking the constraint. On the other hand, if it is not, we stop immediately as advancing begin pointer breaks the window constraint.

我们如何检查它是否保持约束?假设开始指向元素 x,我们检查 hasFound[x] 是否大于 needToFind[x]。如果是,我们可以将 hasFound[x] 减 1 并在不破坏约束的情况下推进开始指针。另一方面,如果不是,我们会立即停止,因为前进的开始指针打破了窗口约束。

Finally, we check if the minimum window length is less than the current minimum. Update the current minimum if a new minimum is found.

最后,我们检查最小窗口长度是否小于当前最小值。如果找到新的最小值,则更新当前最小值。

Essentially, the algorithm finds the first window that satisfies the constraint, then continue maintaining the constraint throughout.

本质上,该算法找到满足约束的第一个窗口,然后自始至终继续保持约束。

回答by user287792

Here's an O(n) solution. The basic idea is simple: for each starting index, find the least ending index such that the substring contains all of the necessary letters. The trick is that the least ending index increases over the course of the function, so with a little data structure support, we consider each character at most twice.

这是一个 O(n) 解决方案。基本思想很简单:对于每个起始索引,找到最小的结束索引,使得子串包含所有必要的字母。诀窍是最小结尾索引会随着函数的运行而增加,因此在少量数据结构支持的情况下,我们最多考虑每个字符两次。

In Python:

在 Python 中:

from collections import defaultdict

def smallest(s1, s2):
    assert s2 != ''
    d = defaultdict(int)
    nneg = [0]  # number of negative entries in d
    def incr(c):
        d[c] += 1
        if d[c] == 0:
            nneg[0] -= 1
    def decr(c):
        if d[c] == 0:
            nneg[0] += 1
        d[c] -= 1
    for c in s2:
        decr(c)
    minlen = len(s1) + 1
    j = 0
    for i in xrange(len(s1)):
        while nneg[0] > 0:
            if j >= len(s1):
                return minlen
            incr(s1[j])
            j += 1
        minlen = min(minlen, j - i)
        decr(s1[i])
    return minlen

回答by Hymandaniel

I received the same interview question. I am a C++ candidate but I was in a position to code relatively fast in JAVA.

我收到了同样的面试问题。我是一名 C++ 候选人,但我能够在 JAVA 中相对快速地编写代码。

Java[Courtesy : Sumod Mathilakath]

Java[提供:Sumod Mathilakath]

import java.io.*;
import  java.util.*;

class UserMainCode
{


    public String GetSubString(String input1,String input2){
        // Write code here...
        return find(input1, input2);
    }
  private static boolean containsPatternChar(int[] sCount, int[] pCount) {
        for(int i=0;i<256;i++) {
            if(pCount[i]>sCount[i])
                return false;
        }
        return true;
    }
  public static String find(String s, String p) {
        if (p.length() > s.length())
            return null;
        int[] pCount = new int[256];
        int[] sCount = new int[256];
        // Time: O(p.lenght)
        for(int i=0;i<p.length();i++) {
            pCount[(int)(p.charAt(i))]++;
            sCount[(int)(s.charAt(i))]++;
        }
        int i = 0, j = p.length(), min = Integer.MAX_VALUE;
        String res = null;
        // Time: O(s.lenght)
        while (j < s.length()) {
            if (containsPatternChar(sCount, pCount)) {
                if ((j - i) < min) {
                    min = j - i;
                    res = s.substring(i, j);
                    // This is the smallest possible substring.
                    if(min==p.length())
                        break;
                    // Reduce the window size.
                    sCount[(int)(s.charAt(i))]--;
                    i++;
                }
            } else {
                sCount[(int)(s.charAt(j))]++;
                // Increase the window size.
                j++;
            }
        }
        System.out.println(res);
        return res;
    }
}

C++[Courtesy : sundeepblue]

C++[礼貌: sundeepblue]

#include <iostream>
#include <vector>
#include <string>
#include <climits>
using namespace std;
string find_minimum_window(string s, string t) {
    if(s.empty() || t.empty()) return;

    int ns = s.size(), nt = t.size();
    vector<int> total(256, 0);
    vector<int> sofar(256, 0);
    for(int i=0; i<nt; i++) 
        total[t[i]]++;

    int L = 0, R; 
    int minL = 0;                           //gist2
    int count = 0;
    int min_win_len = INT_MAX;

    for(R=0; R<ns; R++) {                   // gist0, a big for loop
        if(total[s[R]] == 0) continue;
        else sofar[s[R]]++;

        if(sofar[s[R]] <= total[s[R]])      // gist1, <= not <
            count++;

        if(count == nt) {                   // POS1
            while(true) {
                char c = s[L]; 
                if(total[c] == 0) { L++; }
                else if(sofar[c] > total[c]) {
                    sofar[c]--;
                    L++;
                }
                else break;
            }  
            if(R - L + 1 < min_win_len) {   // this judge should be inside POS1
                min_win_len = R - L + 1;
                minL = L;
            }
        }
    }
    string res;
    if(count == nt)                         // gist3, cannot forget this. 
        res = s.substr(minL, min_win_len);  // gist4, start from "minL" not "L"
    return res;
}
int main() {
    string s = "abdccdedca";
    cout << find_minimum_window(s, "acd");
}

Erlang[Courtesy : wardbekker]

Erlang[礼貌:wardbekker]

-module(leetcode).

-export([min_window/0]).

%% Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

%% For example,
%% S = "ADOBECODEBANC"
%% T = "ABC"
%% Minimum window is "BANC".

%% Note:
%% If there is no such window in S that covers all characters in T, return the emtpy string "".
%% If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.



min_window() ->
    "eca" = min_window("cabeca", "cae"),
    "eca" = min_window("cfabeca", "cae"),
    "aec" = min_window("cabefgecdaecf", "cae"),
    "cwae" = min_window("cabwefgewcwaefcf", "cae"),
    "BANC" = min_window("ADOBECODEBANC", "ABC"),
    ok.

min_window(T, S) ->
    min_window(T, S, []).

min_window([], _T, MinWindow) ->
    MinWindow;
min_window([H | Rest], T, MinWindow) ->
    NewMinWindow = case lists:member(H, T) of
                       true ->
                           MinWindowFound = fullfill_window(Rest, lists:delete(H, T), [H]),
                           case length(MinWindow) == 0 orelse (length(MinWindow) > length(MinWindowFound)
                               andalso length(MinWindowFound) > 0) of
                               true ->
                                   MinWindowFound;
                               false ->
                                   MinWindow
                           end;
                       false ->
                           MinWindow
                   end,
    min_window(Rest, T, NewMinWindow).

fullfill_window(_, [], Acc) ->
    %% window completed
    Acc;
fullfill_window([], _T, _Acc) ->
    %% no window found
    "";
fullfill_window([H | Rest], T, Acc) ->
    %% completing window
    case lists:member(H, T) of
        true ->
            fullfill_window(Rest, lists:delete(H, T), Acc ++ [H]);
        false ->
            fullfill_window(Rest, T, Acc ++ [H])
    end.

REF:

参考:

回答by Manish Kumar

Please have a look at this as well:

请也看看这个:

//-----------------------------------------------------------------------

bool IsInSet(char ch, char* cSet)
{
    char* cSetptr = cSet;
    int index = 0;
    while (*(cSet+ index) != '
public static String shortestSubstrContainingAllChars(String input, String target) {
    int needToFind[] = new int[256];
    int hasFound[] = new int[256];
    int totalCharCount = 0;
    String result = null;

    char[] targetCharArray = target.toCharArray();
    for (int i = 0; i < targetCharArray.length; i++) {
        needToFind[targetCharArray[i]]++;           
    }

    char[] inputCharArray = input.toCharArray();
    for (int begin = 0, end = 0; end < inputCharArray.length; end++) {

        if (needToFind[inputCharArray[end]] == 0) {
            continue;
        }

        hasFound[inputCharArray[end]]++;
        if (hasFound[inputCharArray[end]] <= needToFind[inputCharArray[end]]) {
            totalCharCount ++;
        }
        if (totalCharCount == target.length()) {
            while (needToFind[inputCharArray[begin]] == 0 
                    || hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {

                if (hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {
                    hasFound[inputCharArray[begin]]--;
                }
                begin++;
            }

            String substring = input.substring(begin, end + 1);
            if (result == null || result.length() > substring.length()) {
                result = substring;
            }
        }
    }
    return result;
}
') { if(ch == *(cSet+ index)) { return true; } ++index; } return false; } void removeChar(char ch, char* cSet) { bool bShift = false; int index = 0; while (*(cSet + index) != '
@Test
public void shortestSubstringContainingAllCharsTest() {
    String result = StringUtil.shortestSubstrContainingAllChars("acbbaca", "aba");
    assertThat(result, equalTo("baca"));

    result = StringUtil.shortestSubstrContainingAllChars("acbbADOBECODEBANCaca", "ABC");
    assertThat(result, equalTo("BANC"));

    result = StringUtil.shortestSubstrContainingAllChars("this is a test string", "tist");
    assertThat(result, equalTo("t stri"));
}
') { if( (ch == *(cSet + index)) || bShift) { *(cSet + index) = *(cSet + index + 1); bShift = true; } ++index; } } typedef struct subStr { short iStart; short iEnd; short szStr; }ss; char* subStringSmallest(char* testStr, char* cSet) { char* subString = NULL; int iSzSet = strlen(cSet) + 1; int iSzString = strlen(testStr)+ 1; char* cSetBackUp = new char[iSzSet]; memcpy((void*)cSetBackUp, (void*)cSet, iSzSet); int iStartIndx = -1; int iEndIndx = -1; int iIndexStartNext = -1; std::vector<ss> subStrVec; int index = 0; while( *(testStr+index) != '
public static Tuple<int, int> FindMinSubstringWindow(string input, string pattern)
{
    Tuple<int, int> windowCoords = new Tuple<int, int>(0, input.Length - 1);
    int[] patternHist = new int[256];
    for (int i = 0; i < pattern.Length; i++)
    {
        patternHist[pattern[i]]++;
    }
    int[] inputHist = new int[256];
    int minWindowLength = int.MaxValue;
    int count = 0;
    for (int begin = 0, end = 0; end < input.Length; end++)
    {
        // Skip what's not in pattern.
        if (patternHist[input[end]] == 0)
        {
            continue;
        }
        inputHist[input[end]]++;
        // Count letters that are in pattern.
        if (inputHist[input[end]] <= patternHist[input[end]])
        {
            count++;
        }
        // Window found.
        if (count == pattern.Length)
        {
            // Remove extra instances of letters from pattern
            // or just letters that aren't part of the pattern
            // from the beginning.
            while (patternHist[input[begin]] == 0 ||
                   inputHist[input[begin]] > patternHist[input[begin]])
            {
                if (inputHist[input[begin]] > patternHist[input[begin]])
                {
                    inputHist[input[begin]]--;
                }
                begin++;
            }
            // Current window found.
            int windowLength = end - begin + 1;
            if (windowLength < minWindowLength)
            {
                windowCoords = new Tuple<int, int>(begin, end);
                minWindowLength = windowLength;
            }
        }
    }
    if (count == pattern.Length)
    {
        return windowCoords;
    }
    return null;
}
' ) { if (IsInSet(*(testStr+index), cSetBackUp)) { removeChar(*(testStr+index), cSetBackUp); if(iStartIndx < 0) { iStartIndx = index; } else if( iIndexStartNext < 0) iIndexStartNext = index; else ; if (strlen(cSetBackUp) == 0 ) { iEndIndx = index; if( iIndexStartNext == -1) break; else { index = iIndexStartNext; ss stemp = {iStartIndx, iEndIndx, (iEndIndx-iStartIndx + 1)}; subStrVec.push_back(stemp); iStartIndx = iEndIndx = iIndexStartNext = -1; memcpy((void*)cSetBackUp, (void*)cSet, iSzSet); continue; } } } else { if (IsInSet(*(testStr+index), cSet)) { if(iIndexStartNext < 0) iIndexStartNext = index; } } ++index; } int indexSmallest = 0; for(int indexVec = 0; indexVec < subStrVec.size(); ++indexVec) { if(subStrVec[indexSmallest].szStr > subStrVec[indexVec].szStr) indexSmallest = indexVec; } subString = new char[(subStrVec[indexSmallest].szStr) + 1]; memcpy((void*)subString, (void*)(testStr+ subStrVec[indexSmallest].iStart), subStrVec[indexSmallest].szStr); memset((void*)(subString + subStrVec[indexSmallest].szStr), 0, 1); delete[] cSetBackUp; return subString; } //--------------------------------------------------------------------

回答by craftsmannadeem

Here is Java implementation

这是Java实现

def get(s, alphabet="abc"):
    seen = {}
    for c in alphabet:
        seen[c] = 0
    seen[s[0]] = 1
    start = 0
    end = 0
    shortest_s = 0
    shortest_e = 99999
    while end + 1 < len(s):
        while seen[s[start]] > 1:
            seen[s[start]] -= 1
            start += 1
        # Constant time check:
        if sum(seen.values()) == len(alphabet) and all(v == 1 for v in seen.values()) and \
                shortest_e - shortest_s > end - start:
            shortest_s = start
            shortest_e = end
        end += 1
        seen[s[end]] += 1
    return s[shortest_s: shortest_e + 1]


print(get("abbcac")) # Expected to return "bca"

Here is the Junit Test

这是 Junit 测试

    String s = "xyyzyzyx";
    String s1 = "xyz";
    String finalString ="";
    Map<Character,Integer> hm = new HashMap<>();
    if(s1!=null && s!=null && s.length()>s1.length()){
        for(int i =0;i<s1.length();i++){
            if(hm.get(s1.charAt(i))!=null){
                int k = hm.get(s1.charAt(i))+1;
                hm.put(s1.charAt(i), k);
            }else
                hm.put(s1.charAt(i), 1);
        }
        Map<Character,Integer> t = new HashMap<>();
        int start =-1;
         for(int j=0;j<s.length();j++){
             if(hm.get(s.charAt(j))!=null){
                 if(t.get(s.charAt(j))!=null){
                     if(t.get(s.charAt(j))!=hm.get(s.charAt(j))){
                     int k = t.get(s.charAt(j))+1;
                        t.put(s.charAt(j), k);
                     }
                 }else{
                     t.put(s.charAt(j), 1);
                     if(start==-1){
                         if(j+s1.length()>s.length()){
                             break;
                         }
                         start = j;
                     }
                 }
                 if(hm.equals(t)){
                    t = new HashMap<>();
                    if(finalString.length()<s.substring(start,j+1).length());
                    {
                        finalString=s.substring(start,j+1);
                    }
                    j=start;
                    start=-1;                       
                 }
             }
         }

回答by shlatchz

C# Implementation:

C# 实现:

function shortestSubStringOfUniqueChars(s){
 var uniqueArr = [];
 for(let i=0; i<s.length; i++){
  if(uniqueArr.indexOf(s.charAt(i)) <0){
   uniqueArr.push(s.charAt(i));
  }
 }

 let windoww = uniqueArr.length;

 while(windoww < s.length){
  for(let i=0; i<s.length - windoww; i++){
   let match = true;
   let tempArr = [];
   for(let j=0; j<uniqueArr.length; j++){
    if(uniqueArr.indexOf(s.charAt(i+j))<0){
     match = false;
     break;
    }
   }
  let checkStr
  if(match){
   checkStr =  s.substr(i, windoww);
   for(let j=0; j<uniqueArr.length; j++){
    if(uniqueArr.indexOf(checkStr.charAt(j))<0){
     match = false;
     break;
    }
   }
  }
  if(match){
      return checkStr;
  }
   }
   windoww = windoww + 1;
 }
}

console.log(shortestSubStringOfUniqueChars("ABA"));

回答by TheLogicGuy

I've implemented it using Python3 at O(N) efficiency:

我已经使用 Python3 以 O(N) 的效率实现了它:

##代码##

回答by Sai Chand

##代码##

回答by ganesh phirke

JavaScript solution in bruteforce way:

蛮力方式的 JavaScript 解决方案:

##代码##