string: KMP prefix table

Question

Asked by Cratylus

I am reading about KMP for string matching.
It requires preprocessing the pattern by building a prefix table.
For example, for the string ababaca the prefix table is: P = [0, 0, 1, 2, 3, 0, 1]
But I am not clear on what the numbers mean. I read that the table helps find matches of the pattern when it shifts, but I cannot connect that with the numbers in the table.

Answer 1

Answered by imslavko

Each number corresponds to one prefix of the pattern ("a", "ab", "aba", ...), and for that prefix it gives the length of the longest suffix that is also a prefix of it. We do not count the whole string as its own suffix or prefix here; that case is called a self-suffix and self-prefix (at least in Russian, I am not sure about the English terms).

So we have the string "ababaca". Let's look at it. KMP computes the prefix function for every non-empty prefix. Let s be the string and p[i] the value of the prefix function for the prefix of s that ends at index i (inclusive). The matching prefix and suffix may overlap.

+---+----------+-------+------------------------+
| i | s[0..i]  | p[i]  | Matching Prefix/Suffix |
+---+----------+-------+------------------------+
| 0 | a        |     0 |                        |
| 1 | ab       |     0 |                        |
| 2 | aba      |     1 | a                      |
| 3 | abab     |     2 | ab                     |
| 4 | ababa    |     3 | aba                    |
| 5 | ababac   |     0 |                        |
| 6 | ababaca  |     1 | a                      |
+---+----------+-------+------------------------+
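To make the definition concrete, here is a minimal Python sketch that computes the table directly from the definition, trying every candidate length for every prefix. The function name `prefix_table_bruteforce` is made up for illustration. It is deliberately slow and is only meant for checking the table by hand, not as the way KMP computes it.

```python
def prefix_table_bruteforce(s):
    """For each prefix s[:i+1], return the length of its longest suffix
    that is also a prefix, not counting the whole prefix itself."""
    table = []
    for i in range(len(s)):
        sub = s[:i + 1]
        best = 0
        # Try every length k shorter than the prefix; keep the longest match.
        for k in range(1, len(sub)):
            if sub[:k] == sub[-k:]:
                best = k
        table.append(best)
    return table


print(prefix_table_bruteforce("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```

For "ababa" the candidates of length 1 ("a") and 3 ("aba") both match, and the table keeps the longer one, which is why p[4] is 3.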

Simple C++ code that computes the prefix function of a string s:

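#include <string>
#include <vector>
using namespace std;

// Returns p, where p[i] is the length of the longest suffix of s[0..i]
// (other than s[0..i] itself) that is also a prefix of it.
vector<int> prefixFunction(const string& s) {
    vector<int> p(s.size());
    int j = 0;  // length of the prefix matched so far
    for (int i = 1; i < (int)s.size(); i++) {
        // On a mismatch, fall back to the next shorter matching prefix.
        while (j > 0 && s[j] != s[i])
            j = p[j-1];

        if (s[j] == s[i])
            j++;
        p[i] = j;
    }
    return p;
}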
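As for how the numbers relate to shifting the pattern: the following Python sketch shows one common way the table is used during the search itself. The name `kmp_search` and the sample text are assumptions made for illustration, and `prefix_function` is a direct port of the C++ routine above. When a mismatch occurs after `j` pattern characters have matched, the search does not restart from zero; it sets `j = p[j - 1]`, because that many characters of the pattern are already known to line up with the text.

```python
def prefix_function(s):
    """Port of the C++ prefixFunction above."""
    p = [0] * len(s)
    j = 0
    for i in range(1, len(s)):
        while j > 0 and s[j] != s[i]:
            j = p[j - 1]
        if s[j] == s[i]:
            j += 1
        p[i] = j
    return p


def kmp_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    if not pattern:
        return []
    p = prefix_function(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # Mismatch: keep the longest part of the match that can still be reused.
        while j > 0 and pattern[j] != ch:
            j = p[j - 1]
        if pattern[j] == ch:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = p[j - 1]  # continue, allowing overlapping matches
    return matches


print(kmp_search("abababacababaca", "ababaca"))  # [2, 8]
```

In this run the first mismatch happens after "ababa" has matched (j = 5). The table entry p[4] = 3 says the last three matched characters, "aba", are also the first three of the pattern, so the search continues with j = 3 and never re-reads earlier text.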

Answer 2

Answered by Yogesh Sanchihar

This code may not be the shortest, but its flow is easy to follow. Simple Java code for calculating the prefix array:

    String pattern = "ababaca";
    int i = 1, j = 0;
    int[] prefixArray = new int[pattern.length()];
    while (i < pattern.length()) {

        while (pattern.charAt(i) != pattern.charAt(j) && j > 0) {
            j = prefixArray[j - 1];

        }
        if (pattern.charAt(i) == pattern.charAt(j)) {
            prefixArray[i] = j + 1;
            i++;
            j++;

        } else {
            prefixArray[i] = j;
            i++;
        }
    }

    for (int k = 0; k < prefixArray.length; ++k) {
        System.out.println(prefixArray[k]);
    }

It produces the required output:

0 0 1 2 3 0 1

Answer 3

Answered by Mahesh Wakade

Python Implementation

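p = 'ababaca'

l1 = len(p)

j = 0  # length of the prefix matched so far
i = 1  # index of the character being processed
prefix = [0]

while len(prefix) < l1:
    if p[j] == p[i]:
        prefix.append(j + 1)
        i += 1
        j += 1
    elif j == 0:
        # nothing matched: the value is 0, move on
        prefix.append(0)
        i += 1
    else:
        # fall back to the next shorter candidate and retry the same i
        j = prefix[j - 1]

print(prefix)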

Answer 4

Answered by here4learn

I tried my hand at this in JavaScript. Open to suggestions.

const prefixArray = function (p) {
    let aux = Array(p.length).fill(0);

    // For index 0 the value is always 0, so we start from 1
    let i = 1;
    let m = 0; // length of the prefix matched so far

    // run the loop over the pattern length
    while (i < p.length) {
        // 3 cases here
        // First: the current character extends the matched prefix
        if (p.charAt(i) === p.charAt(m)) {
            m++;        // increment m
            aux[i] = m; // update aux at index i
            i++;        // move to the next character
        }
        // Second: no match and m !== 0, meaning some match happened previously,
        // so move m back to the previous shorter match
        else if (m !== 0) {
            m = aux[m - 1];
            // do not increment i: retry this character against the shorter match
        } else {
            // Third: nothing matches, so set the current index in aux to 0
            aux[i] = 0; // no match
            i++; // shift to the next char
        }
    }

    return aux;
};

Answer 5

Answered by Pankaj Gupta

    String string = "abababca";
    int[]array = new int[string.length()];

    int i = 1;
    int j = 0;

    while(i<string.length()) {
        // if the characters match, record the match length and increment j and i
        if(string.charAt(j)==string.charAt(i)) {
            array[i] = j+1;
            i++;
            j++;
        }else {

            // if not, move j back to array[j-1] and retry the same i;
            // only when j is already 0 is array[i] left at 0 and i incremented
            if(j!=0) {
                j = array[j-1];
            }else {
                i++;
            }
        }
    }

    for(int k :array) {
        System.out.print(k+" ");
    }

Answer 6

Answered by MOHAMED SABTHAR

string text = "ababbabbababbababbabb"; static int val[30];

int i = 1;
while (i < text.length())
{
    int j = 0;
    int value = 0;
    while (((i + j) < text.length()) && (text[j] == text[i + j]))
        val[i + j] = ++value, j++;
    i += j + 1;
}

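The required output is stored in val[]. Note that this loop only extends a match that starts at position i and never falls back to a shorter one, so it can under-report on some inputs: for "aabaaab" it leaves the last two entries at 0 0, where the prefix function is 2 3.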
