algorithm - Knuth-Morris-Pratt implementation in pure C -


i have next kmp-implementation:

#include <stdio.h> #include <stdlib.h> #include <string.h>  int kmp(char substr[], char str[]) {    int i, j, n, m;     n = strlen(str);    m = strlen(substr);     int *d = (int*)malloc(m * sizeof(int));    d[0] = 0;     for(i = 0, j = 0; < m; i++)    {       while(j > 0 && substr[j] != substr[i])       {          j = d[j - 1];       }        if(substr[j] == substr[i])       {          j++;          d[i] = j;       }    }     for(i = 0, j = 0; < n; i++)    {       while(j > 0 && substr[j] != str[i])       {          j = d[j - 1];       }        if(substr[j] == str[i])       {          j++;       }        if(j == m)       {          free(d);          return - j + 1;       }    }     free(d);     return -1; }  int main(void) {    char substr[] = "world",       str[] = "hello world!";     int pos = kmp(substr, str);     printf("position starts at: %i\r\n", pos);     return 0; } 

you can test here: http://liveworkspace.org/code/d2e7b3be72083c72ed768720f4716f80

it works on small strings, , have tested large loop, on way fine.

but if change substring i'm searching , complete string these:

char substr[] = "%end%", str[] = "<h1>the result is: <%lua% oleg = { x = 0xa }          table.insert(oleg, y) oleg.y = 5 print(oleg.y) %end%></h1>"; 

only after first try, implementation fails...

please, me repairing implementation of kmp make algorithm work such data in strings...

in 1 place deviate source, source has

while(j>0 && p[j]!=p[i]) j = d[j-1];     if(p[j]==p[i])         j++;         d[i]=j; 

while have

while(j > 0 && substr[j] != substr[i]) {     j = d[j - 1]; } if(substr[j] == substr[i]) {     j++;     d[i] = j; } 

being deceived source's indentation. in source, there no braces around if() branch, increment j++; controlled if; d[i] = j; unconditional.

then, source has error, due unusual use of indices. correct way set array is

int *d = (int*)malloc(m * sizeof(int)); d[0] = 0;  for(i = 1, j = 0; < m; i++) {     while(j > 0 && substr[j-1] != substr[i-1])     {         j = d[j - 1];     }      if(substr[j] == substr[i])         j++;     d[i] = j; } 

but it's confusing, since setup here uses indices i-1 , j-1 i , j determine d[i]. usual way implement different; way implemented in c#. since that's form find in sources, it's far easier convince of correctness of that.


Comments

Popular posts from this blog

c# - SVN Error : "svnadmin: E205000: Too many arguments" -

c# - Copy ObservableCollection to another ObservableCollection -

All overlapping substrings matching a java regex -