algorithm - Knuth-Morris-Pratt implementation in pure C -
i have next kmp-implementation:
#include <stdio.h> #include <stdlib.h> #include <string.h> int kmp(char substr[], char str[]) { int i, j, n, m; n = strlen(str); m = strlen(substr); int *d = (int*)malloc(m * sizeof(int)); d[0] = 0; for(i = 0, j = 0; < m; i++) { while(j > 0 && substr[j] != substr[i]) { j = d[j - 1]; } if(substr[j] == substr[i]) { j++; d[i] = j; } } for(i = 0, j = 0; < n; i++) { while(j > 0 && substr[j] != str[i]) { j = d[j - 1]; } if(substr[j] == str[i]) { j++; } if(j == m) { free(d); return - j + 1; } } free(d); return -1; } int main(void) { char substr[] = "world", str[] = "hello world!"; int pos = kmp(substr, str); printf("position starts at: %i\r\n", pos); return 0; }
you can test here: http://liveworkspace.org/code/d2e7b3be72083c72ed768720f4716f80
it works on small strings, , have tested large loop, on way fine.
but if change substring i'm searching , complete string these:
char substr[] = "%end%", str[] = "<h1>the result is: <%lua% oleg = { x = 0xa } table.insert(oleg, y) oleg.y = 5 print(oleg.y) %end%></h1>";
only after first try, implementation fails...
please, me repairing implementation of kmp make algorithm work such data in strings...
in 1 place deviate source, source has
while(j>0 && p[j]!=p[i]) j = d[j-1]; if(p[j]==p[i]) j++; d[i]=j;
while have
while(j > 0 && substr[j] != substr[i]) { j = d[j - 1]; } if(substr[j] == substr[i]) { j++; d[i] = j; }
being deceived source's indentation. in source, there no braces around if()
branch, increment j++;
controlled if
; d[i] = j;
unconditional.
then, source has error, due unusual use of indices. correct way set array is
int *d = (int*)malloc(m * sizeof(int)); d[0] = 0; for(i = 1, j = 0; < m; i++) { while(j > 0 && substr[j-1] != substr[i-1]) { j = d[j - 1]; } if(substr[j] == substr[i]) j++; d[i] = j; }
but it's confusing, since setup here uses indices i-1
, j-1
i
, j
determine d[i]
. usual way implement different; way implemented in c#. since that's form find in sources, it's far easier convince of correctness of that.
Comments
Post a Comment