KMP模式匹配算法

kmp 模式匹配算法的核心在于求 next 数组, 当字符不匹配时,根据 next 数组来决定模式串的移动步数

S => [a b a c c a b a c c c a a]
T => [a b a c c c]
next=> [-1 0 -1 1 0 0]

i = 5, S[i] = 'a', j = 5, T[j] = 'c', S[i] != T[j], 于是下一轮匹配 有i = 5, j = next[j] = 0, 即移动了5位. 当然这是针对模式串非首字符不匹配来移动的,当模式串的首字符不匹配,当然是移动一位, 不需要借助 next

至于怎么求 next 数组,可以参考这篇文章KMP 算法

下面是实现

class KMP(object):
def getNext(self, t=None):
'''
返回一个模式串 t 的 next 数组
'''
if t == None:
return []
nt = [0 for i in range(len(t))] # next 数组
i = 0
j = -1
nt[0] = -1
while i < len(t) - 1:
if j == -1 or t[i] == t[j]:
i += 1
j += 1
if t[i] != t[j]:
nt[i] = j
else:
nt[i] = nt[j]
else:
j = nt[j]
return nt
def idx(self, s, t, pos=0):
'''
返回字串t在主串s中第pos个字符之后的位置, 若不存在,则返回-1
'''
if s == None:
return False
nt = self.getNext(t)
i = pos
j = 0
while i < len(s) and j < len(t):
print 'i =', i, 'j =', j, 's[i] =', s[i], 't[j] =', t[j]
if j == -1 or s[i] == t[j]:
i += 1
j += 1
else:
j = nt[j]
if j == len(t):
return i - len(t)
else:
return -1

当 S = ‘abaccabacccaa’, T = ‘abaccc’, 有如下匹配过程

next => [-1, 0, -1, 1, 0, 0]
i = 0 j = 0 s[i] = a t[j] = a
i = 1 j = 1 s[i] = b t[j] = b
i = 2 j = 2 s[i] = a t[j] = a
i = 3 j = 3 s[i] = c t[j] = c
i = 4 j = 4 s[i] = c t[j] = c
i = 5 j = 5 s[i] = a t[j] = c
i = 5 j = 0 s[i] = a t[j] = a
i = 6 j = 1 s[i] = b t[j] = b
i = 7 j = 2 s[i] = a t[j] = a
i = 8 j = 3 s[i] = c t[j] = c
i = 9 j = 4 s[i] = c t[j] = c
i = 10 j = 5 s[i] = c t[j] = c
idx => 5

坚持原创技术分享