hkucuk

Knuth-Morris-Pratt (KMP) Algorithm

April 21, 2023 • ☕️ 3 min read • 🏷 computer, software, search

Translated by author into: English


The Knuth-Morris-Pratt (KMP) algorithm is an efficient string search algorithm used to search for a given pattern in a text. It is a preferred algorithm for quickly searching for patterns, especially in large texts or text collections. The KMP algorithm is especially used in data compression, text editing, database queries, and text processing in general.

The basic principle of the KMP algorithm is to use the relationships between pattern and text by combining all the comparisons that are normally made. In this way, it prevents unnecessary comparisons and provides a faster search. The KMP algorithm can have linear time complexity (O(m + n)) even in poor runtime situations, where “m” is the length of the pattern and “n” is the length of the text.

The working steps of the KMP algorithm are:

  1. Prepare the pattern and text:

    • Pattern: Represents the string to search for.
    • Text: Represents the main string in which we will look for the pattern.
  2. Calculate Prefix Function:

    • The prefix function (or start function) calculates the longest prefix lengths corresponding to each position in the pattern. This helps detect repetitive substrings within the pattern.
  3. Search by Swiping Pattern on Text:

    • Search is made by swiping the pattern to the left on the text.
    • While scrolling on matched characters, progress is made on unmatched characters by using the information from the prefix function. This avoids unnecessary comparisons.

Thanks to these steps, the KMP algorithm can quickly find pattern-text matches and improve performance. The KMP algorithm is faster than the Brute Force method and is often the preferred solution when searching for strings in a large dataset.

Complexity

  • Time: O(|m| + |n|)
  • Space: O(|m|)

Implementation of Knuth-Morris-Pratt Algorithm in GoLang:

package main

import (
	"fmt"
)

func computePrefixFunction(pattern string) []int {
	length := len(pattern)
	prefixFunc := make([]int, length)
	
	prefixLen := 0
	i := 1
	
	for i < length {
		if pattern[i] == pattern[prefixLen] {
			prefixLen++
			prefixFunc[i] = prefixLen
			i++
		} else {
			if prefixLen != 0 {
				prefixLen = prefixFunc[prefixLen-1]
			} else {
				prefixFunc[i] = 0
				i++
			}
		}
	}
	
	return prefixFunc
}

func kmpSearch(text, pattern string) []int {
	prefixFunc := computePrefixFunction(pattern)
	
	textLen := len(text)
	patternLen := len(pattern)
	
	i, j := 0, 0
	indices := []int{}
	
	for i < textLen {
		if pattern[j] == text[i] {
			i++
			j++
		}
		
		if j == patternLen {
			indices = append(indices, i-j)
			j = prefixFunc[j-1]
		} else if i < textLen && pattern[j] != text[i] {
			if j != 0 {
				j = prefixFunc[j-1]
			} else {
				i++
			}
		}
	}
	
	return indices
}

func main() {
	text := "ababcababcabcabc"
	pattern := "abcabc"
	
	indices := kmpSearch(text, pattern)
	
	if len(indices) == 0 {
		fmt.Println("The pattern was not found in the text.")
	} else {
		fmt.Println("Pattern found in text, starting indexes:", indices)
	}
}

This example implements the basic logic of the KMP algorithm in the Go language. The computePrefixFunction function calculates the prefix function for the pattern. The kmpSearch function performs the pattern search of the KMP algorithm. In the main function, the algorithm was run on a sample text and pattern and the results were printed. Using this code, you can try the KMP algorithm in the text and pattern you want.

When the program is run, the output will be as follows.

Pattern found in text, starting indexes: [7 10]

The running version of the program can be accessed from here.


Resources