Author : University of California, Berkeley. Computer Science Division
Release : 1991
Genre : Computer algorithms
Kind : eBook
Book Synopsis Theoretical and Empirical Comparisons of Approximate String Matching Algorithms by : University of California, Berkeley. Computer Science Division
Book excerpt: We study in depth a model of non-exact pattern matching based on edit distance, which is the minimum number of substitutions, insertions, and deletions needed to transform one string of symbols into another. More precisely, the k differences approximate string matching problem specifies a text string of length n, a pattern string of length m, and the number k of differences (substitutions, insertions, deletions) allowed in a match, and asks for all locations in the text where a match occurs. We have carefully implemented and analyzed various O(kn) algorithms based on dynamic programming (DP), paying particular attention to dependence on b, the alphabet size. An empirical observation on the average values of the DP tabulation makes apparent each algorithm's dependence on b. A new algorithm is presented that computes far fewer entries of the DP table. In practice, its speedup over the previous fastest algorithm is 2.5X for a binary alphabet, 4X for a four-letter alphabet, and 10X for a twenty-letter alphabet. We give a probabilistic analysis of the DP table in order to prove that the expected running time of our algorithm (as well as an earlier "cut-off" algorithm due to Ukkonen) is O(kn) for random text. Furthermore, we give a heuristic argument that our algorithm is O(kn/(√b - 1)) on the average, when alphabet size is taken into consideration.
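As a rough illustration of the k differences problem described in the excerpt, the sketch below shows the textbook column-by-column DP and a cut-off variant in the spirit of the Ukkonen algorithm the excerpt mentions. It is not the new algorithm from the report, which the excerpt does not describe. The function names `k_differences` and `k_differences_cutoff` are hypothetical, and both functions assume that a match is reported by its 1-based end position in the text.

```python
def k_differences(pattern, text, k):
    """Baseline DP: O(mn) time, one column of the table kept in memory.

    col[i] holds the minimum number of differences between pattern[:i]
    and any substring of the text ending at the current position.
    """
    m = len(pattern)
    col = list(range(m + 1))          # column 0: D[i][0] = i
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]            # D[0][j-1], always 0
        for i in range(1, m + 1):
            old = col[i]              # D[i][j-1]
            col[i] = min(prev_diag + (pattern[i - 1] != c),  # substitute or match
                         old + 1,                            # extra text symbol
                         col[i - 1] + 1)                     # extra pattern symbol
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def k_differences_cutoff(pattern, text, k):
    """Cut-off variant: only rows up to the last entry <= k are computed.

    Entries above k are only guaranteed to be "greater than k", which is
    all that is needed to decide where matches end.
    """
    m = len(pattern)
    col = list(range(m + 1))
    last = min(k, m)                  # largest row with col[row] <= k
    ends = []
    for j, c in enumerate(text, start=1):
        top = min(last + 1, m)        # no row beyond last + 1 can be <= k
        prev_diag = col[0]
        for i in range(1, top + 1):
            old = col[i]
            col[i] = min(prev_diag + (pattern[i - 1] != c),
                         old + 1,
                         col[i - 1] + 1)
            prev_diag = old
        last = top
        while col[last] > k:          # col[0] == 0, so this stops at row 0
            last -= 1
        if last < m:
            col[last + 1] = k + 1     # sentinel: the row above is "> k"
        else:
            ends.append(j)            # last == m means col[m] <= k
    return ends
```

For example, `k_differences("abc", "xxabcxx", 1)` reports the end positions 4, 5, and 6, and the cut-off variant returns the same list while touching fewer table entries when k is small relative to m.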