让我们说搜索字符串可以包含文本“schw”,“schwa”.和“施瓦茨”.我有三个关键字都解决了文本“schwarz”.
现在,我正在寻找一种有效的方式来找到所有的关键字,而不需要对每一个关键字做一个string.Contains(关键字).
样品数据:
H-Fuss ahorn 15 cm/SH48cm Metall-Fuss chrom 9 cm/SH42cm Metall-Kufe alufbg.12 cm/SH45c Metall-Kufe verchr.12 cm/SH45c Metall-Zylind.aluf.12cm/SH45cm Kufe alufarbig Metall-Zylinder hoch alufarbig Kunststoffgl.schw. - hoch Kunststoffgl.schw. - Standard Kunststoffgleiter - schwarz für Sitzhoehe 42 cm
示例关键字(键值):
h-fuss,Holz ahorn,Ahorn Metall,Metall chrom,Chrom verchr,Chrom alum,Aluminium aluf,Aluminium kufe,Kufe zylind,Zylinder hoch,Hoch kunststoffgl,Gleiter gleiter,Gleiter schwarz,Schwarz schw.,Schwarz
样品结果:
Holz,Ahorn Metall,Chrom Metall,Kufe,Aluminium Metall,Zylinder,Aluminium Kufe,Hoch,Aluminium Gleiter,Schwarz,Hoch Gleiter,Schwarz Gleiter,Schwarz
解决方法
The 07001
algorithm is a string searching
algorithm invented by Alfred V. Aho
and Margaret J. Corasick. It is a kind
of dictionary-matching algorithm that
locates elements of a finite set of
strings (the “dictionary”) within an
input text. It matches all patterns
“at once”,so the complexity of the
algorithm is linear in the length of
the patterns plus the length of the
searched text plus the number of
output matches. Note that because all
matches are found,there can be a
quadratic number of matches if every
substring matches (e.g. dictionary =
a,aa,aaa,aaaa and input string is
aaaa).The 07002 is a string searching algorithm created by Michael O. Rabin and Richard M. Karp in 1987 that uses hashing to find any one of a set of pattern strings in a text. For text of length n and p patterns of combined length m,its average and best case running time is O(n+m) in space O(p),but its worst-case time is O(nm). In contrast,the Aho–Corasick string matching algorithm has asymptotic worst-time complexity O(n+m) in space O(m).