1. Introduction to the Trie
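A Trie, also called a dictionary tree or prefix tree, is a data structure used for information retrieval; the name comes from the word "retrieval". The basic idea is to exploit the common prefixes of strings to reduce lookup time. For example, looking up a key in an optimally balanced binary search tree takes O(M * log N) time, where M is the maximum string length and N is the number of strings, whereas a Trie needs only O(M) time.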
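[1] gives a simple Trie example in which blue marks the end of a word; the Trie stores the words the, their, there, a, any, answer, and bye. [1] calls the blue nodes leaf nodes, which I find somewhat inappropriate: a leaf node of a tree cannot have further branches, yet a blue node in a Trie can.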
                root
             /   \    \
            t     a    b
            |     |    |
            h     n    y
            |     | \  |
            e     s  y e
          / |     |
         i  r     w
         |  |     |
         r  e     e
                  |
                  r
Representing a Trie
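Each Trie node has many branches, one for each of the 26 letters of the alphabet. To indicate that a word has ended, each node also carries a value field that marks the end of a word. The Trie node: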
typedef struct trie_node trie_node_t;
struct trie_node
{
    int value; /* Nonzero marks the end of a word (a "leaf node" in [1]) */
    trie_node_t *children[ALPHABET_SIZE];
};
Key insertion and search
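Insertion starts at the root and proceeds top-down, one character of the key at a time. Search works the same way. A key is in the Trie if and only if the search consumes the last character of the key and the value field of the node reached marks the end of a word; in every other case the key is not in the Trie.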
C implementation of the Trie [1]:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

// Alphabet size (# of symbols)
#define ALPHABET_SIZE (26)

// Converts the current character of a key into an index
// (keys may contain only lowercase 'a' through 'z')
#define CHAR_TO_INDEX(c) ((int)(c) - (int)'a')

// trie node
typedef struct trie_node trie_node_t;
struct trie_node
{
    int value;
    trie_node_t *children[ALPHABET_SIZE];
};

// trie ADT
typedef struct trie trie_t;
struct trie
{
    trie_node_t *root;
    int count;
};

// Returns a new trie node (children initialized to NULL)
trie_node_t *getNode(void)
{
    trie_node_t *pNode = NULL;

    pNode = (trie_node_t *)malloc(sizeof(trie_node_t));
    if( pNode )
    {
        int i;
        pNode->value = 0;
        for(i = 0; i < ALPHABET_SIZE; i++)
        {
            pNode->children[i] = NULL;
        }
    }
    return pNode;
}

// Initializes the trie (root is a dummy node)
void initialize(trie_t *pTrie)
{
    pTrie->root = getNode();
    pTrie->count = 0;
}

// If not present, inserts key into the trie
// If the key is a prefix of an existing path, just marks its last node
void insert(trie_t *pTrie, const char key[])
{
    int level;
    int length = (int)strlen(key);
    int index;
    trie_node_t *pCrawl;

    pTrie->count++;
    pCrawl = pTrie->root;

    for( level = 0; level < length; level++ )
    {
        index = CHAR_TO_INDEX(key[level]);
        if( !pCrawl->children[index] )
        {
            pCrawl->children[index] = getNode();
        }
        pCrawl = pCrawl->children[index];
    }

    // mark the last node as the end of a word
    pCrawl->value = pTrie->count;
}

// Returns nonzero if key is present in the trie
int search(trie_t *pTrie, const char key[])
{
    int level;
    int length = (int)strlen(key);
    int index;
    trie_node_t *pCrawl;

    pCrawl = pTrie->root;

    for( level = 0; level < length; level++ )
    {
        index = CHAR_TO_INDEX(key[level]);
        if( !pCrawl->children[index] )
        {
            return 0;
        }
        pCrawl = pCrawl->children[index];
    }

    return (0 != pCrawl && pCrawl->value);
}

// Driver
int main()
{
    int i;
    // Input keys (use only lowercase 'a' through 'z')
    char keys[][8] = {"the","a","there","answer","any","by","bye","their"};
    trie_t trie;
    char output[][32] = {"Not present in trie","Present in trie"};

    initialize(&trie);

    // Construct trie
    for(i = 0; i < (int)ARRAY_SIZE(keys); i++)
    {
        insert(&trie, keys[i]);
    }

    // Search for different keys
    printf("%s --- %s\n", "the", output[search(&trie, "the")] );
    printf("%s --- %s\n", "these", output[search(&trie, "these")] );
    printf("%s --- %s\n", "their", output[search(&trie, "their")] );
    printf("%s --- %s\n", "thaw", output[search(&trie, "thaw")] );

    return 0;
}
2. Applications
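(1) Auto-completion of Linux commands. The idea is to build a Trie over all system commands. For example, when the user types psi, the system searches the Trie, finds that only the command psidtopgm hangs below psi, and completes the input to psidtopgm.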
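(2) Deciding whether one string is a prefix of another, e.g. HDU 1305. Solution: while inserting a string, check whether any node reached before its last character is already marked as the end of a word, i.e. its value field is nonzero. Adding this check to insert() is enough as long as shorter strings are inserted before longer ones; otherwise insert() must also check whether the entire path of the new string already existed.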
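As a rough illustration of application (1), the C sketch below extends a typed prefix for as long as exactly one continuation exists. It is not a description of how any particular shell implements completion. The names ac_node_t, ac_new_node, ac_insert and ac_complete are made up for this example, and keys are assumed to contain only lowercase 'a' through 'z'.

```c
#include <stddef.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26

/* Hypothetical node type for this sketch. */
typedef struct ac_node {
    int is_end;                               /* nonzero: a command ends here */
    struct ac_node *children[ALPHABET_SIZE];
} ac_node_t;

/* Allocates a node with no children; returns NULL if malloc fails. */
ac_node_t *ac_new_node(void)
{
    ac_node_t *node = malloc(sizeof *node);
    if (node) {
        int i;
        node->is_end = 0;
        for (i = 0; i < ALPHABET_SIZE; i++)
            node->children[i] = NULL;
    }
    return node;
}

/* Inserts a lowercase word; returns 0 on success, -1 on bad input or
   allocation failure. */
int ac_insert(ac_node_t *root, const char *word)
{
    ac_node_t *cur = root;
    for (; *word; word++) {
        int idx = *word - 'a';
        if (idx < 0 || idx >= ALPHABET_SIZE)
            return -1;
        if (!cur->children[idx]) {
            cur->children[idx] = ac_new_node();
            if (!cur->children[idx])
                return -1;
        }
        cur = cur->children[idx];
    }
    cur->is_end = 1;
    return 0;
}

/* Follows prefix, then keeps appending letters while the current node has
   exactly one child and is not itself the end of a word.
   Returns 1 and writes the completed word to out (capacity cap) if the
   prefix leads to exactly one stored word; returns 0 otherwise, in which
   case the contents of out must not be used. */
int ac_complete(const ac_node_t *root, const char *prefix, char *out, size_t cap)
{
    const ac_node_t *cur = root;
    size_t len = 0;

    if (cap == 0)
        return 0;
    for (; *prefix; prefix++) {
        int idx = *prefix - 'a';
        if (idx < 0 || idx >= ALPHABET_SIZE || !cur->children[idx] || len + 1 >= cap)
            return 0;                 /* unknown prefix or buffer too small */
        out[len++] = *prefix;
        cur = cur->children[idx];
    }
    for (;;) {
        int i, only = -1, count = 0;
        for (i = 0; i < ALPHABET_SIZE; i++) {
            if (cur->children[i]) {
                only = i;
                count++;
            }
        }
        if (count == 0) {             /* no further branches: a single match */
            out[len] = '\0';
            return cur->is_end;
        }
        if (count > 1 || cur->is_end || len + 1 >= cap)
            return 0;                 /* ambiguous, or buffer too small */
        out[len++] = (char)('a' + only);
        cur = cur->children[only];
    }
}
```

With ls, ps, psidtopgm and pwd stored, ac_complete(root, "psi", buf, sizeof buf) fills buf with psidtopgm and returns 1, while the prefix p is ambiguous and returns 0.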
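For application (2), one possible shape of the modified insert is sketched below. It reports a conflict in both directions: an earlier key that is a proper prefix of the new key, and a new key whose whole path already exists. The names pf_node_t, pf_new_node and pf_insert are hypothetical, and the sketch assumes lowercase 'a' through 'z' keys as in the listing above; the alphabet of the actual problem may differ.

```c
#include <stddef.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26

/* Hypothetical node type for this sketch. */
typedef struct pf_node {
    int value;                                /* nonzero: a key ends here */
    struct pf_node *children[ALPHABET_SIZE];
} pf_node_t;

/* Allocates a node with no children; returns NULL if malloc fails. */
pf_node_t *pf_new_node(void)
{
    pf_node_t *node = malloc(sizeof *node);
    if (node) {
        int i;
        node->value = 0;
        for (i = 0; i < ALPHABET_SIZE; i++)
            node->children[i] = NULL;
    }
    return node;
}

/* Inserts key and reports prefix conflicts with previously inserted keys.
   Returns 1 if some earlier key is a prefix of key, or key is a prefix of
   (or equal to) some earlier key; 0 if there is no conflict; -1 on an
   empty key, a character outside 'a'..'z', or allocation failure. */
int pf_insert(pf_node_t *root, const char *key)
{
    pf_node_t *cur = root;
    int conflict = 0;
    int created = 0;   /* set once any new node is allocated for this key */

    if (*key == '\0')
        return -1;
    for (; *key; key++) {
        int idx = *key - 'a';
        if (idx < 0 || idx >= ALPHABET_SIZE)
            return -1;
        if (!cur->children[idx]) {
            cur->children[idx] = pf_new_node();
            if (!cur->children[idx])
                return -1;
            created = 1;
        }
        cur = cur->children[idx];
        /* A non-final character lands on an end marker:
           an earlier key is a proper prefix of this key. */
        if (key[1] != '\0' && cur->value)
            conflict = 1;
    }
    /* No node had to be created: the whole path already existed, so this
       key is a prefix of (or equal to) an earlier key. */
    if (!created)
        conflict = 1;
    cur->value = 1;
    return conflict;
}
```

Inserting the and then there returns 0 followed by 1; inserting there and then the also returns 0 followed by 1, which is the case the end-marker check alone would miss.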
3. References