B-Tree
B-tree的主要功能就是索引,它维护着各个页面之间的复杂的关系,便于快速找到所需数据。
B-Tree使得VDBE可以在O(logN)下查询,插入和删除数据,以及O(1)下双向遍历结果集。B-Tree不会直接读写磁盘,它仅仅维护着页面 (pages)之间的关系。当B-TREE需要页面或者修改页面时,它就会调用Pager。当修改页面时,pager保证原始页面首先写入日志文件,当它完成写操作时,pager根据事务状态决定如何做。B-tree不直接读写文件,而是通过page cache这个缓冲模块读写文件对于性能是有重要意义的。
2.5.1 数据库文件格式(DatabaseFile Format)
从逻辑上来说,一个sqlite数据库文件由多个多重Btree构成。一个数据库由许多B-tree构成——每一个表和索引都有一个B-tree(注:索引采用B-tree,而表采用B+tree,这主要是表和索引的需求不同以及B-tree和B+tree的结构不同决定的:B+tree的所有叶子节点包含了全部关键字信息,而且可以有两种顺序查找。而B-tree更适合用来作索引)。
2.5.1.1 页
数据库文件包括一个或多个页。页的大小可以是512B到64KB之间的2的任意次方。
数据库中所有的页面都按从1开始顺序标记,其最大页数目为2147483646 ( - 2)[6]。数据库中的页主要分为以下几类:
- The lock-byte page
- A freelist page
- A freelist trunk page
- A freelist leaf page
- A b-tree page
- A table b-tree interior page
- A table b-tree leaf page
- An index b-tree interior page
- An index b-tree leaf page
- A payload overflow page
- A pointer map page
数据库中加载到内存中的页的定义位于btreeInt.h,其结构如下:
<pre name="code" class="cpp">structMemPage { u8 isInit; /* True if prevIoUsly initialized.MUST BE FIRST! */ u8 nOverflow; /* Number of overflow cell bodies inaCell[] */ u8 intKey; /* True if table b-trees. False for index b-trees */ u8 intKeyLeaf; /* True if the leaf of an intKey table*/ u8 noPayload; /* True if internal intKey page (thusw/o data) */ u8 leaf; /* True if a leaf page */ u8 hdrOffset; /* 100 for page 1. 0 otherwise */ u8 childPtrSize; /* 0 if leaf==1. 4 if leaf==0 */ u8 max1bytePayload; /* min(maxLocal,127) */ u8 bBusy; /* Prevent endless loops on corruptdatabase files */ u16 maxLocal; /* Copy of BtShared.maxLocal orBtShared.maxLeaf */ u16 minLocal; /* Copy of BtShared.minLocal orBtShared.minLeaf */ u16 cellOffset; /* Index in aData of first cell pointer*/ u16 nFree; /* Number of free bytes on the page*/ u16 nCell; /* Number of cells on this page,local and ovfl */ u16 maskPage; /* Mask for page offset */ u16 aiOvfl[5]; /* Insert the i-th overflow cell beforethe aiOvfl-th ** non-overflow cell */ u8 *apOvfl[5]; /* Pointers to the body of overflowcells */ BtShared *pBt; /* Pointer to BtShared that this page ispart of */ u8 *aData; /* Pointer to disk image of the page data */ u8 *aDataEnd; /* One byte past the end of usable data*/ u8 *aCellIdx; /* The cell index area */ u8 *aDataOfst; /* Same as aData for leaves. aData+4 for interior */ DbPage *pDbPage; /* Pager page handle */ u16 (*xCellSize)(MemPage*,u8*); /* cellSizePtr method */ void (*xParseCell)(MemPage*,u8*,CellInfo*);/* btreeParseCell method */ Pgno pgno; /* Page number for this page */ };
2.5.1.2 文件头
数据库中第一个页(page 1)永远是Btree页。而每个表或索引的第1个页称为根页,所有表或索引的根页编号都存储在系统表sqlite_master中,表sqlite_master的根页为page 1。Page 1的前100个字节是一个对数据库文件进行描述的“文件头”。它包括数据库的版本、格式的版本、页大小、编码等所有创建数据库时设置的永久性参数。这个特殊文件头的文档在btreeInt.h中,格式如表2.2所示。
表2.2 Page 1的前100字节
偏移量 |
大小 |
说明 |
|||
0 |
16 |
The header string: "sqlite format 3\000" |
|||
16 |
2 |
The database page size in bytes. Must be a power of two between 512 and 65536. |
|||
18 |
1 |
File format write version. 1 for legacy; 2 forWAL. |
|||
19 |
1 |
File format read version. 1 for legacy; 2 forWAL. |
|||
20 |
1 |
Bytes of unused "reserved" space at the end of each page. Usually 0. |
|||
21 |
1 |
Maximum embedded payload fraction. Must be 64. |
|||
22 |
1 |
Minimum embedded payload fraction. Must be 32. |
|||
23 |
1 |
Leaf payload fraction. Must be 32. |
|||
24 |
4 |
File change counter. |
|||
28 |
4 |
Size of the database file in pages. The "in-header database size". |
|||
32 |
4 |
Page number of the first freelist trunk page. |
|||
36 |
4 |
Total number of freelist pages. |
|||
40 |
4 |
The schema cookie. |
|||
44 |
4 |
The schema format number. Supported schema formats are 1,2,3,and 4. |
|||
48 |
4 |
Default page cache size. |
|||
52 |
4 |
The page number of the largest root b-tree page when in auto-vacuum or incremental-vacuum modes,or zero otherwise. |
|||
56 |
4 |
The database text encoding. A value of 1 means UTF-8. A value of 2 means UTF-16le. A value of 3 means UTF-16be. |
|||
60 |
4 |
The "user version" as read and set by theuser_version pragma. |
|||
64 |
4 |
True (non-zero) for incremental-vacuum mode. False (zero) otherwise. |
|||
68 |
4 |
The "Application ID" set byPRAGMA application_id. |
|||
72 |
20 |
Reserved for expansion. Must be zero. |
|||
92 |
4 |
||||
96 |
4 |
文件头(只有page 1有) |
|||||
页头 |
|||||
单元指针数组 |
|||||
未分配空间 |
|||||
单元内容区 |
偏移量 |
大小 |
说明 |
|||
0 |
1 |
Flags. 1: intkey,2: zerodata,4: leafdata,8: leaf |
|||
1 |
2 |
byte offset to the first freeblock |
|||
3 |
2 |
number of cells on this page |
|||
5 |
2 |
first byte of the cell content area |
|||
7 |
1 |
number of fragmented free bytes |
|||
8 |
4 |
Right child (the Ptr(N) value). Omitted on leaves. |
数据类型 |
Appears in... |
说明 |
|||
Table Leaf (0x0d) |
Table Interior (0x05) |
Index Leaf (0x0a) |
Index Interior (0x02) |
||
4-byte integer |
✔ |
✔ |
Page number of left child |
||
varint |
✔ |
✔ |
✔ |
Number of bytes of payload |
|
varint |
✔ |
✔ |
Rowid |
||
byte array |
✔ |
✔ |
✔ |
Payload |
|
4-byte integer |
✔ |
✔ |
✔ |
Page number of first overflow page |