Choose any three.
1.0 Overview
An
The R-Tree concept originated with
2.0 Compiling The R*Tree Module
The source code to the sqlite R*Tree module is included as part of the
3.0 Using the R*Tree Module
The sqlite R*Tree module is implemented as a
The first column of an sqlite R*Tree must always be an integer primary key. The min/max-value pair columns are always stored as 32-bit floating point values. Unlike regular sqlite tables which can store data in a variety of datatypes and formats,the R*Tree indices rigidly enforce these two storage types. Attempts to insert something other than an integer into the first column,or something other than a floating point value into the other columns,will result in an error.
3.1 Creating An R*Tree Index
A new R*Tree index is created as follows:
CREATE VIRTUAL TABLE <name> USING rtree(<column-names>);
The
<name> _node
<name> _rowid
<name> _parent
The shadow tables are ordinary sqlite data tables. You can query them directly if you like,though this unlikely to to reveal anything particularly useful. And you can
As an example,consider creating a two-dimensional R*Tree index for use in spatial queries:
CREATE VIRTUAL TABLE demo_index USING rtree( id,-- Integer primary key minX,maxX,-- Minimum and maximum X coordinate minY,maxY -- Minimum and maximum Y coordinate );
3.2 Populating An R*Tree Index
The usual
INSERT INTO demo_index VALUES( 1,-- Primary key -80.7749,-80.7747,-- Longitude range 30.3776,30.3778 -- Latitude range ); INSERT INTO demo_index VALUES( 2,-81.0,-79.6,35.0,36.2 );
The entries above might represent (for example) a bounding Box around the offices for sqlite.org and bounding Box around the 12th Congressional District of North Carolina in which sqlite.org is located.
3.3 Querying An R*Tree Index
Any valid query will work against an R*Tree index. But the R*Tree implementation is designed to make two kinds of queries especially efficient. First,queries against the primary key are efficient:
SELECT * FROM demo_index WHERE id=1;
Of course,an ordinary sqlite table will do a query against its integer primary key efficiently,so the prevIoUs is no big deal. The real reason for using an R*Tree in the first place is so that you can efficiently do inequality queries against the coordinate ranges. To find all elements of the index that are contained within the vicinity of Charlotte,North Carolina,one might do:
SELECT id FROM demo_index WHERE minX>=-81.08 AND maxX<=-80.58 AND minY>=35.00 AND maxY<=35.44;
The query above would very quickly locate id of 1 even if the R*Tree contained millions of entries. The prevIoUs is an example of a "contained-within" query. The R*Tree also supports "overlapping" queries. For example,to find all bounding Boxes that overlap the Charlotte area:
SELECT id FROM demo_index WHERE maxX>=-81.08 AND minX<=-80.58 AND maxY>=35.00 AND minY<=35.44;
This second query would find both entry 1 (the sqlite.org office) which is entirely contained within the query Box and also Mel Watt's Congressional District which extends well outside the query Box but still overlaps the query Box.
Note that is not necessary for all coordinates in an R*Tree index to be constrained in order for the index search to be efficient. One might,want to query all objects that overlap with the 35th parallel:
SELECT id FROM demo_index WHERE maxY>=35.0 AND minY<=35.0;
But,generally speaking,the more constraints that the R*Tree module has to work with,and the smaller the bounding Box,the faster the results will come back.
4.0 Using R*Trees Effectively
The only information that an R*Tree index stores about an object is its integer ID and its bounding Box. Additional information needs to be stored in separate tables and related to the R*Tree index using the primary key. For the example above,one might create an auxiliary table as follows:
CREATE TABLE demo_data( id INTEGER PRIMARY KEY,-- primary key objname TEXT,-- name of the object objtype TEXT,-- object type boundary BLOB -- detailed boundary of object );
In this example,the demo_data.boundary field is intended to hold some kind of binary representation of the precise boundaries of the object. The R*Tree index only holds an axis-aligned rectangular boundary for the object. The R*Tree boundary is just an approximation of the true object boundary. So what typically happens is that the R*Tree index is used to narrow a search down to a list of candidate objects and then more detailed and expensive computations are done on each candidate to find if the candidate truly meets the search criteria.
Key Point:
An R*Tree index does not normally provide the exact answer but merely reduces the set of potential answers from millions to dozens.
Suppose the demo_data.boundary field holds some proprietary data description of a complex two-dimensional boundary for an object and suppose that the application has used the
SELECT objname FROM demo_data,demo_index WHERE demo_data.id=demo_index.id AND contained_in(demo_data.boundary,:boundary) AND minX>=-81.0 AND max<=-79.6 AND minY>=35.0 AND maxY<=36.2;
In the query above,one would presumably bind the binary BLOB description of the precise boundary of the 12th district to the ":boundary" parameter.
Notice how the query above works: The R*Tree index runs in the outer loop to find entries that are contained within the bounding Box of longitude -81..-79.6 and latitude 35.0..36.2. For each object identifier found,sqlite looks up the corresponding entry in the demo_data table. It then uses the boundary field from the demo_data table as a parameter to the contained_in() function and if that function returns true,the objname field from the demo_data table is returned as the next row of query result.
One would get the same answer without the use of the R*Tree index using the following simpler query:
SELECT objname FROM demo_data WHERE contained_in(demo_data.boundary,:boundary);
The problem with this latter query is that it must apply the contained_in() function to millions of entries in the demo_data table. The use of the R*Tree in the penultimate query reduces the number of calls to contained_in() function to a small subset of the entire table. The R*Tree index did not find the exact answer itself,it merely limited the search space.