Android Code Snippet Full Text Search (FTS) as it Applies to SQLite

LucaMs · Nov 29, 2016

Mahares said:
BENCHMARK:

What about inserting/updating?

Mahares said:
Actually when creating a table with name: fts_table, three to five real (non-virtual) tables referred to as shadow tables are also created

rboeck · Nov 29, 2016

You can use DBUtils sub's like in the 'normal' databases and and all sql statements in a standard way.

In my own apps i prefere to use FTS styled databases, because the google style of search is well known, an specially in mobile apps nobody want to select fields for searching... This together with the enourmos speed
One small negative effect of FTS: you can not use all ODBC drivers for database import, export and integration, but some exists.

Mahares · Nov 29, 2016

LucaMs said:
What about inserting/updating?

Although the main benefits from FTS is the extremely fast search speed, I also conducted a speed test comparison between regular table and FTS table INSERT:
For importing 91000 records of text file 7 MB in size, it took 8.38 seconds for the regular table insert and 18.51 seconds for the FTS insert. The test was conducted on a Samsung Galaxy Tab A tablet. The INSERT is usually just a one time event. I did not compare UPDATE.

keirS · Nov 30, 2016

Mahares said:
NAME: Demonstrating some of the syntax used in creating and using FTS tables.
Actually when creating a table with name: fts_table, three to five real (non-virtual) tables referred to as shadow tables are also created to store the underlying data. Fts_table_content, fts_table_docsize and fts_table_segdir and fts_table_segments. FTS created tables require a lot more memory than regular tables, but their queries execute a lot faster.

Think it's important to point out they are far faster for text searches. They are far slower for any other kind of query. This is because the only field supported by FTS is the TEXT field apart from ROWID. An FTS table doesn't support indexes, foreign keys or triggers so integrity and consistency checking has to be performed manually.

BENCHMARK

Click to expand...

:
I made a comparison between an FTS table and a normal table, where both contain the same 91300 text records, using different SELECT statements on a Samsung Galaxy Tab A tablet. In all cases the query ran a lot faster with FTS. The FTS query took an average of 0.01 second and the ordinary table query ran in about 1 second, in some cases 100 times faster with FTS, but generally a large percentage or few times faster.. For instance, searching for ‘lyterian’ it took 0 second querying the FTS table as opposed to 1.04 seconds querying an ordinary table.
I encourage adding to and improving this thread.

Try comparing a join on a int column between two indexed normal tables and a join between a normal indexed table and a FTS table.

Mahares · Dec 1, 2016

keirS said:
They are far slower for any other kind of query.

It is true that FTS is not the answer to every search scenario. That is why SQLite consider FTS3 and FTS4 extensions. all values inserted into FTS table columns (except the rowid column) are converted to TEXT data type before being saved to enhance the search. FTS uses built-in full-text index. The biggest drawback is the memory consumption because data is stored in 6 tables for every table created.

keirS said:
join between a normal indexed table and a FTS table.

Can you share any speed comparison in one of your tests and contribute an example to this thread? Very little is written in this forum about this subject.

keirS · Dec 5, 2016

Mahares said:
Can you share any speed comparison in one of your tests and contribute an example to this thread? Very little is written in this forum about this subject.

This isn't done on Android because I cant free up the 6GB required for the data easily on any of my devices.

My table definitions

B4X:

CREATE TABLE [AUTHORS] (
  [AUTHORID] INTEGER,
  [FIRST_NAME] CHARACTER(30),
  [LAST_NAME] CHARACTER(40));

CREATE INDEX [AUTHORS_AUTHORID] ON [AUTHORS] ([AUTHORID]);

CREATE INDEX [AUTHORS_FIRSTNAME] ON [AUTHORS] ([LAST_NAME]);

CREATE INDEX [AUTHORS_LASTNAME] ON [AUTHORS] ([LAST_NAME]);

CREATE TABLE BOOKS("TITLE" CHARACTER(122),
"AUTHORID" INTEGER,
"BOOKID" INTEGER,
"PUBLISHERID" INTEGER,
"SUMMARY" CLOB);

CREATE INDEX [BOOK_TITLE] ON [BOOKS] ([TITLE]);

CREATE INDEX [BOOKS_AUTHORID] ON [BOOKS] ([AUTHORID]);

CREATE INDEX [BOOKS_PUBLISHERID] ON [BOOKS] ([PUBLISHERID]);

CREATE INDEX [BOOKS_BOOKID] ON [BOOKS] ([BOOKID]);

CREATE VIRTUAL TABLE BooksFTS USING fts4(BookID int PRIMARY KEY,AuthorID Int, PublisherID Int,Title text default "", Summary text default "");

CREATE TABLE PUBLISHERS("PUBLISHERID" INTEGER,
"NAME" CHARACTER(100));

CREATE INDEX [PUBLISHERS_PUBLISHERID] ON [PUBLISHERS] ([PUBLISHERID]);

CREATE INDEX [PUBLISEHERS_NAME] ON [PUBLISHERS] ([NAME]);


CREATE TABLE [WORDS] (
  [WORD] CHARACTER(60),
  [WORDID] INTEGER,
  [HASH] BIGINT);

CREATE INDEX [WORDS_WORDID] ON [WORDS] ([WORDID]);

CREATE TABLE [WORDSINBOOKS] (
  [WORDID] INTEGER,
  [BOOKID] INTEGER,
  [HASH] BIGINT);

CREATE INDEX [WORDSINBOOKS_BOOKID] ON [WORDSINBOOKS] ([BOOKID]);

CREATE INDEX [WORDSINBOOKS_WORDID] ON [WORDSINBOOKS] ([WORDID]);

CREATE INDEX [WORDSINBOOKS_HASH] ON [WORDSINBOOKS] ([HASH] ASC);

So I have tables containing 86,000 books in both normal ("BOOKS") and FTS ("BooksFTS") forms. These are related to 2 tables Authors and Publishers.

The summary column as been generated randomly and is taken from a list of 365,000 English words contained in the WORDS table. The summary is anywhere between 250 and 1000 words.

WORDSINBOOKS is a lookup table to find words in books. The function below was used generate a hash to populate the hash for each word.

B4X:

Sub GetHash(s As String) As Int
    Dim j As JavaObject = s
    Return j.RunMethod("hashCode",Null)
End Sub

So on to the test results using the BooksFTS table to search for the word "diplomats" in summary column and getting the first and last name of the author.

B4X:

SELECT BooksFTS.Bookid,BooksFTS.Title,BooksFTS.Summary,AUTHORS.LAST_NAME, AUTHORS.FIRST_NAME FROM BooksFTS LEFT OUTER JOIN AUTHORS ON BooksFTS.AuthorID = AUTHORS.AUTHORID WHERE Summary MATCH "diplomats"

Average time over 1000 queries was 38 milliseconds (171 Results)

Doing the same query on the BOOKS table using Summary Like "%diplomats%"

B4X:

SELECT BOOKS.BOOKID,BOOKS.TITLE,BOOKS.SUMMARY,AUTHORS.LAST_NAME,AUTHORS.FIRST_NAME FROM BOOKS LEFT OUTER JOIN AUTHORS ON BOOKS.AUTHORID = AUTHORS.AUTHORID WHERE SUMMARY LIKE "%diplomats%"

Average time over 1000 queries was 3884 milliseconds (171 Results)

Getting the hash for the word diplomats (1768639329) and using the WORDSINBOOKS lookup table.

B4X:

SELECT BOOKS.BOOKID,BOOKS.TITLE,BOOKS.SUMMARY,AUTHORS.LAST_NAME,AUTHORS.FIRST_NAME FROM BOOKS LEFT OUTER JOIN AUTHORS ON BOOKS.AUTHORID = AUTHORS.AUTHORID WHERE BOOKID IN (SELECT BOOKID FROM WORDSINBOOKS WHERE HASH = 1768639329)

Average time over 1000 queries was 42 milliseconds (171 Results)

So using a hash lookup table is only slightly slower than using FTS.

So on to the bit where FTS tables fair very poorly. A query to get all books by the author Stephen King.

Using BooksFTS:

B4X:

SELECT AUTHORS.*, BooksFTS.* FROM  AUTHORS INNER JOIN BooksFTS ON AUTHORS.AUTHORID = BOOKSFTS.AuthorID WHERE AUTHORS.LAST_NAME = "King"  AND  AUTHORS.FIRST_NAME = "Stephen"

Average time over 1000 queries was 3733 milliseconds (313 results)

Using Books:

B4X:

SELECT AUTHORS.*, BOOKS.* FROM  AUTHORS INNER JOIN BOOKS ON  AUTHORS.AUTHORID = BOOKS.AUTHORID where AUTHORS.LAST_NAME = "King"  and  AUTHORS.FIRST_NAME = "Stephen"

Average time over 1000 queries was 79 milliseconds (313 results)

So you have to be very careful how you structure your tables and query's using FTS.

rboeck · Dec 5, 2016

Thanks for your work! I have read somewhere, that sqlite has some problems with joins, but i dont know exactly, if it was a timing problem of if some type of joins are not supported. So i think it would be better for benchmarking, to split between the real search time and the time, which is needed for generating the join; maybe the join is the more time consuming part?

keirS · Dec 5, 2016

SQLte only supports 3 types of join. It doesn't support right joins and full outer joins for example. In this example the join is the time consuming part.

B4X:

SELECT AUTHORS.*, BooksFTS.* FROM AUTHORS INNER JOIN BooksFTS ON AUTHORS.AUTHORID = BOOKSFTS.AuthorID WHERE AUTHORS.LAST_NAME = "King"AND AUTHORS.FIRST_NAME = "Stephen"

This is because BOOKSFTS.AuthorID is not (and cannot be) indexed so SQLite has to do a linear search (iterate through every row) on BooksFTS to match BOOKSFTS.AuthorID to AUTHORS.AUTHORID instead doing a lookup using an index. There is also the overhead of having to convert BOOKSFTS.AuthorID from TEXT to INT but I suspect this a negligible factor in the overall time taken.

Peter Simpson · Aug 12, 2021

This is excellent @Mahares.
It would have been great if you could have attached a small working example above or a larger example via Dropbox. But to tell you the truth your tutorial above is easy enough for us all to follow.

Me, I don't really do much with SQLite. For client work usually for instantly backing up mobile device data before synchronising to an online MySQL database whilst on the move, and for myself just a few simple projects here and there and also for some simple tutorials on the forum.

I must say that I've found this post extremely interesting as I've not come across (or had the need to use) FTS3 or FTS4 before.

Thank you...

Mahares · Aug 14, 2021

Peter Simpson said:
It would have been great if you could have attached a small working example

I am working now on a more recent B4XPages Full Text Search project using the Words,txt file that has been used by Erel and @bsnqt
which I will post, perhaps on a separate thread.

Peter Simpson · Aug 17, 2021

I've just started reading this, it's not too bad really and a lot better than I thought it would be. It's a shame that I don't have more of a reason to use SQLite in more client projects.

SQLite FTS5 Extension

Thank you...

Mahares · Aug 17, 2021

Peter Simpson said:
It would have been great if you could have attached a small working example above or a larger example

See link below to the new example I promised back in post #11 above.

SQLite Full Text Search on a Large Database and Display on an xClv

The primary objective of this cross-platform B4XPages SQLite FTS example is to show the syntax used for Full Text Search queries in SQLite. The database consists of close to 400000 records. It is created from scratch in the project via the imported text file Words.txt. The first time the project...

www.b4x.com

Android Code Snippet Full Text Search (FTS) as it Applies to SQLite

LucaMs

Expert

rboeck

Well-Known Member

Mahares

Expert

keirS

Well-Known Member

Mahares

Expert

keirS

Well-Known Member

rboeck

Well-Known Member

keirS

Well-Known Member

Peter Simpson

Expert

Mahares

Expert

Peter Simpson

Expert

Mahares

Expert

SQLite Full Text Search on a Large Database and Display on an xClv

Similar Threads