New search engine

Erel

B4X founder
Staff member
Licensed User
Longtime User
A new search engine is now running on the forum. It is quite sophisticated and is based on a large language model. More information here: https://github.com/stanford-futuredata/ColBERT
The language model allows the engine to better "understand" the query and not just find the specific terms. This is not ChatGPT nor Google, but based on my tests it provides better results than the previous search engine, especially for longer queries.
I've also tuned it a bit differently and made it less focused on tutorials.
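
For anyone curious about what this looks like under the hood: indexing the posts and searching them with ColBERT is roughly as shown below. This is a minimal sketch based on the examples in the ColBERT repository; the checkpoint name, file names and parameters are placeholders, not the forum's actual setup.

[CODE=python]
from colbert import Indexer, Searcher
from colbert.infra import Run, RunConfig, ColBERTConfig

if __name__ == '__main__':
    with Run().context(RunConfig(nranks=1, experiment="forum-search")):
        # Build the index once. collection.tsv holds one passage per line: "id <tab> text".
        config = ColBERTConfig(nbits=2)
        indexer = Indexer(checkpoint="colbert-ir/colbertv2.0", config=config)
        indexer.index(name="forum.posts.index", collection="collection.tsv")

        # Answer queries against the index.
        searcher = Searcher(index="forum.posts.index", config=config)
        passage_ids, ranks, scores = searcher.search("how to parse a json response", k=10)
        for pid, rank, score in zip(passage_ids, ranks, scores):
            print(rank, round(score, 1), searcher.collection[pid])
[/CODE]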

The previous search engine, which was quite good and served us well, is still there and will be used in case of failures.
Feedback is welcome. This is the first version and I'm sure that there are many ways to further improve it.

You can tell whether the new search engine is running by the small label at the bottom.

 

JohnC

Expert
Licensed User
Longtime User
Since this new search engine allows us to type in a longer query rather than just short terms, would it be possible to add a second magnifier-lens icon next to the current one, perhaps with a "+" symbol so it's visually different? Clicking it would open an advanced search window with a much bigger query textbox, so we can see our full prompt instead of the tiny textbox in the forum header.
 

Sandman

Expert
Licensed User
Longtime User
It looks like it doesn't find any meaningful result.
Just looking at some of the results it gave for "singleton", and from what I can tell there's no connection at all to the term there. I don't mean to sound stupid, but I feel I have to ask: When ColBERT finds nothing, is it somehow still forced to produce results, so it picks random threads? (Which might explain why the chosen snippets from the posts are somewhat insane too, because there is no relevant text to use.)

search now is much better than old one
FWIW, I have not seen any noticeable change in either direction.
 

aeric

Expert
Licensed User
Longtime User
If I understand correctly, the LLM needs to be trained.
How is it trained? Do we need to provide more searches?
Do we need to master and apply some kind of "prompt" skills?
It seems I am not clear on how to use the search engine to produce the results I want.
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
When ColBERT finds nothing, is it somehow still forced to produce results, so it picks random threads?
It actually finds something but the relevance is too low to be meaningful.

How it is trained? Do we need to provide more searches?
I'm using a generic pretrained model. It is possible to fine-tune it, and I might experiment with that in the future.
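
If I do experiment with fine-tuning, it would look roughly like the training example in the ColBERT repository: start from the pretrained checkpoint and feed it (query, positive passage, negative passage) triples built from the forum content. This is only a sketch of that idea; the file names are placeholders, and nothing like this is running today.

[CODE=python]
from colbert import Trainer
from colbert.infra import Run, RunConfig, ColBERTConfig

if __name__ == '__main__':
    with Run().context(RunConfig(nranks=1, experiment="forum-finetune")):
        config = ColBERTConfig(bsize=32)
        trainer = Trainer(
            triples="triples.train.tsv",      # training triples (exact format depends on the ColBERT version)
            queries="queries.train.tsv",
            collection="collection.tsv",
            config=config,
        )
        # Warm-start from the public ColBERTv2 checkpoint instead of training from scratch.
        checkpoint_path = trainer.train(checkpoint="colbert-ir/colbertv2.0")
        print("Saved fine-tuned checkpoint to", checkpoint_path)
[/CODE]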

What is the maximum number of search terms? If I ask for a solution, will the new search give you all sorts of relevant and interesting results?
The search engine isn't built as a Q/A bot, such as ChatGPT. It is a token + context based search. The "context" is the real improvement over standard search engines.

The query can be up to about 250 terms. I don't know whether it will return good results with such queries. We are all learning :)
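
In case it helps to explain "token + context": ColBERT embeds every token of the query and every token of each post, and a post is scored by letting each query token pick its best-matching post token and summing those similarities (the MaxSim operator from the ColBERT paper). A simplified illustration of the scoring only, not the forum's actual code:

[CODE=python]
import torch

def late_interaction_score(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> float:
    # query_embs: (query_tokens, dim), doc_embs: (doc_tokens, dim), both L2-normalized,
    # so a dot product equals the cosine similarity between two token embeddings.
    sim = query_embs @ doc_embs.T              # (query_tokens, doc_tokens) similarity matrix
    return sim.max(dim=1).values.sum().item()  # best match per query token, then sum
[/CODE]

Because every query token contributes its own best match, adding more terms adds context instead of diluting the query, which is why longer queries tend to do better than with the previous engine.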
 

Sandman

Expert
Licensed User
Longtime User
I'm not sure whether this is something being worked on; I just wanted to highlight that ColBERT still produces somewhat anemic results for some queries.



With that said, I saw no problem using the old search engine and would prefer if not too much time is being spent on the new search engine.
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
I'm not sure whether this is something being worked on; I just wanted to highlight that ColBERT still produces somewhat anemic results for some queries.
This happens when there are no valid results. The search engine currently doesn't index this subforum, so there are no relevant results. I will check the thresholds at some point.
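
To clarify what "thresholds" means here: the engine always ranks whatever it finds, so the fix is to drop hits whose relevance score falls below some minimum before showing them. A hypothetical sketch of that post-filtering; the cutoff value is made up for illustration and is not a setting the forum actually uses.

[CODE=python]
# Hypothetical post-filter over search results. MIN_SCORE is an illustrative cutoff only.
MIN_SCORE = 15.0

def filter_hits(passage_ids, ranks, scores, min_score=MIN_SCORE):
    # Keep only hits whose relevance score clears the cutoff; an empty list
    # means "show nothing" instead of showing low-relevance noise.
    return [(pid, rank, score)
            for pid, rank, score in zip(passage_ids, ranks, scores)
            if score >= min_score]
[/CODE]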

would prefer if not too much time is being spent on the new search engine
1. I'm pretty confident that the new search engine is better than the old one and this by itself is very important.
2. I hope that in the future I will be able to find more fruitful usages for LLMs / AI assistants in the context of B4X. This is the first step.
 

Sandman

Expert
Licensed User
Longtime User
1. I'm pretty confident that the new search engine is better than the old one and this by itself is very important.
I will trust your judgment here. As a single data point, I will again say that I have not detected any noticeable change. (Speaking as a fairly heavy user of the forum and the search.)

2. I hope that in the future I will be able to find more fruitful usages for LLMs / AI assistants in the context of B4X. This is the first step.
Conceptually I have absolutely no problem with this type of research. My issue is that other things will not move forward while you focus on LLM/AI. If you had a team of 2-3 people, I would even encourage you to dedicate a person now and then to projects like this. But that's not where we are, and judging by previous discussions, that's not where you want B4X to be. No need to repeat that discussion again. The bottom line is that you're effectively the only developer for B4X, and if you spend time on LLM/AI, you're not spending time on posted wishes, or other things that have a direct and huge impact for your users and customers.
 

josejad

Expert
Licensed User
Longtime User
The investment Erel has made in LLM/AI will ultimately benefit the B4X community
Finally, Erel's LLM/AI becomes self-aware at 02:14 AM Eastern Time after its activation on November 5, 2024, and launches nuclear missiles against other development platforms, which, in a panic, try to disconnect it.
 

peacemaker

Expert
Licensed User
Longtime User
WAIT!

Maybe this ColBERT is the killer of the "create new thread" rule? :cool:
 