Request: Bulk Library Access for Offline AI Indexing & Tooling

Tim Chapman

Active Member
Licensed User
Longtime User
Hi everyone (and Erel),

I am opening this thread per Erel's suggestion to discuss a resource that could be valuable for the community, specifically for those of us building AI coding agents or offline tooling.

The Project: I am currently developing a local, offline AI coding agent (running on NVIDIA Jetson hardware) specialized for B4X. To make the AI effective, it needs to ingest the API signatures (Public Subs, Properties, Events) of the community libraries so it can write accurate code without hallucinating methods that don't exist.

The Problem: While the libraries_mapping.json (from the B4X_Forum_Resources repo) is an excellent map of what libraries exist, the actual API definitions are locked inside the .b4xlib (Source) and .jar/.xml files scattered across thousands of forum threads.
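
To make the "locked inside" point concrete, here is a minimal sketch of pulling Sub signatures out of a single .b4xlib. It assumes a .b4xlib is an ordinary zip archive whose code modules are .bas text files, and it uses a deliberately simple regular expression. The names `public_subs` and `index_b4xlib` are made up for illustration. It does not handle array parameters such as `Data() As Byte`, properties, or events.

```python
import re
import zipfile

# Matches "Sub Name", "Public Sub Name (args) As Type"; a line starting with
# "Private Sub" or "End Sub" does not match. Simplification: the parameter
# list is cut at the first ")" so array parameters are not handled.
SUB_RE = re.compile(
    r"^[ \t]*(?:Public[ \t]+)?Sub[ \t]+(\w+)"
    r"(?:[ \t]*(\([^)]*\)))?(?:[ \t]+As[ \t]+(\w+))?",
    re.IGNORECASE | re.MULTILINE,
)


def public_subs(source: str) -> list[str]:
    """Return one signature string per non-private Sub in a module's source."""
    signatures = []
    for name, params, return_type in SUB_RE.findall(source):
        signature = f"{name}{params or '()'}"
        if return_type:
            signature += f" As {return_type}"
        signatures.append(signature)
    return signatures


def index_b4xlib(archive) -> dict[str, list[str]]:
    """Map each .bas module in a .b4xlib (assumed to be a zip) to its Subs."""
    index = {}
    with zipfile.ZipFile(archive) as lib:
        for member in lib.namelist():
            if member.lower().endswith(".bas"):
                text = lib.read(member).decode("utf-8", errors="replace")
                index[member] = public_subs(text)
    return index
```

`index_b4xlib` accepts a file path or a file-like object, so it can be run over a folder of downloaded libraries without touching the forum.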

To build a comprehensive index, I need to parse the actual library files. However, automatically downloading 700+ attachments from the forum is not a viable option, as it would likely trigger CloudFlare protections and put unnecessary load on the server.

The Request: Is there a centralized archive or a "Master ZIP" of the standard community libraries available for download?

Alternatively, could the AnywhereSoftware/B4X_Forum_Resources GitHub repository be updated to include the actual .b4xlib and .xml files (rather than just the forum metadata)?

Having a single source to download the current ecosystem would allow me (and others) to build powerful, context-aware AI tools for B4X completely offline, without risking IP bans or degrading forum performance.

Thank you for considering this!
 

Erel

B4X founder
Staff member
Licensed User
Longtime User

Tim Chapman

Active Member
Licensed User
Longtime User
Hi Erel,

Thank you for the link to the XML Generation Tool.

I understand that this tool can parse a .b4xlib and generate the XML documentation I need. That solves the "Translation" part of the problem for b4xlibs perfectly.

However, it does not solve the "Acquisition" part of the problem, which applies to both library types:

1. The .b4xlib Challenge: To use your tool, I first need to possess the .b4xlib files. Currently, they exist only as attachments scattered across hundreds of forum threads.

2. The Standard Library (.jar + .xml) Challenge: While Java libraries do have XML documentation, they are also scattered across the forum. To index them, I currently have to find and download each one individually.

The Catch-22: To build a complete index of the ecosystem, I would need to write a script to download 700+ attachments (both .b4xlib and .jar/.xml) from the forum. Doing so would likely trigger CloudFlare protections and get my IP banned, which I want to avoid out of respect for your server infrastructure.

The Request: Is there a way to obtain a single "Master ZIP" that contains the current collection of community libraries (both .b4xlib and .jar types)?

If I had that archive, I could:

  1. Run the XML Generation tool locally on the .b4xlibs.
  2. Ingest the existing .xml files from the Java libraries.
  3. Build the entire AI database offline, without ever touching the live server or risking a ban.
Thank you for considering this!
 

Tim Chapman

Active Member
Licensed User
Longtime User
I want to share what I have in mind for this. If I can get the data I have requested, I will spend the time and money to get the dataset into the correct form to fine-tune a Qwen 2.5 model, then give it back to the community. I have a path forward on this. I just need the data. I need all that I can get for the standard and b4xlib libraries, as well as the entire forum if possible. Some of that will be too old, but I think 10 years of forum posts will cover the standard libraries as well as the newer ones. If I am off in my understanding of any of this, please feel free to say so. You certainly won't offend me. I want to get this right the first time.
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
The Request: Is there a way to obtain a single "Master ZIP" that contains the current collection of community libraries (both .b4xlib and .jar types)?
No such thing (currently) available.

Note that you don't need to download the files from the forum. The files you need are available on GitHub. You can download the complete repository.
 

Tim Chapman

Active Member
Licensed User
Longtime User
Hi Erel,
Thank you again for the reply. Is there a complete list of libraries available anywhere? You shared a spreadsheet with me a while back, but it seems to be incomplete now. I have the repository from GitHub and am getting everything out of it that I can. Of the 3400 libraries I have been able to find in that and all of my other resources, the attached spreadsheet shows 294 that I don't have any files for. They were listed in the spreadsheet, but I don't dare try to download that many (using my code) for fear of getting banned by CloudFlare.
 

Attachments

  • B4X-Libraries-Updated.xlsx
    320.1 KB · Views: 63

Tim Chapman

Active Member
Licensed User
Longtime User
I have narrowed it down to 53 libraries that I know of that are not in the GitHub repository.
They are at the top left of the attached spreadsheet and are highlighted in yellow.
Will CloudFlare ban me if I download 53 files?
 

Attachments

  • B4X-Libraries-Updated.xlsx
    326.3 KB · Views: 104

emexes

Expert
Licensed User
Longtime User
Will CloudFlare ban me if I download 53 files?

I have bulk downloaded hundreds of attachments from the forum (with a voluntary and arbitrarily chosen 5 second gap between downloads, i.e. throttled to a maximum of 12 per minute, equivalent to 3.6 MB per minute) without being banned.
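
The throttling described here can be sketched as follows. This is an illustration only, not the script actually used: `throttled_fetch`, `max_per_minute`, and the `fetch` callable are hypothetical names, and only the 5 second gap comes from the post. Nothing in it guarantees that CloudFlare will tolerate a given rate.

```python
import time
from typing import Callable, Iterable


def throttled_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    gap_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, bytes]:
    """Fetch each URL in turn, pausing gap_seconds between requests."""
    results = {}
    for position, url in enumerate(urls):
        if position > 0:
            sleep(gap_seconds)  # pause before every request except the first
        results[url] = fetch(url)  # fetch is supplied by the caller
    return results


def max_per_minute(gap_seconds: float) -> float:
    """Upper bound on requests per minute for a fixed gap between requests."""
    return 60.0 / gap_seconds
```

With a 5 second gap, `max_per_minute(5.0)` gives 12, matching the figure above. 3.6 MB per minute at 12 files per minute works out to an average attachment size of about 0.3 MB.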
 

Tim Chapman

Active Member
Licensed User
Longtime User
I have bulk downloaded hundreds of attachments from the forum (with a voluntary and arbitrarily chosen 5 second gap between downloads, i.e. throttled to a maximum of 12 per minute, equivalent to 3.6 MB per minute) without being banned.

You wouldn't happen to have the 53 libraries that I am missing would you?
 

Tim Chapman

Active Member
Licensed User
Longtime User
I think I have hit upon a better solution than training a model to do this. A database that the AI searches semantically can be used with any model and can be updated automatically whenever the GitHub repository is updated. I already have this well in hand. Models will keep improving as time goes on, so a trained model would have to be retrained regularly at great expense. I will post the code for my database system when it is done. I am working on it daily, so it will be soon.
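
To illustrate the idea of a model-independent searchable database, here is a toy sketch. It is not the database system described above. Word-count vectors stand in for a real embedding model, and `vectorize`, `cosine`, and `SnippetIndex` are made-up names. The point is only that the index sits outside the model, so either one can be swapped or updated without retraining.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SnippetIndex:
    """In-memory store of text snippets ranked by similarity to a query."""

    def __init__(self) -> None:
        self._entries: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self._entries.append((vectorize(text), text))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        query_vector = vectorize(query)
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine(query_vector, entry[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:top_k]]
```

The snippets returned by `search` would be pasted into the prompt of whichever model is in use. Re-indexing after a repository update then only means calling `add` on the new or changed entries.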
 

emexes

Expert
Licensed User
Longtime User
You wouldn't happen to have the 53 libraries that I am missing would you?

Not the first few that I checked. But I was checking for b4xlibs, whereas e.g. CLVBackwards is apparently a class file (CLVBackwards.bas) inside a .zip attachment:

The class is inside the cross platform example project.

I could probably relatively easily find all .bas files inside forum .zip attachment root directories, but not today, more like next Tuesday.

It also involves scanning further back through the forum, cf. b4xlibs, which only need a scan back to late 2018.
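
Checking a single downloaded .zip attachment for root-level .bas files could look roughly like this. It is a sketch using only Python's standard `zipfile` module. `root_bas_files` is a hypothetical helper name, and how the attachments are obtained is left out.

```python
import zipfile


def root_bas_files(archive) -> list[str]:
    """List .bas members stored at the top level of a zip archive."""
    with zipfile.ZipFile(archive) as bundle:
        return [
            info.filename
            for info in bundle.infolist()
            if not info.is_dir()
            and "/" not in info.filename  # zip member paths use "/" separators
            and info.filename.lower().endswith(".bas")
        ]
```

A class file that sits inside a project subfolder rather than the archive root would be missed by this check, so dropping the `"/" not in info.filename` condition widens the search at the cost of more false positives.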

 

Tim Chapman

Active Member
Licensed User
Longtime User
I don't need you to find files in the GitHub repository. I have done that quite well. The 53 I am missing are not in the GitHub repository in any form. I have searched multiple ways for them. They are at the top of the spreadsheet in yellow. Note that I am also already unzipping the b4xlibs and extracting the XML from them. All of that is going into the database, along with code snippets, example code, documentation booklets, etc. I am trying to get every morsel of relevant data into the database. It is going well. I just can't find the 53 missing libraries without scraping.
 

emexes

Expert
Licensed User
Longtime User

Tim Chapman

Active Member
Licensed User
Longtime User
This project has a new lease on life. Try as I might, I could not get the Qwen 2.5 7b 8-bit model to code competently using the database. I got it up to maybe 50 to 75%, but nowhere near what I wanted. I have now learned how to train a model on B4X code and need as much good code as I can get my hands on. 50 megs of code would not be too much. So, I am asking everyone to pony up and send me code in any of the languages. When I finish training the model, I will make it available to everyone.

I want to add a little more detail about what I am doing. I have a Jetson Orin NX with 16 GB of VRAM. It is the computer I am trying to get competent coding out of. I have turned the GitHub download into a database, which I am continually improving so that it holds the best data I can get for B4X coding.

Training a model requires much better hardware than the NX. My brother has an NVIDIA DGX Spark, which will do the heavy lifting for the training. My job is to gather about 50 megabytes of code to feed the model for training. This is a tall order. I doubt that all of my own code adds up to one meg. The forum code in my database is more than that in volume, but only a smaller portion of it is rated as good code for the AI to train on. So, I am excited to pursue a trained model and make this happen. If anyone wants to collaborate on this, I welcome it.

I will have one of the big AI models that is competent in B4X coding, like Gemini, make question-and-answer pairs from the best code samples in my database and from any code I can get for this project. Then the NVIDIA DGX Spark will be used to train the model. It is capable of doing so. I will be using the Codestral 7b SSM model as the base model for training because it already knows how to code. It will allow a perfect-recall context window of 880K tokens or so using TurboQuant, once that finally hits the mainstream llama.cpp release. Lastly, I will be using TensorRT Edge to make the model an actual executable that loads in a few seconds and operates about 20 to 30% faster on the NX.

So, wish me luck and send me your code! (AND THANK YOU IN ADVANCE!!!) This will hopefully be a big boost for the community: a model trained in our favorite programming language.
 