B4A Library URL Detector - detect and extract URLs from a long piece of text

tuhatinhvn

Active Member
Licensed User
This code is based on https://github.com/linkedin/URL-Detector
First, download these two JAR files:

https://repo1.maven.org/maven2/com/linkedin/urls/url-detector/0.1.17/url-detector-0.1.17.jar
https://repo1.maven.org/maven2/org/apache/commons/commons-lang3/3.1/commons-lang3-3.1.jar

Copy them to your additional libraries folder.
Here is the code to extract a list of URLs from any text string:

B4X:
#Region  Project Attributes
    #ApplicationLabel: B4A Example
    #VersionCode: 1
    #VersionName:
    'SupportedOrientations possible values: unspecified, landscape or portrait.
    #SupportedOrientations: unspecified
    #CanInstallToExternalStorage: False
#End Region

#Region  Activity Attributes
    #FullScreen: False
    #IncludeTitle: True
#End Region
#AdditionalJar:url-detector-0.1.17.jar
#AdditionalJar:commons-lang3-3.1.jar
Sub Process_Globals
    'These global variables will be declared once when the application starts.
    'These variables can be accessed from all modules.
End Sub
Sub Globals
    'These global variables will be redeclared each time the activity is created.
    'These variables can only be accessed from this module.
End Sub

Sub Activity_Create(FirstTime As Boolean)
    'Do not forget to load the layout file created with the visual designer. For example:
    'Activity.LoadLayout("Layout1")
    Dim str_input As String = "Just because the.com heheyahoo.com weather is starting to get warm, does not mean that you should look sloppy. Get inspired and check out our collection of men's summer outfits.    famousoutfits    "
    'Requires the JavaObject library to be checked in the Libraries tab
    Dim NativeMe As JavaObject
    NativeMe.InitializeContext
    'Call the url_detect method defined in the #IF JAVA block below
    Dim s As List = NativeMe.RunMethod("url_detect", Array(str_input))
    Log(s.Size)

    For i = 0 To s.Size - 1
        Log(s.Get(i))
    Next
End Sub

Sub Activity_Resume

End Sub

Sub Activity_Pause (UserClosed As Boolean)

End Sub
#IF JAVA
import com.linkedin.urls.Url;
import com.linkedin.urls.detection.UrlDetector;
import com.linkedin.urls.detection.UrlDetectorOptions;
import java.util.ArrayList;
import java.util.List;

// Runs the LinkedIn detector over the input and returns each URL found
// as a plain String, so the B4X side can treat the result as a normal List.
public List<String> url_detect(String stringinput) {
    UrlDetector parser = new UrlDetector(stringinput, UrlDetectorOptions.Default);
    List<Url> found = parser.detect();
    List<String> itemsToAdd = new ArrayList<String>();
    for (Url url : found) {
        itemsToAdd.add(url.getFullUrl());
    }
    return itemsToAdd;
}
#End If
The library can find and detect many kinds of URLs.



 

tuhatinhvn
A regex will get you most of the way there too. That might be useful if you end up porting the app to an environment without a URL library, or if you feel more comfortable cracking nuts with a hammer rather than a hydraulic press (i.e. sometimes good enough is better ;-)

A super-simple one that works with the above sample text is:

([A-Za-z0-9]+\.)+([A-Za-z0-9]{2,})
You might want to add more characters to the allowed-character lists, e.g. hyphens.

You can run it with code like this:
B4X:
Dim SampleText As String = "Just because the.com heheyahoo.com weather is starting to get warm, does not mean that you should look sloppy. Get inspired and check out our collection of men's summer outfits.    famousoutfits"
Dim SimplePattern As String = "([A-Za-z0-9]+\.)+([A-Za-z0-9]{2,})"

Log("Searching: " & SampleText)

Dim m As Matcher = Regex.Matcher(SimplePattern, SampleText)

Do While m.Find
    Log("Found domain name: " & m.Group(0))
Loop
which produces this log output:
B4X:
Searching: Just because the.com heheyahoo.com weather is starting to get warm, does not mean that you should look sloppy. Get inspired and check out our collection of men's summer outfits.    famousoutfits
Found domain name: the.com
Found domain name: heheyahoo.com
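For anyone who wants the same regex approach outside B4X, here is a minimal plain-Java sketch of the B4X loop above using `java.util.regex`. The class name `SimpleDomainFinder` and the method `findDomains` are hypothetical names chosen for this illustration; the pattern itself is the one from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: plain-Java version of the B4X Regex.Matcher loop.
public class SimpleDomainFinder {
    // Same pattern as in the post: one or more "label." groups
    // followed by a final label of at least two characters.
    private static final Pattern SIMPLE_PATTERN =
            Pattern.compile("([A-Za-z0-9]+\\.)+([A-Za-z0-9]{2,})");

    public static List<String> findDomains(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SIMPLE_PATTERN.matcher(text);
        while (m.find()) {
            found.add(m.group(0)); // whole match, like m.Group(0) in B4X
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "Just because the.com heheyahoo.com weather is starting to get warm, "
                + "does not mean that you should look sloppy. Get inspired and check out "
                + "our collection of men's summer outfits.    famousoutfits";
        for (String d : findDomains(sample)) {
            System.out.println("Found domain name: " + d);
        }
    }
}
```

Run on the sample text, it prints the same two lines as the B4X log: `the.com` and `heheyahoo.com`.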
Thanks for your suggestion. The library can find and detect many more kinds of URLs; this example string is just too small to show that.
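To make the difference concrete, the sketch below runs the simple pattern on a string containing a full URL with a scheme, a hyphenated host and a path. The simple pattern keeps only part of the host, and even a variant that adds `-` to the first character class (following the earlier hint about hyphens) still drops the scheme and path, which is the kind of input a dedicated detector such as the LinkedIn library is meant to handle. `RegexLimits`, `findAll` and the `WITH_HYPHENS` pattern are hypothetical names and tweaks for illustration, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of the simple pattern and a hyphen-tolerant variant.
public class RegexLimits {
    public static final String SIMPLE = "([A-Za-z0-9]+\\.)+([A-Za-z0-9]{2,})";
    // Assumed tweak: allow '-' in every label that is followed by a dot.
    public static final String WITH_HYPHENS = "([A-Za-z0-9-]+\\.)+([A-Za-z0-9]{2,})";

    public static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "See https://my-site.org/docs?page=2 for details.";
        System.out.println(findAll(SIMPLE, text));       // prints [site.org]
        System.out.println(findAll(WITH_HYPHENS, text)); // prints [my-site.org]
    }
}
```

Widening the character classes helps with hosts, but matching schemes, ports, paths and query strings by regex gets complicated quickly, which is where the library earns its place.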

 