B4J Question [NLP] Can I get an example of how to use or train a model with the NLP library to detect fake news?

omo

Active Member
Licensed User
Longtime User
Erel has given several examples of how NLP can be used for text analysis. However, I have yet to see how it can be used for fake news detection. Simple online searches show how it can be done, as shown in the link below, but I don't really know how that translates to B4J.

Can I please get an example of how it can be applied using B4J?
 

Erel

B4X founder
Staff member
Licensed User
Longtime User
We need the document categorizer feature => follow this example: https://www.b4x.com/android/forum/threads/nlp-sentiment-analysis.133922/

Code to create the training and test datasets:
B4X:
Sub Process_Globals
End Sub

Sub AppStart (Args() As String)
    Dim TrueLines As List = File.ReadList("C:\Users\H\Downloads\True.csv", "")
    Dim FakeLines As List = File.ReadList("C:\Users\H\Downloads\Fake.csv", "")
    Dim TrainLines, TestLines As List
    TrainLines.Initialize
    TestLines.Initialize
    FillList(TrueLines, TrainLines, TestLines, "True")
    FillList(FakeLines, TrainLines, TestLines, "Fake")
    File.WriteList(File.DirApp, "Train.txt", TrainLines)
    File.WriteList(File.DirApp, "Test.txt", TestLines)
    Log("done")
End Sub

Private Sub FillList(Source As List, Train As List, Test As List, Category As String)
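    Dim IsFirst As Boolean = True
    For Each line As String In Source
        'Skip the CSV header row.
        If IsFirst Then
            IsFirst = False
            Continue
        End If
        'Remove double quotes and skip empty lines.
        line = line.Trim.Replace("""", "")
        If line.Length = 0 Then
            Continue
        End If
        'Rnd(1, 100) returns 1..99, so about two thirds of the lines go to Train and the rest to Test.
        'Each output line is the category, a space, then the text.
        IIf(Rnd(1, 100) < 66, Train, Test).As(List).Add(Category & " " & line)
    Next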
    Next
End Sub

Source: https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset
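
Before training, it can be worth checking how the random split turned out. The following is only a minimal sketch: the `LogCounts` sub is a hypothetical helper (not part of any library), and it assumes Train.txt and Test.txt were written to File.DirApp by the code above, with every line starting with its category followed by a space.

```b4x
Sub Process_Globals
End Sub

Sub AppStart (Args() As String)
    LogCounts("Train.txt")
    LogCounts("Test.txt")
End Sub

'Hypothetical helper: counts the lines of each category in one of the generated files.
Private Sub LogCounts(FileName As String)
    Dim TrueCount, FakeCount As Int
    For Each line As String In File.ReadList(File.DirApp, FileName)
        If line.StartsWith("True ") Then
            TrueCount = TrueCount + 1
        Else If line.StartsWith("Fake ") Then
            FakeCount = FakeCount + 1
        End If
    Next
    Log($"${FileName}: True=${TrueCount}, Fake=${FakeCount}"$)
End Sub
```

Because the split is random, the counts will differ slightly on every run, with Train.txt holding roughly twice as many lines as Test.txt.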

The results on this dataset (evaluated on the test lines, which were not used for training) are quite good:

---------------------------------------------------------------------
| Tag | Errors | Count | % Err | Precision | Recall | F-Measure |
---------------------------------------------------------------------
| Fake | 67 | 8042 | 0.008 | 0.994 | 0.992 | 0.993 |
| True | 45 | 7428 | 0.006 | 0.991 | 0.994 | 0.992 |
---------------------------------------------------------------------
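
For anyone wondering how the columns relate, the precision, recall and F-measure values can be recomputed from the Errors and Count columns alone. This is a sketch under one assumption: with only two tags, every misclassified Fake line was labelled True and vice versa, so the errors of one tag are the false positives of the other. `LogMetrics` is a hypothetical helper written for this illustration.

```b4x
Sub Process_Globals
End Sub

Sub AppStart (Args() As String)
    'Count and Errors are taken from the table; the last argument is the other tag's Errors.
    LogMetrics("Fake", 8042, 67, 45)
    LogMetrics("True", 7428, 45, 67)
End Sub

'Hypothetical helper. Assumes two tags only, so OtherErrors = false positives for this tag.
Private Sub LogMetrics(Tag As String, Count As Int, Errors As Int, OtherErrors As Int)
    Dim TP As Double = Count - Errors          'correctly classified lines of this tag
    Dim Rec As Double = TP / Count             'recall = TP / (TP + FN)
    Dim Prec As Double = TP / (TP + OtherErrors) 'precision = TP / (TP + FP)
    Dim F As Double = 2 * Prec * Rec / (Prec + Rec)
    Log($"${Tag}: precision=$1.3{Prec}, recall=$1.3{Rec}, F=$1.3{F}"$)
End Sub
```

The logged values should round to the same three decimals shown in the table, which suggests the assumption holds for this report.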
 

omo

Active Member
Licensed User
Longtime User
Thank you so much, Erel. You have made it much clearer.
 