Android Question Trying to convert TimSort C code to B4A

RB Smissaert

Well-Known Member
Licensed User
Trying to convert this C code:
https://www.javatpoint.com/tim-sort
to B4A code:

B4X:
Sub TestTimSort

Dim i As Int
Dim iSize As Int = 20
Dim arrInt(iSize) As Int

For i = 0 To iSize - 1
arrInt(i) = Rnd(0, 100)
Next

TimSort(arrInt, arrInt.Length)

For i = 0 To iSize - 1
Log(i & " - " & arrInt(i))
Next

End Sub

Sub TimSort(arrInt() As Int, n As Int)
 Dim i As Int
Dim lSize As Int
Dim lBegin As Int
Dim lEnd As Int
 Dim lMid As Int
 For i = 0 To n - 1
TimInsertion(arrInt, i, Min(i + 31, n - 1))
i = i + iConstRun
 Next
 For lSize = iConstRun To n - 1
For lBegin = 0 To n - 1
lMid = lBegin + lSize - 1
lEnd = Min(lBegin + 2 * lSize - 1, n - 1)
TimMerge(arrInt, lBegin, lMid, lEnd)
lBegin = lBegin + (2 * lSize)
Next
lSize = lSize * 2
 Next
End Sub

Sub TimInsertion(arrInt() As Int, lBegin As Int, lEnd As Int)
 Dim i As Int
Dim j As Int
 Dim lTemp As Int
 For i = lBegin + 1 To lEnd
lTemp = arrInt(i)
j = i - 1
Do While CheckWhile(j, lBegin, arrInt, lTemp)
arrInt(j + 1) = arrInt(j)
j = j - 1
Loop
arrInt(j + 1) = lTemp
 Next
End Sub

Sub CheckWhile(j As Int, lBegin As Int, arrInt() As Int, lTemp As Int) As Boolean
 If j < 0 Then
Return False
 End If
 If j >= lBegin Then
If arrInt(j) > lTemp Then
Return True
End If
End If
 Return False
End Sub

Sub TimMerge(arrInt() As Int, lLeft As Int, lMid As Int, lRight As Int)
 Dim i As Int
Dim j As Int
Dim k As Int
Dim Len1 As Int
 Dim Len2 As Int
 Len1 = lMid - lLeft + 1
 Len2 = lRight - lMid
 Dim arrBegin(Len1) As Int
Dim arrEnd(Len2) As Int

For i = 0 To Len1 - 1
arrBegin(i) = arrInt(lLeft + i)
 Next
 For i = 0 To Len2 - 1
arrEnd(i) = arrInt(lMid + 1 + i)
 Next
 i = 0
j = 0
 k = lLeft
 Do While i < Len1 And j < Len2
If arrBegin(i) <= arrEnd(j) Then
arrInt(k) = arrBegin(i)
i = i + 1
Else
arrInt(k) = arrEnd(j) '<<<<< goes wrong here, too low values go to the array!
j = j + 1
End If
k = k + 1
 Loop
 Do While i < Len1
arrInt(k) = arrBegin(i)
k = k + 1
i = i + 1
 Loop
 Do While j < Len2
arrInt(k) = arrEnd(j)
k = k + 1
j = j + 1
 Loop
End Sub

It works fine with arrays up to about 35 items, but above that it goes wrong as items
appear unsorted. It goes wrong in the TimMerge Sub as shown in the comment code comment.
Does anybody on this list understand C well enough to see why this B4A code doesn't work?

RBS
 

JordiCP

Well-Known Member
Licensed User
There are a couple of mistakes in the translation, and some 'code smells' in the original

Translation
  • In the example, the constant run is assigned a value of 32, and later in the code there is a constant '31' --> to translate it correctly, it should be run-1.
  • 'For' loops with non unitary incremental steps are incorrectly translated. It is incorrect to translate
B4X:
for(k=0; k<n; k+=size) { ... }
to
B4X:
For k=0 to n-1  '<-- here we are adding 1 each time it loops
   ...
   k = k + size  '<-- and here we are adding 'size'
Next
, since you are adding 'size' to k at the end of each loop, but also incrementing it by one each time the loop is reevaluated. It could be done with:
B4X:
For k=0 to n-1 Step size
   ...
Next
  • Also, a special case is when the Step is not fixed. In such cases a Do-while Loop does the job (see comment in TimSort Sub)

Bad original code.
  • Even with the above corrections, code didn't work properly (crashes) if size is not a multiple of the 'run' constant. If have added an 'if' condition in TimSort to prevent such cases, since the code 'seems' prepared for this.

Tested working code:)
B4X:
Sub TestTimSort

    Dim NUM_TESTS As Int = 1000   '<--- Do some random tests with several sizes and run lengths to verify that it works
    For k=0 To NUM_TESTS-1

        Dim i As Int
        Dim iSize As Int = Rnd(50,400)
        Dim arrInt(iSize) As Int

        For i = 0 To iSize - 1
            arrInt(i) = Rnd(0, 400)
        Next

        iConstRun = Rnd(iSize/4, iSize/2)

        TimSort(arrInt, arrInt.Length)

        'For i = 0 To iSize - 1
        '    Log(i & " - " & arrInt(i))
        'Next
        
        For i=1 To iSize-2
            If arrInt(i)>arrInt(i+1) Then
                Log($"-----> Test ${k} failed!"$)
                Exit     
            End If
        Next
        If i=iSize-1 Then Log($"Test ${k} correct"$)
        
    Next
End Sub

Sub TimSort(arrInt() As Int, n As Int)
    Dim i As Int
    Dim lSize As Int
    Dim lBegin As Int
    Dim lEnd As Int
    Dim lMid As Int
    For i = 0 To n - 1 Step iConstRun
        TimInsertion(arrInt, i, Min(i + iConstRun-1, n - 1))   '<---- HERE, replace '31' by 'iConstRun-1'
        'i = i + iConstRun
    Next
    lSize = iConstRun
    Do While lSize<=n-1
    'For lSize = iConstRun To n - 1    '<-- In this case we don't use 'For' since the step is not constant
        
        For lBegin = 0 To n - 1 Step 2*lSize      '<-- should be done like this
            lEnd = Min(lBegin + 2 * lSize - 1, n - 1)
            lMid = lBegin + lSize - 1
            If lMid<lEnd Then                     '<-- Add this condition to prevent crashes when size is not a multiple of run.
                TimMerge(arrInt, lBegin, lMid, lEnd)
            End If
            'lBegin = lBegin + (2 * lSize)         '<-- incorrect
        Next
        Log("done")
        lSize = lSize * 2
    'Next
    Loop
End Sub

Sub TimInsertion(arrInt() As Int, lBegin As Int, lEnd As Int)
    Dim i As Int
    Dim j As Int
    Dim lTemp As Int
    For i = lBegin + 1 To lEnd
        lTemp = arrInt(i)
        j = i - 1
        Do While CheckWhile(j, lBegin, arrInt, lTemp)
            arrInt(j + 1) = arrInt(j)
            j = j - 1
        Loop
        arrInt(j + 1) = lTemp
    Next
End Sub

Sub CheckWhile(j As Int, lBegin As Int, arrInt() As Int, lTemp As Int) As Boolean
    If j < 0 Then
        Return False
    End If
    If j >= lBegin Then
        If arrInt(j) > lTemp Then
            Return True
        End If
    End If
    Return False
End Sub

Sub TimMerge(arrInt() As Int, lLeft As Int, lMid As Int, lRight As Int)
    Dim i As Int
    Dim j As Int
    Dim k As Int
    Dim Len1 As Int
    Dim Len2 As Int
    Len1 = lMid - lLeft + 1
    Len2 = lRight - lMid
    Dim arrBegin(Len1) As Int

    Dim arrEnd(Len2) As Int

    For i = 0 To Len1 - 1
        arrBegin(i) = arrInt(lLeft + i)
    Next
    For i = 0 To Len2 - 1
        arrEnd(i) = arrInt(lMid + 1 + i)
    Next
    i = 0
    j = 0
    k = lLeft
    Do While i < Len1 And j < Len2
        If arrBegin(i) <= arrEnd(j) Then
            arrInt(k) = arrBegin(i)
            i = i + 1
        Else
            arrInt(k) = arrEnd(j) '<<<<< goes wrong here, too low values go to the array!
            j = j + 1
        End If
        k = k + 1
    Loop
    Do While i < Len1
        arrInt(k) = arrBegin(i)
        k = k + 1
        i = i + 1
    Loop
    Do While j < Len2
        arrInt(k) = arrEnd(j)
        k = k + 1
        j = j + 1
    Loop
End Sub
 
Upvote 0

RB Smissaert

Well-Known Member
Licensed User
Thanks a lot for that, have tested and indeed it works fine.
What surprised me (considering the code was published on multiple sites) was that there was this bug in the C code that needed your added if:

B4X:
   If lMid < lEnd Then                     '<-- Add this condition to prevent crashes when size is not a multiple of run.
    TimMerge(arrInt, lBegin, lMid, lEnd)
   End If

I will convert you code to an index sort (leaving the original array untouched and just producing a sort index) and will post that code.
Thanks again, would have taken me a long time to figure this out without your assistance.

RBS
 
Upvote 0

RB Smissaert

Well-Known Member
Licensed User
I will convert you code to an index sort (leaving the original array untouched and just producing a sort index) and will post that code.
Thanks again, would have taken me a long time to figure this out without your assistance.
RBS

This is the TimSort converted to an index sort:

B4X:
Sub TimSortInt_IDX(arrInt() As Int) As Int()
 
 Dim i As Int
 Dim n As Int
 Dim lSize As Int
 Dim lBegin As Int
 Dim lEnd As Int
 Dim lMid As Int
 Dim iConstRun As Int = 32
 Dim arrIndex(arrInt.Length) As Int
 
 n = arrInt.Length
 
 For i = 0 To arrInt.Length - 1
  arrIndex(i) = i
 Next
 
 For i = 0 To n - 1 Step iConstRun
  TimInsertionInt_IDX(arrInt, arrIndex, i, Min(i + iConstRun - 1, n - 1))   '<---- HERE, replace '31' by 'iConstRun-1'
 Next
 
 lSize = iConstRun
 
 Do While lSize <= n-1
  For lBegin = 0 To n - 1 Step 2 * lSize      '<-- should be done like this
   lEnd = Min(lBegin + 2 * lSize - 1, n - 1)
   lMid = lBegin + lSize - 1
   If lMid < lEnd Then                     '<-- Add this condition to prevent crashes when size is not a multiple of run.
    TimMergeInt_IDX(arrInt, arrIndex, lBegin, lMid, lEnd)
   End If
  Next
  lSize = lSize * 2
 Loop
 
 Return arrIndex
 
End Sub

Sub TimInsertionInt_IDX(arrInt() As Int, arrIndex() As Int, lBegin As Int, lEnd As Int)
 
 Dim i As Int
 Dim j As Int
 Dim lTemp As Int
 
 For i = lBegin + 1 To lEnd
  lTemp = arrInt(arrIndex(i))
  j = i - 1
  Do While CheckWhileInt(j, lBegin, arrInt, arrIndex, lTemp)
   arrIndex(j + 1) = arrIndex(j)
   j = j - 1
  Loop
  arrIndex(j + 1) = i
 Next
 
End Sub

Sub CheckWhileInt(j As Int, lBegin As Int, arrInt() As Int, arrIndex() As Int, lTemp As Int) As Boolean
 
 If j < 0 Then
  Return False
 End If
 
 If j >= lBegin Then
  If arrInt(arrIndex(j)) > lTemp Then
   Return True
  End If
 End If
 
 Return False
End Sub

Sub TimMergeInt_IDX(arrInt() As Int, arrIndex() As Int, lLeft As Int, lMid As Int, lRight As Int)

Dim i As Int
Dim j As Int
Dim k As Int
Dim Len1 As Int
Dim Len2 As Int
Len1 = lMid - lLeft + 1
Len2 = lRight - lMid
Dim arrBegin(Len1) As Int
 Dim arrEnd(Len2) As Int
 For i = 0 To Len1 - 1
arrBegin(i) = arrIndex(lLeft + i)
Next

For i = 0 To Len2 - 1
arrEnd(i) = arrIndex(lMid + 1 + i)
Next

i = 0
j = 0
k = lLeft

Do While i < Len1 And j < Len2
If arrInt(arrBegin(i)) <= arrInt(arrEnd(j)) Then
arrIndex(k) = arrBegin(i)
i = i + 1
Else
arrIndex(k) = arrEnd(j)
j = j + 1
End If
k = k + 1
 Loop
 
Do While i < Len1
arrIndex(k) = arrBegin(i)
k = k + 1
i = i + 1
Loop

Do While j < Len2
arrIndex(k) = arrEnd(j)
k = k + 1
j = j + 1
Loop

End Sub

This is all type specific for Int arrays, but I have 3 other sets for Double, Long and String, but didn't post these as it is easy
to convert. I can confirm that this sort is very fast indeed (100.000 string items in some 30 milli-secs) and that it is indeed
a stable sort. I use this now for faster sorting of the Flexible Table class and not seen a problem yet.

RBS
 
Upvote 0

JordiCP

Well-Known Member
Licensed User
I can confirm that this sort is very fast indeed (100.000 string items in some 30 milli-secs) and that it is indeed
a stable sort
Well done!
Perhaps it could be posted to Code Snippets (with [B4X] prefix, it seems to be platform-agnostic) to increase visibility :)
 
Upvote 0
Top