Although I could first evaluate the file header and then search the following bytes for non-ASCII characters, I am looking for a reliable procedure that distinguishes text files from binary files with certainty.
Does anyone have a suitable detection function? A fast response time is important, even for files of around 200 MB.
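A minimal sketch of the "scan the following bytes" idea, assuming a heuristic is acceptable (no byte-level check can prove that a file is text). Git treats a file as binary when a NUL byte appears in its first 8000 bytes. Reading only that prefix keeps the response time independent of file size, so a 200 MB file costs no more than a small one. The Sub name `IsProbablyTextFile` is hypothetical. UTF-16 text files contain NUL bytes and would be reported as binary.

```b4x
' Hypothetical helper: Git-style heuristic, binary if a NUL byte (0)
' appears within the first 8000 bytes. Only this prefix is read.
Sub IsProbablyTextFile(Dir As String, FileName As String) As Boolean
    Dim ist As InputStream = File.OpenInput(Dir, FileName)
    Dim buffer(8000) As Byte
    Dim total As Int = 0
    ' ReadBytes may return fewer bytes than requested and returns -1 at
    ' the end of the stream, so keep reading until the buffer is full
    Do While total < buffer.Length
        Dim n As Int = ist.ReadBytes(buffer, total, buffer.Length - total)
        If n <= 0 Then Exit
        total = total + n
    Loop
    ist.Close
    For i = 0 To total - 1
        If buffer(i) = 0 Then Return False ' NUL byte: treat as binary
    Next
    Return True ' no NUL found (an empty file also counts as text)
End Sub
```

For example, `IsProbablyTextFile("", strPathFile)` works with the full path used in the code below.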
B4X:
Dim ist As InputStream = File.OpenInput("", strPathFile)
Dim buffer(64) As Byte
' ReadBytes returns the number of bytes actually read, or -1 for an empty file
Dim count As Int = ist.ReadBytes(buffer, 0, buffer.Length)
ist.Close
If count < 0 Then count = 0
' ISO-8859-1 maps each byte 0..255 to the character with the same code,
' so this replaces the Chr(b + 256) loop and covers every byte read
Dim strHeader As String = BytesToString(buffer, 0, count, "ISO-8859-1")
Select Case True
    Case strHeader.Contains("ftypmp42")
        Return "MPEG"
    Case strHeader.ToLowerCase.Contains("<!doctype html")
        Return "HTML"
    Case strHeader.StartsWith("{")
        Return "JSON"
    Case strHeader.StartsWith("%PDF-")
        Return "PDF Document"
    Case strHeader.StartsWith("SQLite")
        Return "DATABASE SQLite"
    Case strHeader.StartsWith(Chr(255) & Chr(216) & Chr(255))
        Return "IMAGE JPEG" ' FF D8 FF is the JPEG signature, not GIF
    Case strHeader.StartsWith("GIF")
        Return "IMAGE GIF"
    Case strHeader.StartsWith(Chr(137) & "PNG")
        Return "IMAGE PNG"
    Case strHeader.StartsWith("BM")
        Return "IMAGE BMP"
    Case strHeader.StartsWith("RIFF")
        If strHeader.Contains("WEBP") Then
            Return "IMAGE WebP"
        Else
            Return "RIFF"
        End If
    Case strHeader.StartsWith("PK")
        Return "ARCHIVE gen"
    ' 1F 9D and 1F A0 are .Z compressed files (often .tar.Z); a plain TAR
    ' carries "ustar" at offset 257, outside this 64-byte header
    Case strHeader.StartsWith(Chr(31) & Chr(157))
        Return "ARCHIVE Z (LZW)"
    Case strHeader.StartsWith(Chr(31) & Chr(160))
        Return "ARCHIVE Z (LZH)"
    Case strHeader.StartsWith(Chr(31) & Chr(139) & Chr(8))
        Return "ARCHIVE Gzip"
    Case Else
        Return .....
End Select
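For the `Case Else` branch, a stricter sketch can check the header bytes themselves: reject NUL and most other control characters, and require the remaining bytes to be well-formed UTF-8 (plain ASCII passes, since it is a subset of UTF-8). `LooksLikeUtf8Text` is a hypothetical name, and the check is approximate: it validates lead and continuation bytes but does not reject every overlong or surrogate sequence. Text in a single-byte code page such as Windows-1252 that contains accented letters would fail it and be reported as binary.

```b4x
' Hypothetical helper: True if data(0..count-1) has no disallowed control
' characters and is structurally valid UTF-8. A multi-byte sequence cut
' off at the end of the buffer is accepted, since the buffer is a prefix.
Sub LooksLikeUtf8Text(data() As Byte, count As Int) As Boolean
    Dim i As Int = 0
    Do While i < count
        Dim b As Int = Bit.And(data(i), 0xFF) ' B4X bytes are signed
        If b < 0x20 Then
            ' allowed controls: 9 TAB, 10 LF, 12 FF, 13 CR, 27 ESC
            If b <> 9 And b <> 10 And b <> 12 And b <> 13 And b <> 27 Then Return False
            i = i + 1
        Else If b < 0x80 Then
            i = i + 1 ' plain ASCII
        Else
            Dim extra As Int
            If b >= 0xC2 And b <= 0xDF Then
                extra = 1
            Else If b >= 0xE0 And b <= 0xEF Then
                extra = 2
            Else If b >= 0xF0 And b <= 0xF4 Then
                extra = 3
            Else
                Return False ' not a valid UTF-8 lead byte
            End If
            For k = 1 To extra
                If i + k >= count Then Return True ' sequence truncated at buffer end
                If Bit.And(data(i + k), 0xC0) <> 0x80 Then Return False ' bad continuation byte
            Next
            i = i + 1 + extra
        End If
    Loop
    Return True
End Sub
```

It would be called with the header buffer and the number of bytes actually read (the return value of `ReadBytes`).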