Android Question Is there a safe way to determine if a stream or file is of type TEXT or BINARY?

fredo

It is possible to inspect a file's header first and then scan the following bytes for non-ASCII characters, but I am looking for a reliable procedure that distinguishes text from binary with certainty.

B4X:
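    'Read only the first 64 bytes, so the file size does not affect the run time.
    Dim ist As InputStream =  File.OpenInput("", strPathFile)
    Dim buffer(64) As Byte
    ist.ReadBytes(buffer, 0, buffer.Length )
    ist.Close
    'Build a string with one character per byte. Bytes are signed,
    'so negative values are shifted into the range 128..255.
    Dim sb As StringBuilder : sb.Initialize
    For i=0 To buffer.Length-2 '!
        If buffer(i) <0 Then
            sb.Append(Chr(buffer(i) +256))
        Else
            sb.Append(Chr(buffer(i)))
        End If
    Next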
    Dim strHeader As String = sb.ToString
    Select Case True
        Case strHeader.Contains("ftypmp42") 
            Return "MPEG"
            
        Case strHeader.Contains("<!DOCTYPE html>")
            Return "HTML"
            
        Case strHeader.StartsWith("{")
            Return "JSON"
            
        Case strHeader.StartsWith("%PDF-")
            Return "PDF Document"
            
        Case strHeader.StartsWith("SQLite")
            Return "DATABASE SQLite"
            
        Case strHeader.StartsWith(Chr(255) & Chr(216) & Chr(255)) 'FF D8 FF is the JPEG signature
            Return "IMAGE JPEG"
    
        Case strHeader.StartsWith("GIF")
            Return "IMAGE GIF"
    
        Case strHeader.StartsWith(Chr(137) & "PNG")
            Return "IMAGE PNG"
    
        Case strHeader.StartsWith("BM")
            Return "IMAGE BMP"
        
        Case strHeader.StartsWith("RIFF")
            If strHeader.Contains("WEBP") Then
                Return "IMAGE WebP"
            Else
                Return "RIFF"
            End If

        Case strHeader.StartsWith("PK")
            Return "ARCHIVE gen"
        
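        Case strHeader.StartsWith(Chr(31) & Chr(157)) '1F 9D: LZW-compressed file (.Z), often a compressed tar
            Return "ARCHIVE TAR"
        
        Case strHeader.StartsWith(Chr(31) & Chr(160)) '1F A0: LZH-compressed file (.z), often a compressed tar
            Return "ARCHIVE TAR"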
        
        Case strHeader.StartsWith(Chr(31) & Chr(139) & Chr(8))
            Return "ARCHIVE Gzip"
         
         Case else
             return .....
    End Select

Does anyone have a suitable detection function? It also needs to respond quickly for files of around 200 MB.
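
For illustration, the header-plus-byte-scan idea from the question could be sketched in plain Java roughly as below. It samples only the first few kilobytes, so the run time should not depend on the file size. This is a heuristic and not a certain test: UTF-16 text, for example, contains NUL bytes and would be reported as binary. The class name `TextSniffer`, the sample size and the threshold are assumptions made for this sketch, not part of B4X or the JDK.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: guesses whether data is text by sampling its first bytes. */
final class TextSniffer {
    // Assumed sample size: only this many leading bytes are inspected.
    static final int SAMPLE_SIZE = 8192;
    // Assumed threshold: tolerated fraction of unusual control bytes in text.
    static final double MAX_CONTROL_RATIO = 0.05;

    /** Reads up to SAMPLE_SIZE bytes from the stream and classifies them. */
    static boolean looksLikeText(InputStream in) throws IOException {
        byte[] sample = new byte[SAMPLE_SIZE];
        int len = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (len < sample.length) {
            int n = in.read(sample, len, sample.length - len);
            if (n < 0) break;
            len += n;
        }
        return looksLikeText(sample, len);
    }

    /** Classifies the first len bytes of sample. */
    static boolean looksLikeText(byte[] sample, int len) {
        if (len == 0) return true; // empty input is treated as text here
        int control = 0;
        for (int i = 0; i < len; i++) {
            int b = sample[i] & 0xFF; // Java bytes are signed; mask to 0..255
            if (b == 0) return false; // a NUL byte is a strong hint for binary
            boolean whitespace = b == '\t' || b == '\n' || b == '\r' || b == '\f';
            if (b < 0x20 && !whitespace) control++;
        }
        return (double) control / len <= MAX_CONTROL_RATIO;
    }
}
```

A signature check like the one above could run first, with this scan as the fallback for files whose header matches nothing.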
 

moster67

No, I don't think so. It is not easy. See this article:
https://dzone.com/articles/determining-file-types-java

I tried two static methods of URLConnection (guessContentTypeFromName and guessContentTypeFromStream) and the results varied. In any case, they only give you the MIME type, so you still need to map the result to binary or text. Sample code here (project attached too):

B4X:
Sub Process_Globals
    'These global variables will be declared once when the application starts.
    'These variables can be accessed from all modules.
    Private NativeMe As JavaObject
End Sub

Sub Globals
    'These global variables will be redeclared each time the activity is created.
    'These variables can only be accessed from this module.

End Sub

Sub Activity_Create(FirstTime As Boolean)
    'Do not forget to load the layout file created with the visual designer. For example:
    'Activity.LoadLayout("Layout1")
 
    If FirstTime Then
        NativeMe.InitializeContext
    End If
 
    If File.Exists(File.DirAssets,"mm.jpg") Then
        File.Copy(File.DirAssets,"mm.jpg",File.DirInternal,"mm.jpg")
    End If
 
    If File.Exists(File.DirAssets,"mypdf.pdf") Then
        File.Copy(File.DirAssets,"mypdf.pdf",File.DirInternal,"mypdf.pdf")
    End If
 
    If File.Exists(File.DirAssets,"mypdf.zip") Then
        File.Copy(File.DirAssets,"mypdf.zip",File.DirInternal,"mypdf.zip")
    End If
 

    'First pair: guess from the name of mm.jpg and from the content of mypdf.pdf.
    Dim MyFile As Object = File.Combine(File.DirInternal,"mm.jpg")
    Dim InpStr As InputStream = File.OpenInput(File.DirInternal,"mypdf.pdf")
    Dim s As String = NativeMe.RunMethod("GetMimeType", Array(MyFile, True))
    Log(s)
    Dim t As String = NativeMe.RunMethod("GetMimeType", Array(InpStr, False))
    Log(t)
    InpStr.Close
    'Second pair: guess from the name of mypdf.pdf and from the content of mm.jpg.
    Dim MyFile As Object = File.Combine(File.DirInternal,"mypdf.pdf")
    Dim InpStr As InputStream = File.OpenInput(File.DirInternal,"mm.jpg")
    Dim s As String = NativeMe.RunMethod("GetMimeType", Array(MyFile, True))
    Log(s)
    Dim t As String = NativeMe.RunMethod("GetMimeType", Array(InpStr, False))
    Log(t)
    InpStr.Close
 
    'Third pair: guess from the name and from the content of mypdf.zip.
    Dim MyFile As Object = File.Combine(File.DirInternal,"mypdf.zip")
    Dim InpStr As InputStream = File.OpenInput(File.DirInternal,"mypdf.zip")
    Dim s As String = NativeMe.RunMethod("GetMimeType", Array(MyFile, True))
    Log(s)
    Dim t As String = NativeMe.RunMethod("GetMimeType", Array(InpStr, False))
    Log(t)
    InpStr.Close
 
 
    File.Delete(File.DirInternal,"mm.jpg")
    File.Delete(File.DirInternal,"mypdf.pdf")
    File.Delete(File.DirInternal,"mypdf.zip")
 
 
End Sub

Sub Activity_Resume

End Sub

Sub Activity_Pause (UserClosed As Boolean)

End Sub



#If JAVA
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import anywheresoftware.b4a.BA;

    public String GetMimeType(Object input, boolean isFile) throws IOException {
        String mimeType = "N/A";
        try {
            if (isFile) {
                mimeType = URLConnection.guessContentTypeFromName((String) input);
                BA.Log("File");
            } else {
                mimeType = URLConnection.guessContentTypeFromStream((InputStream)input);
                BA.Log("Stream");
            }
        } catch(Exception e) {
            BA.Log("error");
            return "N/A";
        }

        if (mimeType == null){
            return "N/A";
        } else {
            return mimeType;
        }
    }
#End If
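
A side note on the stream variant: the JDK documentation says guessContentTypeFromStream needs a stream that supports marks, and as far as I know the desktop JDK implementation only recognises a limited set of signatures (some image, audio and markup formats), so a null result for other content is not necessarily an error. The behaviour on Android may differ. The following plain-Java sketch lets you try it with a few header bytes; the class name `StreamGuessDemo` and the sample headers are made up for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class, not part of the attached project. */
final class StreamGuessDemo {
    /** Returns the guessed MIME type, or "N/A" when nothing could be guessed. */
    static String guess(byte[] header) {
        // ByteArrayInputStream supports mark/reset, which the method requires.
        try (InputStream in = new ByteArrayInputStream(header)) {
            String mime = URLConnection.guessContentTypeFromStream(in);
            return mime == null ? "N/A" : mime;
        } catch (IOException e) {
            return "N/A";
        }
    }

    public static void main(String[] args) {
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guess(png));
        System.out.println(guess(pdf));
    }
}
```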

Log Output:
B4X:
File
image/jpeg
Stream
N/A
File
application/pdf
Stream
image/jpeg
File
application/zip
Stream
N/A
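
To get from a MIME type to a text/binary decision, a mapping along these lines might be a starting point. It is only a sketch in plain Java: the class name `MimeKind` and the short list of text-like application/* types are assumptions and certainly not exhaustive, and an unknown result ("N/A" or null) is simply treated as binary here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: maps a MIME type to "text" (true) or "binary" (false). */
final class MimeKind {
    // Assumed, non-exhaustive list of application/* types that carry text.
    private static final Set<String> TEXT_LIKE = new HashSet<>(Arrays.asList(
            "application/json",
            "application/xml",
            "application/javascript"));

    static boolean isTextMime(String mime) {
        // Unknown type: treated as binary in this sketch.
        if (mime == null || mime.equals("N/A")) return false;
        String base = mime.toLowerCase(Locale.ROOT);
        // Drop parameters such as "; charset=utf-8".
        int semi = base.indexOf(';');
        if (semi >= 0) base = base.substring(0, semi);
        base = base.trim();
        return base.startsWith("text/")
                || base.endsWith("+xml")
                || base.endsWith("+json")
                || TEXT_LIKE.contains(base);
    }
}
```

With the log output above, image/jpeg, application/pdf and application/zip would all come out as binary, and the N/A results would need a different check on the content itself.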

You might want to have a look at @stevel05's post here for some nice code regarding images.
 

Attachments

  • Det.zip
    175.2 KB · Views: 117