Android Question: How to decompress XZ files?

Syd Wright

Well-Known Member
Licensed User
Hello Don,
Thank you very much for your help. Indeed I keep running into OOM crashes with the Bremen map in your example.
The files I am trying to read are smaller (less than 1 MB), but so far I have not been able to decompress my "unzip1.xz" file (attached).
I derived this file from a (Kiwix) ZIM file because I am trying to make an offline reader for Kiwix Wikipedia data.
The attached file opens fine with 7-Zip! In your example I get an "org.tukaani.xz.CorruptedInputException: Invalid LZMA properties byte" error. Could it have to do with the fact that Kiwix uses LZMA2 compression? Any ideas?

PS: please change the extension of my file from ".zip" to ".xz".
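Some context on the "Invalid LZMA properties byte" error: xz-java's `LZMAInputStream` expects the legacy .lzma layout, whose first byte is a properties byte that must be below 225. Every .xz file instead starts with the 6-byte magic `FD 37 7A 58 5A 00`, so its first byte 0xFD (253) fails that check, which would explain the exception. The Java sketch below only inspects header bytes and decompresses nothing. `CompressedFormatSniffer` and its methods are hypothetical names and are not part of xz-java. For real .xz files, xz-java also provides `XZInputStream`, which reads the .xz container format directly.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: guesses the container format from the first bytes of a file.
 * It only looks at magic/header bytes and does not decompress anything.
 */
final class CompressedFormatSniffer {

    // .xz Stream Header magic bytes: 0xFD '7' 'z' 'X' 'Z' 0x00
    static final byte[] XZ_MAGIC = {(byte) 0xFD, '7', 'z', 'X', 'Z', 0x00};

    enum Format { XZ, LZMA_ALONE, UNKNOWN }

    static Format sniff(byte[] header) {
        if (header.length >= XZ_MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, XZ_MAGIC.length), XZ_MAGIC)) {
            return Format.XZ;
        }
        // Heuristic: a legacy .lzma header is 13 bytes (properties byte, 4-byte dictionary
        // size, 8-byte uncompressed size). The properties byte encodes (pb * 5 + lp) * 9 + lc
        // with lc <= 8, lp <= 4, pb <= 4, so a valid one is always below 225.
        if (header.length >= 13 && (header[0] & 0xFF) < 225) {
            return Format.LZMA_ALONE;
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] xzStart = {(byte) 0xFD, '7', 'z', 'X', 'Z', 0x00, 0x00, 0x04};
        System.out.println("Detected format: " + sniff(xzStart));
        // Read as an .lzma properties byte, 0xFD is out of range, which is what
        // LZMAInputStream reports as "Invalid LZMA properties byte".
        System.out.println("First byte as .lzma properties byte: " + (xzStart[0] & 0xFF));
    }
}
```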
 


DonManfred

Expert
Licensed User
Could it have to do with the fact that Kiwix uses LZMA2 compression?
Maybe. The examples only cover LZMA, which I wrapped into two methods (Encode and Decode).
I could not find any LZMA2 example or LZMA2 compression module in xz-1.8.jar.
Stay tuned. I found an LZMA2 one too...
 

DonManfred

Expert
Licensed User
I found an LZMA2 one too...
Unfortunately it does not seem to do anything with the dictsize I used.

In the sources I found this:
/**
* Creates a new input stream that decompresses raw LZMA2 data
* from <code>in</code>.
* <p>
* The caller needs to know the dictionary size used when compressing;
* the dictionary size isn't stored as part of a raw LZMA2 stream.
* <p>
* Specifying a too small dictionary size will prevent decompressing
* the stream. Specifying a too big dictionary is waste of memory but
* decompression will work.
* <p>
* There is no need to specify a dictionary bigger than
* the uncompressed size of the data even if a bigger dictionary
* was used when compressing. If you know the uncompressed size
* of the data, this might allow saving some memory.
*
* @param in input stream from which LZMA2-compressed
* data is read
*
* @param dictSize LZMA2 dictionary size as bytes, must be
* in the range [<code>DICT_SIZE_MIN</code>,
* <code>DICT_SIZE_MAX</code>]
So if you do not know the dictsize used when compressing this file, play around with it.
 


Syd Wright

Well-Known Member
Licensed User
Unfortunately it does not seem to do anything with the dictsize I used.

In the sources I found this:

So if you do not know the dictsize used when compressing this file, play around with it.
Hi Don,
Thank you again for taking the time to try and help me. So far I have had no luck experimenting with the dictsize parameter.
The code below gives you some idea of my experiments. I either get an error or the output file is 0 bytes:
B4X:
'Activity module
Sub Process_Globals
End Sub

Sub Globals
    Dim XZ1 As LZMA                'Shouldn't there be a "LZMA2" object?
End Sub

Sub Activity_Create(FirstTime As Boolean)
    Dim DictSize1 As Int
    XZ1.Initialize("XZ1")
    File.Copy(File.DirAssets,"unzip3.xz",File.DirInternalCache,"unzip3.xz")
    If File.Exists(File.DirRootExternal,"UnzipXZ3.txt") Then File.Delete(File.DirRootExternal,"UnzipXZ3.txt")
    'MY EXPERIMENTS:
    'DictSize1 = 38              'tried 1, 2, 3 to 40: Ref: https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm    Gives error: "Unsupported dictionary size"
    'DictSize1 = 2048            'tried 2^1 to 2^11:   Error: "Unsupported dictionary size"
    'DictSize1 = 65536 '4096     'tried 2^12 to 2^16:  NO error, but output file = 0 bytes
    'DictSize1 = 1048576         'tried 2^20 (close to expected unzipped size) NO error, but output file = 0 bytes
    DictSize1 = 8388608          'tried 2^23: Ref: https://tukaani.org/xz/xz-javadoc/org/tukaani/xz/LZMA2Options.html: NO error, but output file = 0 bytes
    'DictSize1 = File.Size(File.DirInternalCache,"unzip3.xz")  'NO error, but output file = 0 bytes
    XZ1.DecodeLZMA2(File.DirInternalCache,"unzip3.xz",File.DirRootExternal,"UnzipXZ3.txt",DictSize1)
    Log(File.Size(File.DirRootExternal,"UnzipXZ3.txt"))
End Sub
Ref: https://www.systutorials.com/docs/linux/man/1-xz/ for dictionary sizes. My guess is that 2^18 (= 262144) is a good choice. Therefore I assume that the 0-byte output problem has another cause.
 

Syd Wright

Well-Known Member
Licensed User
Thank you very much. Yes, I also suspected that. I tried to find where the dictsize byte is located in the header (after decoding my file with p7zip), but so far the online documentation about the XZ header seems insufficient.
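On where that byte sits: in the .xz container the dictionary size is not in the 12-byte Stream Header but in the Block Header that follows it. There it is stored in the filter flags of the LZMA2 filter (filter ID 0x21) as a single properties byte whose low 6 bits are a code from 0 to 40. Below is a minimal Java sketch that walks the first Block Header, based on the .xz file format specification. `XzBlockHeaderInspector` and `Lzma2Info` are made-up names. The sketch assumes the Stream Header has already been checked, and it does not verify the Block Header CRC32.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical helper: reads the LZMA2 dictionary-size code from the first Block Header
 * of an .xz file. Assumes the 12-byte Stream Header was already validated; the Block
 * Header CRC32 is not checked.
 */
final class XzBlockHeaderInspector {

    static final int STREAM_HEADER_SIZE = 12;
    static final long LZMA2_FILTER_ID = 0x21;

    static final class Lzma2Info {
        final int dictSizeCode;   // low 6 bits of the LZMA2 properties byte (0..40)
        final int rawDataOffset;  // file offset where the Block's LZMA2 chunks start

        Lzma2Info(int dictSizeCode, int rawDataOffset) {
            this.dictSizeCode = dictSizeCode;
            this.rawDataOffset = rawDataOffset;
        }
    }

    // Variable-length integer: 7 bits per byte, high bit set means "more bytes follow", max 9 bytes.
    private static long readVli(byte[] b, int[] pos) throws IOException {
        long value = 0;
        for (int i = 0; i < 9; i++) {
            if (pos[0] >= b.length) throw new IOException("truncated VLI");
            int x = b[pos[0]++] & 0xFF;
            value |= (long) (x & 0x7F) << (7 * i);
            if ((x & 0x80) == 0) return value;
        }
        throw new IOException("VLI longer than 9 bytes");
    }

    static Lzma2Info inspectFirstBlock(byte[] file) throws IOException {
        if (file.length <= STREAM_HEADER_SIZE + 1) throw new IOException("file too short");
        int sizeByte = file[STREAM_HEADER_SIZE] & 0xFF;
        if (sizeByte == 0) throw new IOException("no Block: the Index follows directly");
        int blockHeaderSize = (sizeByte + 1) * 4;  // real header size in bytes
        if (file.length < STREAM_HEADER_SIZE + blockHeaderSize) throw new IOException("truncated Block Header");

        int[] pos = {STREAM_HEADER_SIZE + 1};
        int flags = file[pos[0]++] & 0xFF;
        int filterCount = (flags & 0x03) + 1;
        if ((flags & 0x40) != 0) readVli(file, pos); // skip Compressed Size
        if ((flags & 0x80) != 0) readVli(file, pos); // skip Uncompressed Size

        for (int i = 0; i < filterCount; i++) {
            long filterId = readVli(file, pos);
            long propsSize = readVli(file, pos);
            if (filterId == LZMA2_FILTER_ID) {
                if (propsSize != 1 || pos[0] >= file.length) throw new IOException("bad LZMA2 properties");
                return new Lzma2Info(file[pos[0]] & 0x3F, STREAM_HEADER_SIZE + blockHeaderSize);
            }
            pos[0] += (int) propsSize; // skip properties of other filters (e.g. BCJ or Delta)
        }
        throw new IOException("no LZMA2 filter in the first Block");
    }

    public static void main(String[] args) throws IOException {
        byte[] file = Files.readAllBytes(Paths.get(args[0]));
        Lzma2Info info = inspectFirstBlock(file);
        System.out.println("LZMA2 dictionary-size code: " + info.dictSizeCode
                + ", raw LZMA2 data starts at offset " + info.rawDataOffset);
    }
}
```

In principle, the bytes starting at `rawDataOffset` are the raw LZMA2 chunks that `LZMA2InputStream` expects. That may be why passing it the whole .xz file, container header included, fails whatever dictSize is used. This is an inference and was not tested in this thread.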

What I could do is simply run your code many thousands of times, each time incrementing the dictsize, until a valid output appears... but this could take days. Nevertheless I am going to try that now.

I also don't know if the dictsize would be the same for the other ZIM files used by Kiwix...
At least it is clear that the dictsize is not a power of 2 (I tried them all).
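For comparison, LZMA2 cannot use arbitrary dictionary sizes. The code byte from the Block Header maps alternately to 2^n and 3·2^(n-1) bytes, starting at 4096, which matches the "Unsupported dictionary size" error reported for smaller values. So not every valid size is a power of 2, but there are only 41 of them. Also, according to the javadoc quoted earlier in the thread, any value at least as large as the one used for compressing should work. Here is a sketch of the mapping, assuming the formula from the .xz specification (`Lzma2DictSize` is a hypothetical name):

```java
/**
 * Hypothetical helper: maps an LZMA2 dictionary-size code (0..40), as stored in the
 * .xz Block Header, to a size in bytes. Even codes give 2^n, odd codes 3 * 2^(n-1).
 */
final class Lzma2DictSize {

    static long decode(int code) {
        if (code < 0 || code > 40) throw new IllegalArgumentException("invalid code: " + code);
        if (code == 40) return 0xFFFFFFFFL;               // special case: 4 GiB - 1
        return (2L | (code & 1)) << (code / 2 + 11);      // code 0 -> 4096, code 1 -> 6144, ...
    }

    public static void main(String[] args) {
        // Print all 41 valid sizes and mark the ones that are powers of 2.
        for (int code = 0; code <= 40; code++) {
            long size = decode(code);
            String note = Long.bitCount(size) == 1 ? " (power of 2)" : "";
            System.out.println("code " + code + " -> " + size + " bytes" + note);
        }
    }
}
```

If sizes larger than the uncompressed data have already been tried without success, the dictionary size is probably not the cause of the failure.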
 

Syd Wright

Well-Known Member
Licensed User
I have now tried every dictsize from 4,096 to 2,500,000, but with no success.
Here is my code:
B4X:
Sub Activity_Create(FirstTime As Boolean)
    Dim DictSize1 As Int
    Dim Bestand1 As String
   
    XZ1.Initialize("XZ1")
    Bestand1 = "unzip3.xz"
    File.Copy(File.DirAssets,Bestand1,File.DirInternalCache,Bestand1)
    If File.Exists(File.DirRootExternal,"UnzipXZ1.txt") Then
      Log("Before: " & File.Size(File.DirRootExternal,"UnzipXZ1.txt"))
      File.Delete (File.DirRootExternal,"UnzipXZ1.txt")
    End If
   
    'MY EXPERIMENTS:
    DictSize1 = 4096            'Smallest size without getting an Error
    Do While DictSize1 < 2500000
      If File.Exists(File.DirRootExternal,"UnzipXZ1.txt") Then File.Delete (File.DirRootExternal,"UnzipXZ1.txt")
      XZ1.DecodeLZMA2(File.DirInternalCache,Bestand1,File.DirRootExternal,"UnzipXZ1.txt",DictSize1)
      Sleep(100)                'Just in case decoder needs some time to decode
      Log("Dictsize1 = " & DictSize1 & ", Filesize= " & File.Size(File.DirRootExternal,"UnzipXZ1.txt"))
      If File.Size(File.DirRootExternal,"UnzipXZ1.txt")>0 Then Exit
      DictSize1 = DictSize1 + 1
      'DictSize1 = DictSize1 * 2
    Loop
End Sub
7-Zip has no problem reading my XZ file. Maybe the XZ decompression part of 7-Zip could be wrapped?
There are also Java solutions available: https://tukaani.org/xz/java.html
It's probably time I start to learn Java and how to make wrappers ...
 

Syd Wright

Well-Known Member
Licensed User
OK, I thought that might be the case. It is strange that none of the dictsize values I tried works, yet none of them generates an error either. Could it be that I did not wait long enough for the results to appear? I used a delay of 100 ms between increments of the dictsize. What I read on various websites is that the dictsize is normally a power of 2 (e.g. 2^12).
I assume nobody has any further ideas or alternatives?
 

DonManfred

Expert
Licensed User
Could it be that I did not wait long enough for results to appear?
Don't ask me; ask the author of the XZ GitHub project ;-)
I just used the demo source to build some methods from it.
B4X:
package org.tukaani.xz;

/*
 * LZMADecDemo
 *
 * Author: Lasse Collin <lasse.collin@tukaani.org>
 *
 * This file has been put into the public domain.
 * You can do whatever you want with this file.
 */

import java.io.*;
import org.tukaani.xz.*;

/**
 * Decompresses .lzma files to standard output. If no arguments are given,
 * reads from standard input.
 *
 * NOTE: For most purposes, .lzma is a legacy format and usually you should
 * use .xz instead.
 */
class LZMADecDemo {
    public static void main(String[] args) {
        byte[] buf = new byte[8192];
        String name = null;

        try {
            if (args.length == 0) {
                name = "standard input";

                // No need to use BufferedInputStream with System.in which
                // seems to be fast with one-byte reads.
                InputStream in = new LZMAInputStream(System.in);

                int size;
                while ((size = in.read(buf)) != -1)
                    System.out.write(buf, 0, size);

            } else {
                // Read from files given on the command line.
                for (int i = 0; i < args.length; ++i) {
                    name = args[i];
                    InputStream in = new FileInputStream(name);

                    try {
                        // In contrast to other classes in org.tukaani.xz,
                        // LZMAInputStream doesn't do buffering internally
                        // and reads one byte at a time. BufferedInputStream
                        // gives a huge performance improvement here but even
                        // then it's slower than the other input streams from
                        // org.tukaani.xz.
                        in = new BufferedInputStream(in);
                        in = new LZMAInputStream(in);

                        int size;
                        while ((size = in.read(buf)) != -1)
                            System.out.write(buf, 0, size);

                    } finally {
                        // Close FileInputStream (directly or indirectly
                        // via LZMAInputStream, it doesn't matter).
                        in.close();
                    }
                }
            }
        } catch (FileNotFoundException e) {
            System.err.println("LZMADecDemo: Cannot open " + name + ": "
                               + e.getMessage());
            System.exit(1);

        } catch (EOFException e) {
            System.err.println("LZMADecDemo: Unexpected end of input on "
                               + name);
            System.exit(1);

        } catch (IOException e) {
            System.err.println("LZMADecDemo: Error decompressing from "
                               + name + ": " + e.getMessage());
            System.exit(1);
        }
    }
}
This resulted in my method:

B4X:
    public void DecodeLZMA(final String srcpath, final String srcfilename, final String dstpath, final String dstfilename) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                byte[] buf = new byte[8192];   // local buffer, so parallel calls do not share it
                InputStream in = null;
                OutputStream out = null;
                try {
                    InputStreamWrapper inw = File.OpenInput(srcpath, srcfilename);
                    in = inw.getObject();
                    in = new LZMAInputStream(in);   // closing the LZMA stream also closes the file stream
                    OutputStreamWrapper outw = File.OpenOutput(dstpath, dstfilename, false);
                    out = outw.getObject();
                    int size;
                    int total = 0;
                    while ((size = in.read(buf)) != -1) {
                        total = total + size;
                        BA.Log("Decoding Bytes " + size + " / " + total);
                        out.write(buf, 0, size);
                    }
                    out.flush();
                    BA.Log("Finish decoding ");
                    final String method = "decode";
                    ba.raiseEventFromDifferentThread(this, null, 0, eventName + "_finish", true, new Object[] {method, dstpath, dstfilename});
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    // Close both streams whether decoding succeeded or failed.
                    try { if (in != null) in.close(); } catch (IOException ignored) { }
                    try { if (out != null) out.close(); } catch (IOException ignored) { }
                }
            }
        }).start();
    }
And after you mentioned LZMA2, I built a second one:
B4X:
    public void DecodeLZMA2(final String srcpath, final String srcfilename, final String dstpath, final String dstfilename, final int dictSize) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                BA.Log("DecodeLZMA2()");
                byte[] buf = new byte[8192];   // local buffer, so parallel calls do not share it
                InputStream in = null;
                OutputStream out = null;
                try {
                    InputStreamWrapper inw = File.OpenInput(srcpath, srcfilename);
                    in = inw.getObject();
                    in = new LZMA2InputStream(in, dictSize);   // expects raw LZMA2 data
                    OutputStreamWrapper outw = File.OpenOutput(dstpath, dstfilename, false);
                    out = outw.getObject();
                    int size;
                    int total = 0;
                    while ((size = in.read(buf)) != -1) {
                        total = total + size;
                        BA.Log("Decoding Bytes " + size + " / " + total);
                        out.write(buf, 0, size);
                    }
                    out.flush();
                    BA.Log("Finish decoding ");
                    final String method = "decode";
                    ba.raiseEventFromDifferentThread(this, null, 0, eventName + "_finish", true, new Object[] {method, dstpath, dstfilename});
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    // Close both streams whether decoding succeeded or failed.
                    try { if (in != null) in.close(); } catch (IOException ignored) { }
                    try { if (out != null) out.close(); } catch (IOException ignored) { }
                }
            }
        }).start();
    }
 

DonManfred

Expert
Licensed User
Here is my code
With v0.03 you get an event with success = False if an exception is raised,
so you can use something like this (without any need for Sleep):

B4X:
        DictSize1 = 1024*1024*2            'Start value; counted down by 1 after every attempt
        Do While DictSize1 > 1000000
            If File.Exists(File.DirRootExternal,"UnzipXZ1.txt") Then File.Delete (File.DirRootExternal,"UnzipXZ1.txt")
            xz.DecodeLZMA2(File.DirRootExternal,fname,File.DirRootExternal,"UnzipXZ1.txt",DictSize1)
            Wait For XZ_Finish(success As Boolean, method As String, path As String, filename As String)
            If success = False Then
                Log("Dictsize1 = " & DictSize1 & ", Success = false")
            Else
                Log("Dictsize1 = " & DictSize1 & ", Filesize= " & File.Size(File.DirRootExternal,"UnzipXZ1.txt"))
                If File.Size(File.DirRootExternal,"UnzipXZ1.txt")>0 Then Exit
            End If
            'XZ1.DecodeLZMA2(File.DirInternalCache,Bestand1,File.DirRootExternal,"UnzipXZ1.txt",DictSize1)
            'Sleep(25)                'Just in case decoder needs some time to decode
            DictSize1 = DictSize1 - 1          'Decrement on every pass, so an empty but successful result cannot loop forever
            'DictSize1 = DictSize1 * 2
        Loop
Please note the new signature of the event sub:
B4X:
XZ_Finish(success As Boolean, method As String, path As String, filename As String)
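The same idea, sketched in plain Java: the decoder runs on its own thread and always reports back with a success flag, so the caller reacts to the event instead of guessing with `Sleep`. With a fixed 100 ms delay, a decode that takes longer is simply checked too early. `AsyncDecodeSketch`, `FinishListener` and `DecodeTask` are hypothetical names that only mimic the `XZ_Finish(success, method, path, filename)` event. They are not the wrapper's actual code.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class AsyncDecodeSketch {

    /** Hypothetical callback mirroring XZ_Finish(success, method, path, filename). */
    interface FinishListener {
        void onFinish(boolean success, String method, String path, String filename);
    }

    /** Hypothetical stand-in for the actual decoding work. */
    interface DecodeTask {
        void run() throws Exception;
    }

    static Thread decodeAsync(DecodeTask task, String path, String filename, FinishListener listener) {
        Thread worker = new Thread(() -> {
            boolean success;
            try {
                task.run();
                success = true;
            } catch (Exception e) {
                success = false;   // report the failure instead of only printing a stack trace
            }
            listener.onFinish(success, "decode", path, filename);
        });
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        decodeAsync(() -> { throw new IOException("corrupt input"); },
                "/hypothetical/dir", "UnzipXZ1.txt",
                (success, method, path, filename) -> {
                    System.out.println("success=" + success + ", " + method + ", " + path + "/" + filename);
                    done.countDown();
                });
        // Wait for the callback itself, the Java counterpart of Wait For XZ_Finish.
        done.await(10, TimeUnit.SECONDS);
    }
}
```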
 


Syd Wright

Well-Known Member
Licensed User
Don, thank you for trying again to find a solution to read XZ files. Last night and today I ran all dictsize values from 4096 to 2 million. Regrettably, all return "False". I will search further to see if there is any Java code that does not require the dictsize parameter. 7-Zip does not require this parameter, so other solutions should be possible.
I also asked Informatix for help because his Archiver (based on 7-Zip) is capable of encoding to XZ format, but decoding seems to be very complicated...
 

Syd Wright

Well-Known Member
Licensed User
I got an error saying "Error, end of file reached" with 7-Zip and your file when trying to decompress it. I was not able to decompress it without an error.
Indeed, 1 byte is missing from the footer. Attached is the file that I keep using for the tests with 7-Zip and your wrappers:
 
