B4J Question [B4J Library jPOI] High usage of RAM memory - +1.5GB to generate 70MB xlsx file

rtek1000

Active Member
Licensed User
Longtime User
Hi,

b4j_mem.png


To generate an xlsx file of only 70MB memory usage is more than 1.5GB of RAM memory.

And each time the operation is performed, this memory usage increases.

To be able to perform the operation again, it is necessary to close the application and open it again. Otherwise performance drops dramatically.

If the writing lines of the sheet (sheet1.PutString / sheet1.PutNumber) are commented out the app always uses about 270MB of RAM memory.

I'm using a computer with a few GB of RAM, but this app won't work on a computer with low RAM.

I thought about saving the file a few times instead of saving it just once. But it doesn't seem that even after saving the memory is not freed.

Every time the PUT functions (sheet1.PutString / sheet1.PutNumber) access the sheet, the RAM usage increases. 🤔

XSSF does use a lot of memory to construct an in memory representation of the
Excel worksheet. This question has been asked a number of times on the list
and there is a solution available. If you are limited for memory, then may I
suggest that you take a look at the streaming api - SXSSF.

Source: https://user.poi.apache.narkive.com...it-uses-too-ram-is-it-a-limit-of-your-library


Using SXSSF poi 3.8

B4X:
package example;

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class SXSSFexample {


    public static void main(String[] args) throws Throwable {
        FileInputStream inputStream = new FileInputStream("mytemplate.xlsx");
        XSSFWorkbook wb_template = new XSSFWorkbook(inputStream);
        inputStream.close();

        SXSSFWorkbook wb = new SXSSFWorkbook(wb_template);
        wb.setCompressTempFiles(true);

        SXSSFSheet sh = (SXSSFSheet) wb.getSheetAt(0);
        sh.setRandomAccessWindowSize(100);// keep 100 rows in memory, exceeding rows will be flushed to disk
    for(int rownum = 4; rownum < 100000; rownum++){
        Row row = sh.createRow(rownum);
        for(int cellnum = 0; cellnum < 10; cellnum++){
            Cell cell = row.createCell(cellnum);
            String address = new CellReference(cell).formatAsString();
            cell.setCellValue(address);
        }

    }


    FileOutputStream out = new FileOutputStream("tempsxssf.xlsx");
    wb.write(out);
    out.close();
}

}

Source: https://stackoverflow.com/questions/7274076/writing-a-large-resultset-to-an-excel-file-using-poi

Is there any example of generating the xlsx file using the streaming API?
 

rtek1000

Active Member
Licensed User
Longtime User
SXSSF is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

 
Upvote 0

rtek1000

Active Member
Licensed User
Longtime User
jPOI isn't using SXSSF as this format has many limitations.


Can you make a small app that reproduces it? This sounds like a memory leak.

Yes, of course, there's no secret.

I put 20,000 lines in the example (lines_max), because that's the amount of memory my computer can hold at the moment, because I'm using the browser.

You can use a different amount of lines to test.

But when it's more unoccupied, it manages to make about 50 thousand lines, which is the amount of data that the datalogger must generate.

To run this example, leave the windows task manager open, and click on the memory column, so it's the most used at the top, so it's easier to keep track.

Once the task is complete, try the task again without closing the app. Memory usage should continue to increase.

B4X:
Sub createData
    XL.Initialize
    Dim Workbook As XLWorkbookWriter = XL.CreateWriterBlank
    Dim sheet1 As XLSheetWriter = Workbook.CreateSheetWriterByName("Sheet1")

    For i = 0 To lines_max
        For j = 0 To 70
            sheet1.PutString(XL.AddressZero(j, i), "Product #" & i)
        Next
        
        CallSubDelayed2(Me, "val_update", i)
    Next

    'save the workbook
    Dim f As String = Workbook.SaveAs(File.DirApp, "Products.xlsx", True)
    'Wait For (XL.OpenExcel(f)) Complete (Success As Boolean)
'    StopMessageLoop 'non-ui

    CallSubDelayed(Me, "msg")

    Button1.Enabled = True
End Sub
 

Attachments

  • XLExample.zip
    2.9 KB · Views: 40
Upvote 0

rtek1000

Active Member
Licensed User
Longtime User
I thought of making a secondary app that is started by the main app and then terminated when the task is finished, like a command line app, this way the RAM memory can be freed after the task is completed. 🤔
 
Upvote 0

rtek1000

Active Member
Licensed User
Longtime User
I thought of making a secondary app that is started by the main app and then terminated when the task is finished, like a command line app, this way the RAM memory can be freed after the task is completed. 🤔

B4J_XLUtils.png


I tried to create a console app but it was not possible, should this library XLUtils be conditioned to be used only in non-console apps?
 
Upvote 0

rtek1000

Active Member
Licensed User
Longtime User
View attachment 123117

I tried to create a console app but it was not possible, should this library XLUtils be conditioned to be used only in non-console apps?
I haven't reviewed all of the library code (XLUtils.b4xlib) yet but it looks like only a few xui references were used to get the color values. But I found it interesting that other values were written by hand, I wonder why they reused the values from the xui library.

B4j_xui.jpg
 
Upvote 0

rtek1000

Active Member
Licensed User
Longtime User
I haven't reviewed all of the library code (XLUtils.b4xlib) yet but it looks like only a few xui references were used to get the color values. But I found it interesting that other values were written by hand, I wonder why they reused the values from the xui library.

Ok, I managed to delete xui's references from the library, the library is attached. If used I suggest that the version is changed, I don't know how to edit it yet, I just edited the internal file. The console app did not return errors after deleting xui references.

I used this simple code to read the values:
B4X:
Sub print_colors
    Private ColorsMap As Map
    
    ColorsMap = CreateMap("black": xui.Color_Black, _
        "darkgray": xui.Color_DarkGray, _
        "gray": xui.Color_LightGray, _
        "white": xui.Color_White, _
        "red": xui.Color_Red, _
        "green": xui.Color_Green, _
        "blue": xui.Color_Blue, _
        "yellow": xui.Color_Yellow, _
        "cyan": xui.Color_Cyan, _
        "magenta": xui.Color_Magenta)
        
    For i = 0 To (ColorsMap.Size - 1)
        Log(ColorsMap.GetKeyAt(i) & ":0x" & Bit.ToHexString(ColorsMap.GetValueAt(i)))
    Next
End Sub
 

Attachments

  • XLUtils.b4xlib
    24.7 KB · Views: 35
Upvote 0

rtek1000

Active Member
Licensed User
Longtime User
I found this reference:

Using Auto Flush
In the below class, the method writes to a sheet with a window of 100 rows. When the row count reaches 101, the row with rownum=0 is flushed to disk and removed from memory, when rownum reaches 102 then the row with rownum=1 is flushed, etc.
Using Manual Flush
The below method turns off auto-flushing (windowSize=-1) and the code manually controls how portions of data are written to disk.
SXSSF flushes sheet data in temporary files (a temp file per sheet) and the size of these temporary files can grow to a very large value. For example, for a 20 MB csv data the size of the temp XML becomes more than a gigabyte.


So the solution to avoid high RAM memory usage is to access a few cells at a time.

Interesting that the 70MB xlsx file generated by XLUtils (and which used those 1.5GB), when opened in the WPS Office editor the memory usage is only 160MB.

I imagine these editors only access the cells that appear in the window. Perhaps this explains why it takes so long to refresh when scrolling the screen when the zoom is set to show maximum cells.
 
Upvote 0

rtek1000

Active Member
Licensed User
Longtime User
I noticed that free memory was usually 0.3GB to 0.5GB during the first export, and by the end of the second export it dropped to less than 0.005GB. With that I made a protection to abort the operation and try to guide the user to do the procedure in another way. It's not really elegant, but it's what I have in my hands at the moment.

B4X:
Dim robot As AWTRobot

Dim RAMmemory As Double : RAMmemory = robot.JVMMemoryFree / 1073741824
    
If RAMmemory < 0.0075 Then
        
    If lowRAMmemory = False Then
        xui.MsgboxAsync("Try closing other programs to free memory, if the problem still persists, try exporting fewer channels at a time.", "Low free RAM memory")
    End If
        
    lowRAMmemory = True
        
End If
 
Upvote 0
Top