B4J Question [solved] jPOI and a docx document

udg

Expert
Licensed User
Longtime User
Hi all,

I'm using some modified old code by @Erel to "update" a docx document.
The base document has 14 "user fields" (all of type Text). My code offers the user a simple interface to fill in all or some of those variable fields, then writes a new document based on the original one (which is a fixed model).
The problem is that model_1 (one-column page) works OK, while model_2 (two-column page) skips a few fields and outputs an ugly placeholder ("User field <name of the field> =") instead.
Model_1 and Model_2 use exactly the same map for the user fields.

Here's the code I use:
B4X:
    Dim doc As JavaObject = OpenDocx(Dir, FileName)
    Dim paragraphs As List = doc.RunMethod("getParagraphs", Null)
    For Each p As JavaObject In paragraphs
        Dim runs As List = p.RunMethod("getRuns", Null)
        If runs.IsInitialized Then
            For Each r As JavaObject In runs
                Dim text As String = r.RunMethod("getText", Array(0))
                If text <> Null Then
                    Log(text)
                    For Each key As String In m1.Keys
                        'If text.Contains("$" & key & "$") Then    <<- this was from the original code, which used Excel as a first step
                        If text.Contains(key) Then
                            Log(key)
                            r.RunMethod("setText", Array(m1.Get(key), 0))
                        End If
                    Next
                End If
            Next
        End If
    Next

Sub SaveDocument(doc As JavaObject, Dir As String, FileName As String)
    Dim out As OutputStream = File.OpenOutput(Dir, FileName, False)
    doc.RunMethod("write", Array(out))
    out.Close
End Sub

Sub OpenDocx(Dir As String, FileName As String) As JavaObject
    Dim in As InputStream = File.OpenInput(Dir, FileName)
    Dim document As JavaObject
    document.InitializeNewInstance("org.apache.poi.xwpf.usermodel.XWPFDocument", Array(in))
    in.Close 'the constructor loads the whole package, so the stream can be closed here
    Return document
End Sub

I saw that there is a commit() function in the definition of XWPFDocument, but the document returned by OpenDocx seems to be of type POIXMLDocument, so that function is unavailable (and anyway I don't know whether it would solve the problem).

Any ideas or suggestions?
 

DonManfred


udg

"What is the output of Log(GetType(doc))?"
For both model docs:
org.apache.poi.xwpf.usermodel.XWPFDocument

So why does this raise an error?
B4X:
doc.RunMethod("commit", Null)
java.lang.RuntimeException: Method: commit not found in: org.apache.poi.xwpf.usermodel.XWPFDocument

BTW, I don't know whether commit() could be a solution.

@DonManfred: it's planned, but the problem is probably somewhere else.
 

udg

You're right. I missed the protected attribute.
Anyway, I doubt the problem arises from a missing commit, since the write function probably calls it internally, and the one-column doc works OK.
I tried creating the failing doc both with MS Word and from LibreOffice's odt format, then saving as docx. The behaviour stays the same.

I am doing this for a friend, so my motivation here is to learn from the experience and understand what happens and why.
 

udg

Just a quick update.
I made a "twin" document based on TABs instead of the two-column style the original doc had.
On this new kind of document I tried both the "user defined variables" and a simple placeholder ($varname$).
Unfortunately the problem persists, but thanks to a couple of logs I found that it relates to runs and not to paragraphs. There are a few links on the Internet about the "errant" (or unpredictable) behaviour of runs.
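
For readers who want to reproduce that kind of diagnosis, here is a minimal sketch of such logging. It is not the exact code used in this thread. LogRuns is a hypothetical name, doc is assumed to be the XWPFDocument returned by OpenDocx, and the POI methods assumed are XWPFParagraph.getText(), XWPFParagraph.getRuns() and XWPFRun.getText(int).

B4X:
    'Hypothetical helper: logs each paragraph's full text followed by the text of each run.
    Sub LogRuns(doc As JavaObject)
        Dim paragraphs As List = doc.RunMethod("getParagraphs", Null)
        For Each p As JavaObject In paragraphs
            Log("Paragraph: [" & p.RunMethod("getText", Null) & "]")
            Dim runs As List = p.RunMethod("getRuns", Null)
            Log("  number of runs: " & runs.Size)
            For i = 0 To runs.Size - 1
                Dim r As JavaObject = runs.Get(i)
                'getText(0) can return null for runs without text (e.g. a run holding only a TAB)
                Dim t As Object = r.RunMethod("getText", Array(0))
                If t <> Null Then
                    Log("  run " & i & ": [" & t & "]")
                Else
                    Log("  run " & i & ": (no text)")
                End If
            Next
        Next
    End Sub

If a placeholder shows up whole in the "Paragraph" line but only in pieces across the "run" lines, the replacement loop from post #1 cannot match it, because it tests one run at a time.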

Now, my next attempt will be to try to fool the logic behind runs: I will change the font before and after the run containing the placeholder, hoping the engine will keep what I care about together. I'll keep you updated (and publish the full code and a sample doc here to serve as an eventual base for others).

Have a nice weekend.
 

udg

And the winner is.... udg!
Following the idea outlined in the previous post, I finally could set up a docx document that rendered as expected.
The key point was to "isolate" each run containing a placeholder by preceding and following it with a font change (which in my case had no visible impact, since the other font applies only to TABs).
So it is like this:
Font1 - TAB - Font2 - placeholder ($fieldname$) - Font1 - TAB (this one moves cursor on the second half of the page) - Font2 - placeholder

What I learnt from this experience is:
- runs are a bit unreliable (the engine may split a placeholder such as $name$ across several runs, e.g. "$" and "name$", or any other combination)
- using the "user defined variables" feature is OK, but in many cases a simple placeholder typed in the text is simpler and more immediate
- it's better to avoid the two-column layout; using TABs to place text in the "second" column is easier and more reliable
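
As an aside, the run-splitting problem can also be tackled in code instead of in the template. The following is only a sketch under stated assumptions, not the approach used in this thread: it joins the text of all runs in a paragraph, replaces the $key$ placeholders in the joined string, writes the result into the first run and removes the remaining runs. ReplaceInParagraph and values are hypothetical names; the POI calls assumed are getRuns(), getText(int), setText(String, int) and removeRun(int). Be aware that this discards the formatting of the removed runs and any TABs or other non-text elements they contain, so as written it would not suit a TAB-based layout.

B4X:
    'Hypothetical helper: replaces $key$ placeholders even when they are split across runs.
    'WARNING: drops per-run formatting and non-text content (e.g. TABs) of the merged runs.
    Sub ReplaceInParagraph(p As JavaObject, values As Map)
        Dim runs As List = p.RunMethod("getRuns", Null)
        If runs.IsInitialized = False Then Return
        If runs.Size = 0 Then Return
        'Join the text of every run in the paragraph
        Dim sb As StringBuilder
        sb.Initialize
        For Each r As JavaObject In runs
            Dim t As Object = r.RunMethod("getText", Array(0))
            If t <> Null Then sb.Append(t)
        Next
        Dim full As String = sb.ToString
        Dim replaced As String = full
        For Each key As String In values.Keys
            replaced = replaced.Replace("$" & key & "$", values.Get(key))
        Next
        'Leave paragraphs without placeholders untouched
        If replaced = full Then Return
        'Put the whole text into the first run and remove the others, last to first
        Dim first As JavaObject = runs.Get(0)
        first.RunMethod("setText", Array(replaced, 0))
        For i = runs.Size - 1 To 1 Step -1
            p.RunMethod("removeRun", Array(i))
        Next
    End Sub

It would be called once per paragraph, in place of the inner loop over runs shown in post #1.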

The code I used for my final attempt is the same one Erel originally published, as shown in post #1. To understand what was going on, I used a couple of Logs to see the text of each paragraph and the size of each run. That, along with what I read on the Internet, gave me the idea to isolate each run in some way. I used a font change, but an attribute like bold or italic would probably do the same. What seems to work is that the engine needs to see your placeholder as something different from its surroundings, so it keeps it together in a single run instead of splitting it.
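
A template can also be checked before it is used. The sketch below is an assumption-laden illustration (FindSplitPlaceholders and values are hypothetical names; placeholders are assumed to have the form $key$): it reports every key whose placeholder is present in a paragraph's text but not contained whole in any single run of that paragraph.

B4X:
    'Hypothetical helper: returns the keys whose $key$ placeholder is split across runs.
    Sub FindSplitPlaceholders(doc As JavaObject, values As Map) As List
        Dim result As List
        result.Initialize
        Dim paragraphs As List = doc.RunMethod("getParagraphs", Null)
        For Each p As JavaObject In paragraphs
            Dim ptext As String = p.RunMethod("getText", Null)
            Dim runs As List = p.RunMethod("getRuns", Null)
            For Each key As String In values.Keys
                Dim ph As String = "$" & key & "$"
                If ptext.Contains(ph) Then
                    'Look for at least one run holding the placeholder in one piece
                    Dim intact As Boolean = False
                    For Each r As JavaObject In runs
                        Dim t As Object = r.RunMethod("getText", Array(0))
                        If t <> Null Then
                            Dim s As String = t
                            If s.Contains(ph) Then intact = True
                        End If
                    Next
                    If intact = False Then result.Add(key)
                End If
            Next
        Next
        Return result
    End Sub

An empty list suggests that the simple run-by-run replacement should find every placeholder. Note that a placeholder occurring twice in one paragraph, once whole and once split, would not be reported by this sketch.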
 