B4J Question How to extract images from PDF.

T201016

Active Member
Licensed User
Longtime User
Hi,
I'm not sure whether I modified the sample project correctly.
I get the following error when compiling:

B4J Version: 10.00
Code parsing. (0.00s)
Java version: 8
Building Folders Structure. (0.02s)
Code compilation. (0.00s)

Obfuscatormap.txt file created in the Objects folder.
Compilation of the system code. (0.00s)
Organizing libraries. (0.00s)
Compilation of the generated Java code. Error
B4J LINE: 12
End sub
Javac 1.8.0_441
SRC \ B4J \ Example \ Main.java: 126: Error: <Identifier> Expected
Public Void Class Saveimagesinpdf EXTENDS PDFSTReameNGINE
^
1 error

Any hint about what is wrong in the code would be welcome :confused:

Example:
'Non-UI application (console / server application)
#Region Project Attributes
    #CommandLineArgs:
    #MergeLibraries: True
#End Region

    #AdditionalJar: pdfbox-app-2.0.26
'    download https://www.apache.org/dyn/closer.lua/pdfbox/2.0.26/pdfbox-app-2.0.26.jar

Sub Process_Globals
    Private jo As JavaObject   
End Sub

Sub AppStart (Args() As String)
    
    jo = Me
    
'    Dim pathPDF As String = "D:\\TMP\\UE.pdf"
    
'    jo.RunMethod("SaveImagesInPdf", Array As Object(pathPDF))
    jo.RunMethod("SaveImagesInPdf", Null)

End Sub

#if java
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.PDFStreamEngine;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.imageio.ImageIO;

/**
 * This is an example on how to extract images from pdf.
 */
public void class SaveImagesInPdf extends PDFStreamEngine
{
    /**
     * Default constructor.
     *
     * @throws IOException If there is an error loading text stripper properties.
     */
    public SaveImagesInPdf() throws IOException
    {
    }

    public int imageNumber = 1;

    /**
     * @param args The command line arguments.
     *
     * @throws IOException If there is an error parsing the document.
     */
    public void main( String[] args ) throws IOException
    {
        PDDocument document = null;
        String fileName = "D:\\TMP\\UE.pdf";
        try
        {
            document = PDDocument.load( new File(fileName) );
            SaveImagesInPdf printer = new SaveImagesInPdf();
            int pageNum = 0;
            for( PDPage page : document.getPages() )
            {
                pageNum++;
                System.out.println( "Processing page: " + pageNum );
                printer.processPage(page);
            }
        }
        finally
        {
            if( document != null )
            {
                document.close();
            }
        }
    }

    /**
     * @param operator The operation to perform.
     * @param operands The list of arguments.
     *
     * @throws IOException If there is an error processing the operation.
     */
    @Override
    protected void processOperator( Operator operator, List<COSBase> operands) throws IOException
    {
        String operation = operator.getName();
        if( "Do".equals(operation) )
        {
            COSName objectName = (COSName) operands.get( 0 );
            PDXObject xobject = getResources().getXObject( objectName );
            if( xobject instanceof PDImageXObject)
            {
                PDImageXObject image = (PDImageXObject)xobject;

                // save image to a local file
                BufferedImage bImage = image.getImage();
                ImageIO.write(bImage,"PNG",new File("image_"+imageNumber+".png"));
                System.out.println("Image saved.");
                imageNumber++;

            }
            else if(xobject instanceof PDFormXObject)
            {
                PDFormXObject form = (PDFormXObject)xobject;
                showForm(form);
            }
        }
        else
        {
            super.processOperator(operator, operands);
        }
    }

}
#End If
 

Attachments

  • Save Imagess.zip
    1.9 KB · Views: 70
  • UE.pdf
    189.7 KB · Views: 81

stevel05

Expert
Licensed User
Longtime User
I haven't looked at your code yet, but the first thing that strikes me is that the identifier it is complaining about has a capital P. Should probably be public. I can't see that in the code you've listed but worth a look to see if it's there.
 

stevel05

Expert
Licensed User
Longtime User
I've got it compiling and loading a static class, but there is a lot that needs sorting out to get it to work. It's built as an app so you'd need to replace the main procedure for a start. Where did you find the example?

I've only changed the first line so far to:
B4X:
public static class SaveImagesInPdf extends PDFStreamEngine
{

And initialized it as a static class:

B4X:
    jo.InitializeStatic("b4j.example.main.SaveImagesInPdf")

but, as I say, there will be quite a bit to change to have a hope of getting it to work.
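
For context on why that one-word change compiles: B4J places the `#If Java` block inside the generated `main` class, so `SaveImagesInPdf` ends up as a nested class, and `void` is never valid in a class declaration. Below is a minimal plain-Java sketch of the same shape. `NestedClassDemo` and `ImageNamer` are hypothetical stand-ins (not part of the project), and the reflection calls only approximate what JavaObject does behind the scenes.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for the B4J-generated main class.
class NestedClassDemo {

    // "public static class" compiles; "public void class" does not.
    // Because it is static, no instance of the outer class is needed to create it.
    public static class ImageNamer {
        public int imageNumber = 1;

        public String nextFileName() {
            return "image_" + (imageNumber++) + ".png";
        }
    }

    // Loads the nested class by its binary name (Outer$Inner),
    // creates an instance and calls a method on it reflectively.
    static String callViaReflection() throws Exception {
        String binaryName = NestedClassDemo.class.getName() + "$ImageNamer";
        Class<?> cls = Class.forName(binaryName);
        Object instance = cls.getDeclaredConstructor().newInstance();
        Method m = cls.getMethod("nextFileName");
        return (String) m.invoke(instance);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callViaReflection());
    }
}
```

Note that the sketch creates an instance before calling the method; a static reference alone is only enough for calling static methods.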
 

stevel05

Expert
Licensed User
Longtime User
Sometimes it's really not worth getting down to Java, here is a solution using JavaObject:

B4X:
Sub AppStart (Args() As String)
 
    Dim Source As String = "D:\UE.pdf"
    Dim Destination As String = "D:\"

    Dim F As JavaObject
    F.InitializeNewInstance("java.io.File",Array(Source))

    Dim Document As JavaObject
    Document.InitializeStatic("org.apache.pdfbox.pdmodel.PDDocument")
 
    Dim Doc As JavaObject = Document.RunMethod("load",Array(F))
    Dim PageTree As JavaObject = Doc.RunMethodJO("getDocumentCatalog",Null).RunMethod("getPages",Null)
 
    Dim TotalImages As Int = 1
 
    Dim Iterator As JavaObject = PageTree.RunMethod("iterator",Null)
 
    Do While Iterator.RunMethod("hasNext",Null)
        Dim Page As JavaObject = Iterator.RunMethod("next",Null)
        Dim Resources As JavaObject = Page.RunMethod("getResources",Null)
        Dim Iterable As JavaObject = Resources.RunMethod("getXObjectNames",Null)
        Dim ResIterator As JavaObject = Iterable.RunMethod("iterator",Null)

        Do While ResIterator.RunMethod("hasNext",Null)
            Dim Name As Object = ResIterator.RunMethod("next",Null)

            If Resources.RunMethod("isImageXObject",Array(Name)) Then
                Dim OutputPath As String = File.Combine(Destination,$"Image${TotalImages}.png"$)

                Dim PDXObject As JavaObject = Resources.RunMethod("getXObject",Array(Name))
                Dim BufferedImage As JavaObject = PDXObject.RunMethod("getImage",Null)
                Dim FOS As JavaObject
                FOS.InitializeNewInstance("java.io.FileOutputStream",Array(OutputPath))

                Dim ImageIO As JavaObject
                ImageIO.InitializeStatic("javax.imageio.ImageIO")
                ImageIO.RunMethod("write",Array(BufferedImage,"png",FOS))
                'ImageIO.write does not close the stream, so close it here
                FOS.RunMethod("close",Null)
            
                TotalImages = TotalImages + 1
            End If
        Loop
    
    Loop

    Doc.RunMethod("close",Null)

End Sub

I used pdfbox-app-2.0.27 as that's what I had downloaded.
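
One detail of the write step is worth spelling out: `ImageIO.write` does not close the stream it is given, so the caller is responsible for closing it. Here is a minimal plain-Java sketch of that step alone, using a blank in-memory image in place of an image taken from a PDF. `PngWriteDemo` and `savePng` are hypothetical names chosen for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

class PngWriteDemo {

    // Writes the image as PNG; try-with-resources closes the stream,
    // because ImageIO.write leaves it open.
    static void savePng(BufferedImage image, File target) throws IOException {
        try (OutputStream out = new FileOutputStream(target)) {
            // ImageIO.write returns false when no suitable writer is found.
            if (!ImageIO.write(image, "png", out)) {
                throw new IOException("No PNG writer available");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Blank 4x3 image standing in for the BufferedImage returned by getImage.
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        File target = File.createTempFile("Image", ".png");
        savePng(image, target);

        // Read the file back to confirm it is a valid PNG of the same size.
        BufferedImage back = ImageIO.read(target);
        System.out.println(back.getWidth() + "x" + back.getHeight());
        target.delete();
    }
}
```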

Image1.png



Image2.png
 
Solution

T201016

Active Member
Licensed User
Longtime User
stevel05 said:
Sometimes it's really not worth getting down to Java, here is a solution using JavaObject:

Hello @stevel05, and thank you very much for taking the time.
As for where I found it: the code comes from the page "PDFBox Tutorial".

I will try your corrected code soon. There is a lot of sense in reaching for JavaObject here; I admit that Java coding sometimes doesn't come easily to me :(
I see that you used pdfbox-app-2.0.27. I will try version 3.0.4 in my project; apparently it has no known security holes so far.
I wish you a pleasant day.
 

stevel05

Expert
Licensed User
Longtime User
I used 2.0.27 because you had 2.0.26 in your app. Some things have changed in version 3 and this code will not work as is.
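
One concrete difference: in PDFBox 3.x the static `PDDocument.load(File)` is gone, and documents are opened with `org.apache.pdfbox.Loader.loadPDF(File)` instead. The sketch below probes the classpath by reflection to tell which entry point is present. `PdfBoxLoaderProbe` and `detect` are hypothetical names, and the code only checks that the method exists; it does not open a PDF.

```java
import java.io.File;

class PdfBoxLoaderProbe {

    // Returns "3.x", "2.x" or "absent" depending on which load entry point
    // can be found on the classpath.
    static String detect() {
        try {
            // PDFBox 3.x: org.apache.pdfbox.Loader.loadPDF(File)
            Class.forName("org.apache.pdfbox.Loader").getMethod("loadPDF", File.class);
            return "3.x";
        } catch (ReflectiveOperationException e) {
            // Class or method not found; fall through to the 2.x check.
        }
        try {
            // PDFBox 2.x: org.apache.pdfbox.pdmodel.PDDocument.load(File)
            Class.forName("org.apache.pdfbox.pdmodel.PDDocument").getMethod("load", File.class);
            return "2.x";
        } catch (ReflectiveOperationException e) {
            // Neither entry point is available.
        }
        return "absent";
    }

    public static void main(String[] args) {
        System.out.println("PDFBox load entry point: " + detect());
    }
}
```

In the B4J code, the matching change would presumably be to initialize the static JavaObject with `org.apache.pdfbox.Loader` and call `loadPDF` instead of `load`. That is an assumption, not something tested in this thread, and other calls may need adjusting as well.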
 

T201016

Active Member
Licensed User
Longtime User
stevel05 said:
I used 2.0.27 because you had 2.0.26 in your app. Some things have changed in version 3 and this code will not work as is.

I just wanted to mention that a lot of things have been changed in these versions. It's sometimes hard to know which one to use.
I would gladly use version 3.
 

stevel05

Expert
Licensed User
Longtime User
T201016 said:
I just wanted to mention that a lot of things have been changed in these versions. It's sometimes hard to know which one to use.

Yes, V3 is quite a bit different, but generally use the one that works, unless it was upgraded because of security issues.
 

T201016

Active Member
Licensed User
Longtime User
stevel05 said:
Yes, V3 is quite a bit different, but generally use the one that works, unless it was upgraded because of security issues.

That is mainly why I switched to version 3: it seems to have been updated to address security problems. I read an article on this subject somewhere; if I find it again, I will post a link for the curious.
 