B4J Question [Closed] XML Canonicalization (c14n)

aeric

Expert
Licensed User
Longtime User
I have been struggling for days with this little guy, xml-c14n11.
For those new to it, it basically means "cleaning" the XML file according to a set of rules, such as removing:
- Whitespaces
- Line breaks
- Comments

It sounds easy, but I wonder whether the correct way to do it is to use a library.

So I tried the latest xmlsec-4.0.2.jar library with JavaObject.
I wasted a lot of time trying to make the canonicalize method work, but failed.
I am getting this error:
B4X:
Waiting for debugger to connect...
Program started.
Canonicalize Method: http://www.w3.org/2006/12/xml-c14n11
Error occurred on line: 62 (Main)
java.lang.RuntimeException: Method: canonicalize not matched.
    at anywheresoftware.b4j.object.JavaObject.RunMethod(JavaObject.java:130)
    at b4j.example.main._canonicalize(main.java:157)
    at b4j.example.main._appstart(main.java:64)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at anywheresoftware.b4a.shell.Shell.runMethod(Shell.java:629)
    at anywheresoftware.b4a.shell.Shell.raiseEventImpl(Shell.java:234)
    at anywheresoftware.b4a.shell.Shell.raiseEvent(Shell.java:167)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at anywheresoftware.b4a.BA.raiseEvent2(BA.java:111)
    at anywheresoftware.b4a.shell.ShellBA.raiseEvent2(ShellBA.java:100)
    at anywheresoftware.b4a.BA.raiseEvent(BA.java:98)
    at b4j.example.main.main(main.java:29)
Program terminated (StartMessageLoop was not called).
I guess I am passing the wrong type for inputBytes. (Edit: it turns out the issue was the OutputStream.)

canonicalize

public void canonicalize(byte[] inputBytes, OutputStream writer, boolean secureValidation)
throws org.apache.xml.security.parser.XMLParserException, IOException, CanonicalizationException

This method tries to canonicalize the given bytes. It's possible to even canonicalize non-wellformed sequences if they are well-formed after being wrapped with a <a>...</a>.

Parameters:
inputBytes -
writer - OutputStream to write the canonicalization result
secureValidation - Whether secure validation is enabled
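
The "not matched" error is consistent with a reflection-style lookup that finds no method fitting the supplied arguments. The sketch below is not JavaObject's actual code. It is a simplified matcher run against a hypothetical stand-in class (`FakeCanonicalizer`) that has the same parameter list as the method above, to illustrate that both the argument count and the argument types have to line up.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.lang.reflect.Method;

final class MatchDemo {
    // Hypothetical stand-in with the same parameter list as the 4.x canonicalize method.
    static final class FakeCanonicalizer {
        public void canonicalize(byte[] inputBytes, OutputStream writer, boolean secureValidation) {
            // Intentionally empty: only the signature matters for this demo.
        }
    }

    // Returns true if a public method with this name accepts the given arguments.
    // Simplified: only boolean is boxed, and a null argument never matches.
    static boolean matches(Class<?> type, String name, Object... args) {
        for (Method m : type.getMethods()) {
            if (!m.getName().equals(name) || m.getParameterCount() != args.length) {
                continue;
            }
            Class<?>[] params = m.getParameterTypes();
            boolean ok = true;
            for (int i = 0; i < params.length; i++) {
                Class<?> p = params[i] == boolean.class ? Boolean.class : params[i];
                if (args[i] == null || !p.isInstance(args[i])) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Class<?> c = FakeCanonicalizer.class;
        // One argument, as in the old examples: no match against the 3-parameter method.
        System.out.println(matches(c, "canonicalize", new byte[0]));
        // Three arguments, but the second is not an OutputStream: no match.
        System.out.println(matches(c, "canonicalize", new byte[0], "not a stream", false));
        // Three arguments of the right types: match.
        System.out.println(matches(c, "canonicalize", new byte[0], new ByteArrayOutputStream(), false));
    }
}
```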

Then I looked for a version with a different method signature (taking only 1 parameter instead of 3), as used in many of the old code examples I could find.
Java:
Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
byte canonXmlBytes[] = canon.canonicalize(yourXmlBytes);
String canonXmlString = new String(canonXmlBytes);
Happily, I found that xmlsec 2.1.8 is the last version that works when passing only a byte array.

Any later version, such as the latest version 4.0.2, requires 3 parameters.

Questions:
1. Should I stick with the version that works, even though the older version is reported to have vulnerabilities?

2. If I want to make use of the newer library, how can I fix it?

3. After canonicalization, the following string has changed. I worry this will affect the digest result.
Before:
XML:
<cbc:AdditionalAccountID schemeAgencyName="CertEX"/>
After:
XML:
<cbc:AdditionalAccountID schemeAgencyName="CertEX"></cbc:AdditionalAccountID>
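
To make the digest concern concrete, here is a minimal sketch that hashes both forms of the element as plain UTF-8 bytes. SHA-256 and Base64 are assumptions (the digest algorithm actually required is not stated here), and `DigestCompare` and `sha256Base64` are illustrative names only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class DigestCompare {
    // Hashes the UTF-8 bytes of the given text with SHA-256 (assumed algorithm)
    // and returns the digest as a Base64 string.
    static String sha256Base64(String xml) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(xml.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String before = "<cbc:AdditionalAccountID schemeAgencyName=\"CertEX\"/>";
        String after = "<cbc:AdditionalAccountID schemeAgencyName=\"CertEX\"></cbc:AdditionalAccountID>";
        System.out.println("before: " + sha256Base64(before));
        System.out.println("after:  " + sha256Base64(after));
        // The byte sequences differ, so the digests differ as well.
        System.out.println("equal:  " + sha256Base64(before).equals(sha256Base64(after)));
    }
}
```

The two digests differ because the bytes differ, so whichever form is hashed has to be the same form the verifying side produces.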

Edit: I have reattached the project with the solution (the sample input file was removed because it contains sensitive data).
Select Build Configurations: Default/Legacy

Additional jars:
 

Attachments

  • canonicalize.zip

aeric

3. After canonicalization, the following string has changed. I worry this will affect the digest result.
Before:
XML:
<cbc:AdditionalAccountID schemeAgencyName="CertEX"/>
After:
XML:
<cbc:AdditionalAccountID schemeAgencyName="CertEX"></cbc:AdditionalAccountID>
I have another function to "linearize" the XML, which doesn't add a closing tag for empty elements.
Maybe I don't need to canonicalize at all.
B4X:
Public Sub LinearizeXML (Text As String) As String
    Return Regex.Replace("\s+", Text, " ").Trim
End Sub
However, this function doesn't remove comments.
 

aeric

I think it is better to just use the following function to "clean up" or "canonicalize" the XML.

B4X:
' Remove comments, line breaks, tabs and spaces
Public Sub CanonicalizeXML (Text As String) As String
    Text = Regex.Replace("<!--[\s\S]*?-->", Text, "")
    Return Regex.Replace("\s+", Text, " ").Trim
End Sub
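
To see what these two regex passes actually produce, here is a small Java port run on a made-up input (the sample document and the names `RegexCleanup` and `cleanXml` are illustrative, not from the project). It is only a sketch of the regex behavior, not of C14N 1.1.

```java
final class RegexCleanup {
    // Java port of the two regex passes: strip comments, then collapse whitespace runs.
    static String cleanXml(String text) {
        String noComments = text.replaceAll("<!--[\\s\\S]*?-->", "");
        return noComments.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Made-up sample with a comment, an empty element and spaces inside text content.
        String xml = "<doc>\n  <!-- note -->\n  <e1 />\n  <name>A   B</name>\n</doc>";
        System.out.println(cleanXml(xml));
        // prints: <doc> <e1 /> <name>A B</name> </doc>
    }
}
```

Note what the output shows: the empty element stays as `<e1 />` instead of becoming a start-end tag pair, whitespace between tags is reduced to one space rather than removed, and the spaces inside the text "A   B" are collapsed too, whereas Canonical XML keeps whitespace in text content. So the result will generally not be byte-identical to what a C14N library emits.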
 

aeric

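After reading the documentation (section 3.3 of the Canonical XML specification), I found that this is the expected result: empty elements are converted to start-end tag pairs.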

3.3 Start and End Tags

Input Document:
<!DOCTYPE doc [<!ATTLIST e9 attr CDATA "default">]>
<doc>
<e1 />
<e2 ></e2>
<e3 name = "elem3" id="elem3" />
<e4 name="elem4" id="elem4" ></e4>
<e5 a:attr="out" b:attr="sorted" attr2="all" attr="I'm"
xmlns:b="http://www.ietf.org"
xmlns:a="http://www.w3.org"
xmlns="http://example.org"/>
<e6 xmlns="" xmlns:a="http://www.w3.org">
<e7 xmlns="http://www.ietf.org">
<e8 xmlns="" xmlns:a="http://www.w3.org">
<e9 xmlns="" xmlns:a="http://www.ietf.org"/>
</e8>
</e7>
</e6>
</doc>
Canonical Form:
<doc>
<e1></e1>
<e2></e2>
<e3 id="elem3" name="elem3"></e3>
<e4 id="elem4" name="elem4"></e4>
<e5 xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I'm" attr2="all" b:attr="sorted" a:attr="out"></e5>
<e6 xmlns:a="http://www.w3.org">
<e7 xmlns="http://www.ietf.org">
<e8 xmlns="">
<e9 xmlns:a="http://www.ietf.org" attr="default"></e9>
</e8>
</e7>
</e6>
</doc>
 

aeric

2. If I want to make use of the newer library, how can I fix it?
From this post, I learned how to initialize the OutputStream.

Here is my code.
B4X:
Public Sub Canonicalize (XML As String) As Byte()
    ' In-memory stream that receives the canonical output
    Dim writer As OutputStream
    writer.InitializeToBytesArray(0)
    
    ' Initialize the xmlsec library before using it
    Dim Init As JavaObject
    Init.InitializeStatic("org.apache.xml.security.Init")
    Init.RunMethod("init", Null)
    
    Dim Canonicalizer As JavaObject
    Canonicalizer.InitializeStatic("org.apache.xml.security.c14n.Canonicalizer")
    Dim ALGO_ID_C14N11_OMIT_COMMENTS As String = Canonicalizer.GetField("ALGO_ID_C14N11_OMIT_COMMENTS")
    'Log("Canonicalize Method: " & ALGO_ID_C14N11_OMIT_COMMENTS)
    ' Replace the static class reference with an instance for C14N 1.1 without comments
    Canonicalizer = Canonicalizer.RunMethod("getInstance", Array(ALGO_ID_C14N11_OMIT_COMMENTS))
    ' Arguments: inputBytes, writer, secureValidation
    Canonicalizer.RunMethod("canonicalize", Array(XML.GetBytes("UTF8"), writer, False))
    Return writer.ToBytesArray
End Sub
 