Android Question jTidy Outputs Empty XML file

mangojack

Well-Known Member
Licensed User
Hi, I cannot get the jTidy library to convert a downloaded HTML file to XML.
It works for a small web page that I generated and host on my own server, and for a small HTML file in the Assets folder,
but all other attempts result in an empty XML file.

What am I doing wrong?

B4X:
Sub GetData 
    Okhc.Initialize("Okhc")
    req.InitializeGet("https://www.b4x.com/android/forum/")
    Okhc.Execute(req, 1) 
End Sub

Sub Okhc_ResponseSuccess (Response As OkHttpResponse, TaskId As Int)
    ' Stream the response body to page.html; GetHTML_StreamFinish fires when done.
    Response.GetAsynchronously("GetHTML", File.OpenOutput(File.DirDefaultExternal, "page.html", False), True, TaskId)
End Sub

Sub GetHTML_StreamFinish (Success As Boolean, TaskId As Int)
    If Success = False Then Return   ' nothing to parse if the download failed
    tid.Initialize
    tid.Parse(File.OpenInput(File.DirDefaultExternal, "page.html"), File.DirDefaultExternal, "data.xml")
    sax.Initialize
    sax.Parse(File.OpenInput(File.DirDefaultExternal, "data.xml"), "sax")
End Sub

The sax.Parse line fails with: org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 0: no element found

Many thanks and Regards
 

mangojack

I was using OkHttpUtils2 in the main project, but was working on an addition in a test project with OkHttp.

After changing over I am still having no success. The pages are definitely downloading, but jTidy will not convert them to an XML file.

B4X:
Sub GetData
    Dim getHTML As HttpJob
    getHTML.Initialize("", Me)
    getHTML.Download("https://www.b4x.com/android/forum/")
    Wait For (getHTML) JobDone(getHTML As HttpJob)
    If getHTML.Success Then
        Log(getHTML.GetString)

        ' Save the downloaded page to page.html
        Dim out As OutputStream = File.OpenOutput(File.DirDefaultExternal, "page.html", False)
        File.Copy2(getHTML.GetInputStream, out)
        out.Close

        ' Parse only when the download succeeded
        tid.Initialize
        tid.Parse(File.OpenInput(File.DirDefaultExternal, "page.html"), File.DirDefaultExternal, "data.xml")   'page.html all good
        sax.Initialize
        sax.Parse(File.OpenInput(File.DirDefaultExternal, "data.xml"), "sax")   'data.xml is empty
    End If
    getHTML.Release
End Sub


Output of Log(getHTML.GetString):

<!DOCTYPE html>
<html id="XenForo" lang="en-US" dir="LTR" class="Public NoJs LoggedOut Sidebar Responsive" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />

<meta name="viewport" content="width=device-width, initial-scale=1">


<base href="https://www.b4x.com/android/forum/" />
<script>
var _b = document.getElementsByTagName('base')[0], _bH = "https://www.b4x.com/android/forum/";
if (_b && _b.href != _bH) _b.href = _bH;
</script>

<title>B4X Community - Android, iOS, desktop, server and IoT programming tools</title>

<noscript><style>.JsOnly, .jsOnly { display: none !important; }</style></noscript>
<link rel="stylesheet" href="css.php?css=xenforo,form,public&amp;style=1&amp;dir=LTR&amp;d=1500448514" />
<link rel="stylesheet" href="css.php?css=login_bar,node_category,node_forum,node_list,toggleme_auto,toggleme_manual&amp;style=1&amp;dir=LTR&amp;d=1500448514" />


<script>
var _gaq = [['_setAccount', 'UA-1987329-1'], ['_trackPageview']];
!function(d, t)
{
var g = d.createElement(t),
s = d.getElementsByTagName(t)[0];
g.async = true;
g.src = ('https:' == d.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
s.parentNode.insertBefore(g, s);
}
(document, 'script');
</script>
<script src="https://b4x-4c17.kxcdn.com/android/forum/js/jquery/jquery-1.11.0.min.js"></script>

<script src="https://b4x-4c17.kxcdn.com/android/forum/js/xenforo/xenforo.js?_v=6763c268"></script>
<script src="https://b4x-4c17.kxcdn.com/android/forum/js/sedo/toggleme/toggleME.js?_v=6763c268"></script>

<link rel="apple-touch-icon" href="https://www.b4x.com/android/forum/styles/default/xenforo/logo.og.png" />
<link rel="alternate" type="application/rss+xml" title="RSS feed for B4X Community - Android, iOS, desktop, server and IoT programming tools" href="forums/-/index.rss" />

<link rel="canonical" href="https://www.b4x.com/android/forum/" />
<meta name="description" content="Rapid Application Development tools for native Android, iOS and desktop applications. Programming language similar to Visual Basic." />

<link href='//fonts.googleapis.com/css?family=Noto+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" href="https://b4x-4c17.kxcdn.com/xf_forum.css" />
<script src="https://b4x-4c17.kxcdn.com/js/jquery-ui.js"></script>
<link rel="stylesheet" href="https://b4x-4c17.kxcdn.com/js/jquery-ui.css">
<link href="/opensearch.xml" rel="search" title="B4X Search Engine" type="application/opensearchdescription+xml">
<script src="https://b4x-4c17.kxcdn.com/js/headers.js" type="text/javascript"></script>


</head>
<body>
<!-- facebook-->
<div id="fb-root"></div>
<script>(function(d, s, id) {
var js, fjs = d.getElementsByTagName(s)[0];
if (d.getElementById(id)) return;
js = d.createElement(s); js.id = id;
js.src = "//connect.facebook.net/en_US/sdk.js#xfbml=1&version=v2.7&appId=269766340041798";
fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'facebook-jssdk'));</script>
<!-- ****** facebook ******* -->
<!-- **************** -->
<input type="hidden" id="name" value="forum" />
<!-- headers -->
<div id="headers" class="pageWidth" style="position:relative;left:0px;">
<img id="bg" src="https://b4x-4c17.kxcdn.com/images/Header-bg.png"/>
<a href="//www.b4x.com/android/forum/"><img id="anywheresoftware_logo" src="https://b4x-4c17.kxcdn.com/images/Logo_on-dark.png"/></a>
</div>
<div id="menu" class="pageWidth">
<span id="b4x_menu_items">
<div id="home"><a href="/">HOME</a></div>
<div id="b4a"><a href="/b4a.html">B4A</a></div>
<div id="b4i"><a href="/b4i.html">B4i</a></div>
<div id="b4j"><a href="/b4j.html">B4J</a></div>
<div id="b4r"><a href="/b4r.html">B4R</a></div>
<div id="store"><a href="/store.html">STORE</a></div>
<div id="showcase"><a href="/showcase.html">SHOWCASE</a></div>
<div id="forum"><a href="/android/forum/">COMMUNITY</a></div>
</span>

Message longer than Log limit (4000). Message was truncated.
 

mangojack

This works on a small test web page:
B4X:
    Dim getHTML As HttpJob
    getHTML.Initialize("", Me)      
    getHTML.Download("http://icyg.net")
    Wait For (getHTML) JobDone(getHTML As HttpJob)
    If getHTML.Success Then
        tid.Initialize
        tid.Parse(getHTML.GetInputStream,File.DirDefaultExternal, "data.xml")
        sax.Initialize
        sax.Parse(File.OpenInput(File.DirDefaultExternal, "data.xml"), "sax")  
    End If


EDIT: After a lot of testing, getHTML.GetInputStream appears OK and its size changes depending on the URL, but jTidy still refuses to convert it to an XML file.
 

Erel

Administrator
Staff member
Licensed User
I've tested it with this code:
B4X:
Sub Activity_Click
   Dim getHTML As HttpJob
   getHTML.Initialize("", Me)
   getHTML.Download("http://icyg.net")
   Wait For (getHTML) JobDone(getHTML As HttpJob)
   If getHTML.Success Then
     Dim tid As Tidy
     tid.Initialize
     tid.Parse(getHTML.GetInputStream,File.DirInternal, "data.xml")
     Log(File.ReadString(File.DirInternal, "data.xml"))
   End If
End Sub

It works properly. Are you using jTidy v1.10?
 

mangojack

@Erel: Yes, jTidy 1.10. I have success with that URL and a few others, but not with this one, for example (nor with many others):

B4X:
   Dim getHTML As HttpJob
   getHTML.Initialize("", Me)
   getHTML.Download("https://www.b4x.com/android/forum/")
   Wait For (getHTML) JobDone(getHTML As HttpJob)
   If getHTML.Success Then
     Dim tid As Tidy
     tid.Initialize
     tid.Parse(getHTML.GetInputStream,File.DirInternal, "data.xml")
     Log(File.ReadString(File.DirInternal, "data.xml"))
   End If
 

mangojack

Thanks, that works.

I had read references online to setForceOutput.
As a bonus, I have been given a glimpse of how to use JavaObject. :)
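
For readers wondering what the setForceOutput reference might look like in practice, here is a minimal sketch. JTidy's org.w3c.tidy.Tidy class has a setForceOutput(boolean) option that makes it write output even when it reports errors in the input markup, which would fit pages that download fine yet produce an empty data.xml. The sketch assumes the JavaObject library is referenced and that the B4A Tidy wrapper keeps its native JTidy instance in a public field named "tidy". That field name is an assumption not confirmed in this thread, so check the wrapper's source before relying on it. The code is meant to sit inside the If getHTML.Success Then branch shown earlier.

```b4x
Dim tid As Tidy
tid.Initialize
' Assumption: the wrapper exposes the native org.w3c.tidy.Tidy object in a field named "tidy".
Dim jo As JavaObject = tid
Dim nativeTidy As JavaObject = jo.GetFieldJO("tidy")
' Ask JTidy to write output even if it reports errors in the HTML.
nativeTidy.RunMethod("setForceOutput", Array(True))
tid.Parse(getHTML.GetInputStream, File.DirInternal, "data.xml")
Log(File.ReadString(File.DirInternal, "data.xml"))
```

If the field lookup fails at runtime, the wrapper stores the native object under a different name and the GetFieldJO call needs adjusting.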
 