B4J Question: XML SAX parse error on text containing character references

xulihang

Active Member
Licensed User
Longtime User
I am trying to parse this xml file with xml2map:

B4X:
<?xml version="1.0" encoding="utf-8"?>
<tmx version="1.4">
  <header creationtool="SDL Language Platform" creationtoolversion="8.1" o-tmf="SDL TM8 Format" datatype="xml" segtype="sentence" adminlang="en-US" srclang="en-US" creationdate="20150508T030840Z" creationid="AS\liu_rosemary">
    <prop type="x-Quality:Integer"></prop>
    <prop type="x-SourceFile:SingleString"></prop>
    <prop type="x-TargetFile:SingleString"></prop>
    <prop type="x-Recognizers">RecognizeAll</prop>
    <prop type="x-IncludesContextContent">True</prop>
    <prop type="x-TMName">en-cn</prop>
    <prop type="x-TokenizerFlags">DefaultFlags</prop>
    <prop type="x-WordCountFlags">DefaultFlags</prop>
  </header>
  <body>
    <tu creationdate="20150630T022922Z" creationid="AS\liu_rosemary" changedate="20150701T064834Z" changeid="AS\nie_grace" lastusagedate="20150701T064834Z">
      <prop type="x-LastUsedBy">AS\liu_rosemary</prop>
      <prop type="x-Context">8422768147655872960, -1760116759265925116</prop>
      <prop type="x-ContextContent">Customer shall pay recovery cost incurred by the Supplier, without prejudice to the minimum imposed by applicable law. |  | 在不影响适用法律强加的最低限额的前提下,客户应向供应方支付其所承受的恢复成本。 | </prop>
      <prop type="x-Origin">TM</prop>
      <prop type="x-ConfirmationLevel">Translated</prop>
      <tuv xml:lang="en-US">
        <seg>All payments due to the Supplier hereunder shall be made in full without set&#x1E;off, counterclaim, deduction or withholding of any kind.</seg>
      </tuv>
      <tuv xml:lang="zh-CN">
        <seg>任何应付款均应向供应方全额支付,不得以任何方式抵消、反索赔、扣除或扣缴。</seg>
      </tuv>
    </tu>
  </body>
</tmx>

I got this error message:

B4X:
[Fatal Error] :21:96: Character reference "&#
(SAXParseException) org.xml.sax.SAXParseException; lineNumber: 21; columnNumber: 96; Character reference "&#
 

Daestrum

Expert
Licensed User
Longtime User
It looks like a control character has ended up in the string: &#x1E; refers to U+001E (the ASCII record separator), which is not a legal character in XML 1.0. You could try replacing &#x1E; with nothing and see if the error goes away.
 

xulihang

Active Member
Licensed User
Longtime User
The error goes away if I remove it. But the real question is which invalid character references, in general, need to be replaced to avoid this situation.
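One possible way to answer the "which text to replace" question is to filter by the XML 1.0 Char production, which allows only U+0009, U+000A, U+000D, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF. The sketch below is only an illustration, not something from xml2map: the class and method names (XmlCharRefCleaner, stripInvalidCharRefs, isValidXml10) are made up for this example. It assumes the whole file has been read into a string before parsing, and it only looks at numeric character references such as &#x1E; or &#30;, not at raw control bytes. It is a plain regex pass, so a reference written literally inside a CDATA section or a comment would be removed as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlCharRefCleaner {
    // Matches numeric character references: hex (&#x1E;) or decimal (&#30;).
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops every numeric character reference whose target is not legal in XML 1.0.
    static String stripInvalidCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(xml, last, m.start());
            int cp;
            try {
                cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
            } catch (NumberFormatException e) {
                cp = -1; // number too large to be a code point: treat as invalid
            }
            if (isValidXml10(cp)) {
                out.append(m.group()); // keep legal references untouched
            }
            last = m.end();
        }
        out.append(xml, last, xml.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String seg = "<seg>without set&#x1E;off, counterclaim &amp; deduction&#10;</seg>";
        System.out.println(stripInvalidCharRefs(seg));
    }
}
```

In B4J this could be called through inline Java, or the same range check could be rewritten with B4X's Regex. Named entities such as &amp; are left alone because the pattern requires the # after the ampersand.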
 

Daestrum

Expert
Licensed User
Longtime User
You could try changing the XML declaration from
B4X:
<?xml version="1.0" encoding="utf-8"?>

to

B4X:
<?xml version="1.1" encoding="utf-8"?>

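XML 1.1 permits more characters than XML 1.0: most control characters, including U+001E, are legal there as long as they are written as character references such as &#x1E;.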
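A quick way to check this outside B4J is to feed the same segment to the JDK's default SAX parser under both declarations. This is just a minimal sketch: the class name XmlVersionCheck and the helper parses are invented for the example, and it assumes the parser bundled with the JDK, which is what produces the SAXParseException above. Other parsers, and other tools that later read the TMX file, may not accept XML 1.1 at all.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlVersionCheck {
    // Returns true if the default SAX parser accepts the document as well-formed.
    static boolean parses(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // fatal parse errors (e.g. an illegal character reference) land here
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem, not a parse error
        }
    }

    public static void main(String[] args) {
        String body = "<seg>set&#x1E;off</seg>";
        // Same content, only the version in the declaration differs.
        System.out.println("1.0: " + parses("<?xml version=\"1.0\" encoding=\"utf-8\"?>" + body));
        System.out.println("1.1: " + parses("<?xml version=\"1.1\" encoding=\"utf-8\"?>" + body));
    }
}
```

If the 1.1 variant parses, the handler receives the U+001E character in the segment text, so whatever consumes the parsed map still has to cope with that control character.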
 