B4J Question xml sax parse error with text having entities


Active Member
Licensed User
I am trying to parse this XML file with xml2map:

<?xml version="1.0" encoding="utf-8"?>
<tmx version="1.4">
  <header creationtool="SDL Language Platform" creationtoolversion="8.1" o-tmf="SDL TM8 Format" datatype="xml" segtype="sentence" adminlang="en-US" srclang="en-US" creationdate="20150508T030840Z" creationid="AS\liu_rosemary">
    <prop type="x-Quality:Integer"></prop>
    <prop type="x-SourceFile:SingleString"></prop>
    <prop type="x-TargetFile:SingleString"></prop>
    <prop type="x-Recognizers">RecognizeAll</prop>
    <prop type="x-IncludesContextContent">True</prop>
    <prop type="x-TMName">en-cn</prop>
    <prop type="x-TokenizerFlags">DefaultFlags</prop>
    <prop type="x-WordCountFlags">DefaultFlags</prop>
<tu creationdate="20150630T022922Z" creationid="AS\liu_rosemary" changedate="20150701T064834Z" changeid="AS\nie_grace" lastusagedate="20150701T064834Z">
      <prop type="x-LastUsedBy">AS\liu_rosemary</prop>
      <prop type="x-Context">8422768147655872960, -1760116759265925116</prop>
      <prop type="x-ContextContent">Customer shall pay recovery cost incurred by the Supplier, without prejudice to the minimum imposed by applicable law. |  | 在不影响适用法律强加的最低限额的前提下,客户应向供应方支付其所承受的恢复成本。 | </prop>
      <prop type="x-Origin">TM</prop>
      <prop type="x-ConfirmationLevel">Translated</prop>
      <tuv xml:lang="en-US">
        <seg>All payments due to the Supplier hereunder shall be made in full without set&#x1E;off, counterclaim, deduction or withholding of any kind.</seg>
      <tuv xml:lang="zh-CN">

I got this error message:

[Fatal Error] :21:96: Character reference "&#
(SAXParseException) org.xml.sax.SAXParseException; lineNumber: 21; columnNumber: 96; Character reference "&#


Well-Known Member
Licensed User
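It looks like a control character has been entered into the string: &#x1E; is U+001E (record separator), which XML 1.0 does not allow even as a character reference. You could try replacing &#x1E; with nothing and see if the error goes away.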
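A minimal Java sketch of that check (B4J runs on the JVM), assuming the file content has already been read into a string. The class name `ReplaceCheck` and the helper `parses` are illustrative only and are not part of xml2map; the snippet just uses the JDK's built-in parser to compare the text before and after the replacement.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class ReplaceCheck {
    // Hypothetical helper: true if the JDK's XML parser accepts the text.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // SAXParseException (or any other failure) means the text was rejected.
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the <seg> line from the question.
        String xml = "<seg>without set&#x1E;off, counterclaim</seg>";
        System.out.println("original parses: " + parses(xml));
        System.out.println("after replace:   " + parses(xml.replace("&#x1E;", "")));
    }
}
```

This only handles the one reference seen in this file; other disallowed references would need the same treatment.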


Active Member
Licensed User
There is no error if I replace it. But the real question is which invalid characters we need to replace in general to avoid this situation.
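For reference, XML 1.0 only allows the characters #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, and a numeric character reference must also resolve to one of these. One possible pre-filter, sketched in Java under the assumption that the file is read into a string before it is handed to the parser, is shown below. The class `XmlCharFilter` and its method names are made up for illustration; this is not something xml2map provides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlCharFilter {
    // Matches numeric character references such as &#x1E; or &#30;
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    // The "Char" production of XML 1.0.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops numeric character references that point at disallowed code points.
    static String stripInvalidCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuilder out = new StringBuilder(xml.length());
        int last = 0;
        while (m.find()) {
            out.append(xml, last, m.start());
            boolean valid;
            try {
                int cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                valid = isValidXmlChar(cp);
            } catch (NumberFormatException e) {
                valid = false; // number too large to be a code point
            }
            if (valid) {
                out.append(m.group()); // keep legal references untouched
            }
            last = m.end();
        }
        out.append(xml, last, xml.length());
        return out.toString();
    }

    // Drops disallowed characters that appear literally in the text.
    static String stripInvalidRawChars(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        xml.codePoints()
           .filter(XmlCharFilter::isValidXmlChar)
           .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String seg = "without set&#x1E;off, counterclaim";
        System.out.println(stripInvalidRawChars(stripInvalidCharRefs(seg)));
    }
}
```

Removing the reference silently changes the segment text ("set&#x1E;off" becomes "setoff"), so depending on the data it may be preferable to substitute a hyphen or a space instead of dropping it.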