  1. XMLBeans
  2. XMLBEANS-135

Bad handling of embedded CDATA

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Version 1.0.3, Version 1.0.4, Version 2 Beta 1
    • Fix Version/s: Version 2 Beta 2, Version 2
    • None
    • None
    • Environment: Encountered on Windows with JDK 1.4.2.

    Description

      I have a case of bad XML. It is an envelope document that includes another
      document. The parser expects the enclosed document to be in CDATA. The
      problem is that the second document now includes a third document, which is
      also expected to be in CDATA.

      I create document A with an XMLBean. I convert document A to a string with
      xmlText() and put it as a text element of document B. I then do the same
      with document B by putting it in document C. Everything works well and
      automatically, and it creates CDATA every time it needs to.

      //fragment
      XmlOptions options = new XmlOptions();
      options.setSavePrettyPrint();
      Field field = getAssessmentFields().addNewField();
      field.setFieldName("AssessmentContent");
      field.setFieldValue(answersDocument.xmlText(options));
      ..

      The problem is that on the second escaping, the '>' of the CDATA end marker
      (]]>) is escaped to "&gt;". The SAX parser that reads all this (Xalan) just
      can't handle it. Also, the specification says that CDATA sections cannot
      nest.
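
For context on the rule mentioned above: a literal "]]>" cannot appear inside a CDATA section, and a common workaround is to end the section after "]]" and reopen it before ">". The sketch below illustrates that workaround in plain Java. It is not the XMLBeans implementation, and `wrapInCdata` and `readCdata` are hypothetical helper names used only for illustration.

```java
class CdataWrap {
    // Hypothetical helper: wraps text in CDATA, splitting every "]]>" so that
    // the section is closed after "]]" and a new one is opened before ">".
    static String wrapInCdata(String text) {
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    // Hypothetical helper that mimics a parser for this narrow case:
    // it concatenates the contents of consecutive CDATA sections.
    static String readCdata(String xml) {
        final String open = "<![CDATA[";
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (xml.startsWith(open, pos)) {
            int end = xml.indexOf("]]>", pos);
            out.append(xml, pos + open.length(), end);
            pos = end + 3;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Document B already carries document A in a CDATA section.
        String docB = "<b>" + wrapInCdata("<a/>") + "</b>";
        // Wrapping B again for document C must not produce nested CDATA.
        String forDocC = wrapInCdata(docB);
        System.out.println(forDocC);
        System.out.println(readCdata(forDocC).equals(docB));
    }
}
```

With this splitting, document B (which already contains a CDATA section) can be wrapped again for document C without nesting, and reading the sections back yields the original text of B.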

      Here is the modification I made for embedded CDATA. Do you think it would be
      worthy of being included?

      Here is the entitizeContent method in Saver.java:

      Pattern cdataPattern = Pattern.compile("CDATA");

      private void entitizeContent ( )
      {
          if (_lastEmitCch == 0)
              return;

          int i = _lastEmitIn;
          final int n = _buf.length;

          boolean hasOutOfRange = false;

          int count = 0;
          for ( int cch = _lastEmitCch ; cch > 0 ; cch-- )
          {
              char ch = _buf[ i ];

              if (ch == '<' || ch == '&')
                  count++;
              else if (isBadChar( ch ))
                  hasOutOfRange = true;

              if (++i == n)
                  i = 0;
          }

          if (count == 0 && !hasOutOfRange)
              return;

          i = _lastEmitIn;

          //
          // Heuristic for knowing when to save out stuff as a CDATA.
          //

          // We'll check if we have a CDATA in the buffer.
          // If we do, we won't nest another one.
          CharBuffer charBuffer = CharBuffer.wrap(_buf);
          boolean hasCDATA = cdataPattern.matcher(charBuffer).find();

          if (_lastEmitCch > 32 && count > 5 &&
                count * 100 / _lastEmitCch > 1 && !hasCDATA)
          {
              boolean lastWasBracket = _buf[ i ] == ']';

              i = replace( i, "<![CDATA[" + _buf[ i ] );

              boolean secondToLastWasBracket = lastWasBracket;

              lastWasBracket = _buf[ i ] == ']';

              if (++i == _buf.length)
                  i = 0;

              for ( int cch = _lastEmitCch ; cch > 0 ; cch-- )
              {
                  char ch = _buf[ i ];

                  if (ch == '>' && secondToLastWasBracket && lastWasBracket)
                      i = replace( i, "&gt;" );
                  else if (isBadChar( ch ))
                      i = replace( i, "?" );
                  else
                      i++;

                  secondToLastWasBracket = lastWasBracket;
                  lastWasBracket = ch == ']';

                  if (i == _buf.length)
                      i = 0;
              }

              emit( "]]>" );
          }
          else
          {
              for ( int cch = _lastEmitCch ; cch > 0 ; cch-- )
              {
                  char ch = _buf[ i ];

                  if (ch == '<')
                      i = replace( i, "&lt;" );
                  else if (hasCDATA && ch == '>')
                      i = replace( i, "&gt;" );
                  else if (ch == '&')
                      i = replace( i, "&amp;" );
                  else if (isBadChar( ch ))
                      i = replace( i, "?" );
                  else
                      i++;

                  if (i == _buf.length)
                      i = 0;
              }
          }
      }
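
One possible refinement of the check in this method, offered as a sketch rather than a tested patch: the wrap-around index handling suggests that _buf is a circular buffer, so matching "CDATA" against the whole array may also see characters outside the text being emitted. Assuming that is the case, the lookup could be limited to the _lastEmitCch characters starting at _lastEmitIn, and could look for the full "<![CDATA[" marker instead of the bare word. `regionContains` is a hypothetical helper, not part of Saver.java.

```java
class RingSearch {
    // Hypothetical helper: returns true if needle occurs within the cch
    // characters of the circular buffer buf that begin at index start.
    static boolean regionContains(char[] buf, int start, int cch, String needle) {
        int m = needle.length();
        for (int off = 0; off + m <= cch; off++) {
            int j = 0;
            // Indices wrap around the end of the array.
            while (j < m && buf[(start + off + j) % buf.length] == needle.charAt(j))
                j++;
            if (j == m)
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // The emitted text "<![CDATA[x]]>" starts at index 10 and wraps to index 0.
        char[] wrapped = "TA[x]]>???<![CDA".toCharArray();
        System.out.println(regionContains(wrapped, 10, 13, "<![CDATA["));

        // Stale "CDATA" at the front of the array lies outside the emitted region.
        char[] stale = "CDATA.....abcdef".toCharArray();
        System.out.println(regionContains(stale, 10, 6, "CDATA"));
    }
}
```

In the method above, such a helper would be called as regionContains( _buf, _lastEmitIn, _lastEmitCch, "<![CDATA[" ) in place of the Pattern match.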

          People

            Assignee: Unassigned
            Reporter: Martin Hamel (martinhamel)