UIMA-6162

Concurrent binary serialization produces corrupt output

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1SDK
    • Fix Version/s: 3.2.0SDK
    • Component/s: UIMA
    • Labels: None

    Description

      I suspect there could be an issue in `BinaryCasSerDes`.

      When deserializing the attached file `admin.ser`, I get this stack trace:

          Caused by: java.lang.ClassCastException: class org.apache.uima.jcas.tcas.Annotation cannot be cast to class org.apache.uima.jcas.cas.Sofa (org.apache.uima.jcas.tcas.Annotation and org.apache.uima.jcas.cas.Sofa are in unnamed module of loader org.apache.catalina.loader.ParallelWebappClassLoader @4593ff34)
              at org.apache.uima.cas.impl.BinaryCasSerDes.makeSofaFromHeap(BinaryCasSerDes.java:1823) ~[uimaj-core-3.1.1.jar:3.1.1]
              at org.apache.uima.cas.impl.BinaryCasSerDes.getSofaFromAnnotBase(BinaryCasSerDes.java:1817) ~[uimaj-core-3.1.1.jar:3.1.1]
              at org.apache.uima.cas.impl.BinaryCasSerDes.createFSsFromHeaps(BinaryCasSerDes.java:1701) ~[uimaj-core-3.1.1.jar:3.1.1]
              at org.apache.uima.cas.impl.BinaryCasSerDes.reinit(BinaryCasSerDes.java:259) ~[uimaj-core-3.1.1.jar:3.1.1]
              at org.apache.uima.cas.impl.BinaryCasSerDes.reinit(BinaryCasSerDes.java:328) ~[uimaj-core-3.1.1.jar:3.1.1]
              at org.apache.uima.cas.impl.Serialization.deserializeCASComplete(Serialization.java:129) ~[uimaj-core-3.1.1.jar:3.1.1]

      The code used to read and deserialize the file is as follows:

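          import static org.apache.uima.cas.impl.Serialization.deserializeCASComplete;

          import java.io.File;
          import java.io.FileInputStream;
          import java.io.IOException;
          import java.io.ObjectInputStream;
          import org.apache.uima.cas.CAS;
          import org.apache.uima.cas.impl.CASCompleteSerializer;
          import org.apache.uima.cas.impl.CASImpl;

          public static void readSerializedCas(CAS aCas, File aFile)
              throws IOException
          {
              // Read the Java-serialized CASCompleteSerializer, then load it into the CAS
              try (ObjectInputStream is = new ObjectInputStream(new FileInputStream(aFile))) {
                  CASCompleteSerializer serializer = (CASCompleteSerializer) is.readObject();
                  deserializeCASComplete(serializer, (CASImpl) aCas);
              }
              catch (ClassNotFoundException e) {
                  throw new IOException(e);
              }
          }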
      

      I set a breakpoint at BinaryCasSerDes:1608, which is a for loop iterating over the heap. Apparently, the first feature structure encountered is an annotation, NOT the SOFA. In line 1700, the deserializer then tries to resolve the SOFA for this annotation but fails because the SOFA has not been deserialized yet. Eventually makeSofaFromHeap is called, which checks whether a SOFA needs to be created: it looks up the SOFA's ID (1) via csds.addr2fs.get(sofaAddr) (BinaryCasSerDes:1821), finds nothing, and generates a new SOFA. However, when the SECOND annotation is read, csds.addr2fs.get(sofaAddr) (BinaryCasSerDes:1821) is called again to resolve the SOFA at addr 1, and this time it returns the previously deserialized annotation instead of the SOFA that had been created.

      The implicitly created SOFA is added to the csds.addr2fs map at key 1. Later, however, in BinaryCasSerDes:1723, key 1 is overwritten by the deserialized annotation:

              if (!isSofa) { // if it was a sofa, other code added or pended it
                csds.addFS(fs, heapIndex); // this overwrites the SOFA that was created at key 1, because heapIndex is also 1
              }
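
The sequence described above can be replayed with a toy model. This is only a sketch under assumptions: the classes below are hypothetical stand-ins rather than the UIMA types, a plain `HashMap` plays the role of `csds.addr2fs`, and `resolveSofa` merely imitates the look-up-or-create behavior that `makeSofaFromHeap` appears to have in the debugger.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the address-to-FS map. None of these types are UIMA classes. */
class Addr2FsOverwriteSketch {
    static class FeatureStructure {}
    static class Sofa extends FeatureStructure {}
    static class Annotation extends FeatureStructure {}

    /** Hypothetical stand-in for makeSofaFromHeap: reuse the entry at sofaAddr or create a Sofa. */
    static Sofa resolveSofa(Map<Integer, FeatureStructure> addr2fs, int sofaAddr) {
        FeatureStructure existing = addr2fs.get(sofaAddr);
        if (existing == null) {
            Sofa sofa = new Sofa();
            addr2fs.put(sofaAddr, sofa); // implicitly created SOFA lands at key sofaAddr
            return sofa;
        }
        return (Sofa) existing; // ClassCastException if the slot holds an Annotation
    }

    /** Deserializes a non-SOFA feature structure stored at heapIndex that refers to sofaAddr. */
    static void readAnnotation(Map<Integer, FeatureStructure> addr2fs, int heapIndex, int sofaAddr) {
        resolveSofa(addr2fs, sofaAddr);
        addr2fs.put(heapIndex, new Annotation()); // mirrors csds.addFS(fs, heapIndex)
    }

    /** Replays the scenario: first annotation at address 1, second at address 10, both with SOFA ref 1. */
    static boolean secondAnnotationFails() {
        Map<Integer, FeatureStructure> addr2fs = new HashMap<>();
        readAnnotation(addr2fs, 1, 1); // creates a Sofa at key 1, then overwrites it
        try {
            readAnnotation(addr2fs, 10, 1); // key 1 now holds an Annotation
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second annotation fails: " + secondAnnotationFails());
    }
}
```

In this model the failure only occurs because the first annotation's own heap address equals the SOFA reference it carries, which matches what the heap dump suggests.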
      

      The heap looks something like this:

      [0, 187, 1, 33, 46, 199, 200, 201, 44, 202, 187, 1, 33, 46, 203, 204, 205, 45, 206, 187, 1, 33, 46, 207, 208, 209, 46, 210, 187, 1, 33, 46, 211, 212, 213, 47, 214, 187, 1, 33, 46, 215, 216, 217, 48, 1, 187, 1,...
      

      I guess that 187 is the type code of the first annotation; we can see it repeat several times. The 1 that follows seems to be the SOFA reference, the first feature of each feature structure. However, instead of referring to the address of the SOFA, 1 points at the first annotation, which is NOT a SOFA.
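
To make that reading of the heap concrete, here is a small sketch that walks the first values of the dump. It rests on assumptions inferred only from the dump itself: slot 0 is taken to be the null address, each record of type 187 is taken to occupy 9 slots (the distance between repeated 187s), and the second slot of a record is taken to be the SOFA reference. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Walks a prefix of the heap dump under the assumed record layout. */
class HeapWalkSketch {
    // First three records of the dump, preceded by slot 0.
    static final int[] HEAP = {
        0,
        187, 1, 33, 46, 199, 200, 201, 44, 202,
        187, 1, 33, 46, 203, 204, 205, 45, 206,
        187, 1, 33, 46, 207, 208, 209, 46, 210
    };

    // Assumption: fixed record size, inferred from the spacing of the 187 entries.
    static final int RECORD_SIZE = 9;

    /** Returns the heap addresses at which a complete record starts, skipping slot 0. */
    static List<Integer> recordAddresses(int[] heap) {
        List<Integer> addresses = new ArrayList<>();
        for (int addr = 1; addr + RECORD_SIZE <= heap.length; addr += RECORD_SIZE) {
            addresses.add(addr);
        }
        return addresses;
    }

    /** Returns the type code stored at the address that the record's SOFA reference points to. */
    static int typeCodeOfSofaTarget(int[] heap, int recordAddr) {
        int sofaRef = heap[recordAddr + 1];
        return heap[sofaRef];
    }

    public static void main(String[] args) {
        for (int addr : recordAddresses(HEAP)) {
            System.out.println("record at " + addr
                + " type=" + HEAP[addr]
                + " sofaRef=" + HEAP[addr + 1]
                + " typeAtSofaRef=" + typeCodeOfSofaTarget(HEAP, addr));
        }
    }
}
```

Under these assumptions every record reports `typeAtSofaRef=187`, i.e. the SOFA reference resolves to a slot holding the annotation's type code rather than a SOFA.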

      Could this be a bug in the serialization code, which seems to assume that the SOFA is always in the first position?

      Attachments

        1. admin.ser
          122 kB
          Richard Eckart de Castilho

            People

              Assignee: Marshall Schor (schor)
              Reporter: Richard Eckart de Castilho (rec)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m