    Description

      I've recently seen a couple of the standby tests fail. E.g. on Jenkins: https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1245/

      java.lang.AssertionError: expected: org.apache.jackrabbit.oak.segment.SegmentNodeState<{ checkpoints = { ... }, root = { ... } }> but was: org.apache.jackrabbit.oak.segment.SegmentNodeState<{ checkpoints = { ... }, root = { ... } }>
      	at org.apache.jackrabbit.oak.segment.standby.StandbyTestIT.testSyncLoop(StandbyTestIT.java:122)
      

      org.apache.jackrabbit.oak.segment.standby.ExternalSharedStoreIT.testProxySkippedBytes:

      java.lang.AssertionError: expected:<{ root = { ... } }> but was:<{ root : { } }>
      

          Activity

            mduerig Michael Dürig added a comment -

            marett, could you have a look?

            marett Timothee Maret added a comment - - edited

            mduerig the ITs fail from time to time due to OAK-5034.
            Without the OAK-5034 patch, an IT may pass or fail depending on the timing between the tar writer and the execution of the assertions.
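
To illustrate why an assertion that races a background writer is flaky, here is a minimal sketch of a polling helper that waits for a condition instead of asserting immediately. It is not the OAK-5034 fix and not part of Oak; the class name `AwaitCondition`, the method `await` and the condition used in `main` are hypothetical and exist only for this example.

```java
import java.util.function.BooleanSupplier;

public class AwaitCondition {

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    public static boolean await(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Hypothetical stand-in for "the standby store has caught up with the primary".
        boolean synced = await(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println("synced = " + synced);
    }
}
```

A test would call such a helper before comparing the primary and standby heads, so the comparison no longer depends on how quickly the writer flushes.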

            marett Timothee Maret added a comment -

            mduerig OAK-5034 has been fixed. Could you close this issue? (I don't have the permission to do so.)

            mduerig Michael Dürig added a comment -

            Fixed as suggested by marett

            chetanm Chetan Mehrotra added a comment - - edited

            marett I am reopening this issue because StandbyTestIT.testSyncLoop failed again. See here

            I also saw a failure in org.apache.jackrabbit.oak.plugins.segment.standby.BrokenNetworkTest.testProxySSLSkippedBytes. Given that this module is no longer being worked on, should we disable this test?

            marett Timothee Maret added a comment -

            chetanm thanks for reopening. I'll look at the StandbyTestIT.testSyncLoop issue.

            Regarding the o.a.j.o.plugins.segment.standby tests, I think it'd make sense to open a separate issue since this issue is about the oak-segment-tar module.

            frm Francesco Mari added a comment -

            The latest group of problematic tests related to Cold Standby are the following.

            FailoverIPRangeIT.testFailoverCorrectListIPv6UseIPv6:133->createTestWithConfig:164 expected:<{ root = { ... } }> but was:<{ root : { } }>
            FailoverIPRangeIT.testFailoverCorrectListUseIPv6:126->createTestWithConfig:164 expected:<{ root = { ... } }> but was:<{ root : { } }>
            FailoverIPRangeIT.testFailoverLocalClientUseIPv6:65->createTestWithConfig:164 expected:<{ root = { ... } }> but was:<{ root : { } }>
            

            These tests always fail because of the following error.

            io.netty.channel.AbstractChannel$AnnotatedSocketException: Protocol family unavailable: /0:0:0:0:0:0:0:1:50238
            	at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_102]
            	at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_102]
            	at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_102]
            	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_102]
            	at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:242) ~[netty-transport-4.0.41.Final.jar:4.0.41.Final]
                    ...
            

            The test and the error hint that the issue might be related to the way IPv6 is implemented in the JVM or configured on the host. The following table reports the build, the node and the JVM where the tests failed.

            Build  Node (JVM) where the tests failed
            1320
            1319   beam2 (1.8) beam3 (1.8)
            1318   beam3 (1.8) beam2 (1.8)
            1317
            1316   beam2 (1.8) beam3 (1.8)
            1315
            1314   beam1 (1.7)
            1313
            1312
            1311
            

            The following table reports the same information for every time the tests passed.

            Build  Node (JVM) where the tests passed
            1320   H18 (1.8) ubuntu-eu2 (1.8) H16 (1.8) H10 (1.8) H15 (1.7) H12 (1.7) ubuntu-1 (1.7) proserpina-test (1.7)
            1319   H18 (1.8) H16 (1.8) H15 (1.7) ubuntu-1 (1.7) H16 (1.7) proserpina-test (1.7)
            1318   H18 (1.8) H15 (1.8) ubuntu-1 (1.7) proserpina-test (1.7)
            1317
            1316   H18 (1.8) H16 (1.8) H15 (1.7) ubuntu-1 (1.7) proserpina-test (1.7)
            1315
            1314   ubuntu-6 (1.8) ubuntu-4 (1.8) ubuntu-2 (1.8) H10 (1.8) H12 (1.7) H17 (1.7) H11 (1.7)
            1313
            1312
            1311
            

            The lack of overlap between the node names hints that the problem might be related to the configuration of the nodes grouped under the "beam" label.
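
One way to check the suspicion about host configuration is to probe whether the IPv6 loopback address is usable on a given node. The following is a minimal sketch using only the JDK; the class name `Ipv6LoopbackProbe` is hypothetical and the code is not taken from the Oak test suite.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class Ipv6LoopbackProbe {

    /**
     * Tries to bind a server socket to ::1 on an ephemeral port and connect to it.
     * Returns false if either step fails, e.g. because IPv6 is disabled on the host.
     */
    public static boolean ipv6LoopbackUsable() {
        try {
            InetAddress loopback = InetAddress.getByName("::1");
            try (ServerSocket server = new ServerSocket(0, 1, loopback);
                 Socket client = new Socket(loopback, server.getLocalPort())) {
                return client.isConnected();
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("IPv6 loopback usable: " + ipv6LoopbackUsable());
    }
}
```

Running this on a "beam" node and on one of the passing nodes would show whether the failures follow the availability of IPv6 rather than the test logic.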


            mduerig Michael Dürig added a comment -

            I think it is best to disable those tests on the Apache Jenkins instance via org.apache.jackrabbit.oak.commons.CIHelper.

            frm Francesco Mari added a comment -

            I tried to configure Jenkins to run the matrix tasks on the Ubuntu nodes only. For posterity, this can be done by adding a "Slaves" axis in the matrix configuration. The sad truth is that on Apache's Jenkins this doesn't work. I'm resorting to the CIHelper.
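
As an illustration of the idea behind an environment-based guard, here is a minimal sketch that detects a Jenkins build on a node whose name starts with a given prefix. It does not reproduce the CIHelper API; the class `CiNodeGuard` and its method are hypothetical. The only outside facts it relies on are the `JENKINS_URL` and `NODE_NAME` environment variables that Jenkins sets for builds.

```java
import java.util.Map;

public class CiNodeGuard {

    /**
     * Returns true when the environment looks like a Jenkins build
     * running on a node whose name starts with the given prefix.
     */
    public static boolean runsOnJenkinsNode(Map<String, String> env, String nodePrefix) {
        String jenkinsUrl = env.get("JENKINS_URL");
        String nodeName = env.get("NODE_NAME");
        return jenkinsUrl != null && nodeName != null && nodeName.startsWith(nodePrefix);
    }

    public static void main(String[] args) {
        // "beam" is the node label named in this issue.
        System.out.println("on beam node: " + runsOnJenkinsNode(System.getenv(), "beam"));
    }
}
```

In a JUnit 4 test, the result could be passed to `org.junit.Assume.assumeFalse(...)` in a `@BeforeClass` method so that the affected tests are skipped rather than reported as failures.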

            gmcdonald Gavin McDonald added a comment -

            I added an Axis to use the 'label' expression. The 'Slaves' Axis used to work but stopped working a short time ago. I filed a Jenkins ticket, with no response so far.

            frm Francesco Mari added a comment -

            The latest failures in StandbyTestIT are due to a timeout (see build 1381). This situation is similar to what I already observed in OAK-5239. I will observe those tests now that the build configuration changed to add a more generous timeout.


            mduerig Michael Dürig added a comment -

            frm, I haven't seen standby tests failing anymore since build #1382 (we are at #1394 now). I think we can resolve this. WDYT?


            edivad Davide Giannella added a comment -

            Bulk close for 1.6.1

            People

              frm Francesco Mari
              mduerig Michael Dürig