  Apache Ozone / HDDS-7909

Read of EC data fails when a DN is offline [Failed to execute command GetBlock on the Pipeline]


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SCM
    • Labels: None

    Description

      Reading EC data fails when a datanode (DN) is offline.

      The read fails with the following error message:

      GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|There are insufficient datanodes to read the EC block 

      Stack Trace:

2023-02-03 14:05:31,610|INFO|MainThread|machine.py:188 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|RUNNING: /opt/cloudera/parcels/CDH/bin/ozone sh key get o3://ozone1/vol-x20w7/enc-buck-3yp31/decom_1675432802 /tmp/Get_file1675433131
2023-02-03 14:05:35,968|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:35 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
2023-02-03 14:05:36,040|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2023-02-03 14:05:36,041|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO impl.MetricsSystemImpl: XceiverClientMetrics metrics system started
2023-02-03 14:05:36,937|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 4b386868-0719-4e2f-bd3b-bda45c921f97, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:36.904Z[Etc/UTC]].
2023-02-03 14:05:36,938|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=4b386868-0719-4e2f-bd3b-bda45c921f97: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:36,980|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 34a3c677-ed98-428f-a0d9-a19f73f93116, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:36.970Z[Etc/UTC]].
2023-02-03 14:05:36,981|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:36 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=34a3c677-ed98-428f-a0d9-a19f73f93116: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,014|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: aee4853d-cc99-43c8-a682-2dc4ad322242, Nodes: 0a7dfbbc-9bd4-482a-81ed-9b213ab2bf63(quasar-tgmmij-1.quasar-tgmmij.root.hwx.site/172.27.204.197), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.003Z[Etc/UTC]].
2023-02-03 14:05:37,016|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=aee4853d-cc99-43c8-a682-2dc4ad322242: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,039|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,034 [main] WARN io.ECBlockInputStreamProxy (ECBlockInputStreamProxy.java:read(180)) - Failing over to reconstruction read due to an error in ECBlockReader. Exception Class: org.apache.hadoop.ozone.client.io.BadDataLocationException , Exception Message: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,040|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockInputStreamProxy: Failing over to reconstruction read due to an error in ECBlockReader. Exception Class: org.apache.hadoop.ozone.client.io.BadDataLocationException , Exception Message: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,057|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN erasurecode.ErasureCodeNative: Loading ISA-L failed: Failed to load libisal.so.2 (libisal.so.2: cannot open shared object file: No such file or directory)
2023-02-03 14:05:37,058|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
2023-02-03 14:05:37,185|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: b3a482a5-33c9-40dd-8614-bfc136ec4479, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.163Z[Etc/UTC]].
2023-02-03 14:05:37,187|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=b3a482a5-33c9-40dd-8614-bfc136ec4479: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,229|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: d18620b2-70cb-4f07-95b7-45d69980f100, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.220Z[Etc/UTC]].
2023-02-03 14:05:37,230|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=d18620b2-70cb-4f07-95b7-45d69980f100: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,260|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 08dd8828-a81e-44ac-8757-aa1b66df2c72, Nodes: 8fea5559-5799-4c17-8d34-17aa6672b87a(quasar-tgmmij-2.quasar-tgmmij.root.hwx.site/172.27.183.130), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.250Z[Etc/UTC]].
2023-02-03 14:05:37,261|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=08dd8828-a81e-44ac-8757-aa1b66df2c72: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,282|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,279 [main] WARN io.ECBlockReconstructedStripeInputStream (ECBlockReconstructedStripeInputStream.java:loadDataBuffersFromStream(590)) - Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 5. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,284|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockReconstructedStripeInputStream: Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 5. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,331|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 59920096-eac8-40bd-86c6-4a2fb44edfc7, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.290Z[Etc/UTC]].
2023-02-03 14:05:37,333|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=59920096-eac8-40bd-86c6-4a2fb44edfc7: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,362|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: d8b18f5b-1fbe-4493-b370-08e22eb0e64d, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.351Z[Etc/UTC]].
2023-02-03 14:05:37,364|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=d8b18f5b-1fbe-4493-b370-08e22eb0e64d: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,390|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 ERROR scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: 78e3a9ff-df9d-4cbf-a584-b73254e06ce8, Nodes: 4e84413f-bf98-4159-914d-5d4eaae5070d(quasar-tgmmij-3.quasar-tgmmij.root.hwx.site/172.27.202.202), ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-02-03T14:05:37.380Z[Etc/UTC]].
2023-02-03 14:05:37,392|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 INFO storage.BlockInputStream: Unable to read information for block conID: 5007 locID: 111677748019205007 bcsId: 0 from pipeline PipelineID=78e3a9ff-df9d-4cbf-a584-b73254e06ce8: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,411|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|2023-02-03 14:05:37,409 [main] WARN io.ECBlockReconstructedStripeInputStream (ECBlockReconstructedStripeInputStream.java:loadDataBuffersFromStream(590)) - Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 4. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,413|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|23/02/03 14:05:37 WARN io.ECBlockReconstructedStripeInputStream: Failed to read from block conID: 5007 locID: 111677748019205007 bcsId: 0 EC index 4. Excluding the block Exception: java.io.IOException Exception Message: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2023-02-03 14:05:37,442|INFO|MainThread|machine.py:203 - run()||GUID=76425c29-acbc-4b0b-9b39-623c5879f0b7|There are insufficient datanodes to read the EC block

      Additional debugging (RCA) found that a sufficient number of DNs were available at the time of the key get operation. Details:
       
      EC DNs: 7 expected, and 7 are available.
      Ratis DNs: 3 required, and at least 3 are available.
      EC datanodes (7):
      ['hostname-1.hostname.root.hwx.site', 'hostname-7.hostname.root.hwx.site', 'hostname-2.hostname.root.hwx.site', 'hostname-6.hostname.root.hwx.site', 'hostname-3.hostname.root.hwx.site', 'hostname-5.hostname.root.hwx.site', 'hostname-8.hostname.root.hwx.site']
      Ratis DNs available at this point (5):
      ['hostname-2.hostname.root.hwx.site', 'hostname-3.hostname.root.hwx.site', 'hostname-1.hostname.root.hwx.site', 'hostname-7.hostname.root.hwx.site', 'hostname-6.hostname.root.hwx.site']
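
      The error "There are insufficient datanodes to read the EC block" relates to a counting rule of Reed-Solomon codes: a block encoded as d data plus p parity chunks can be served directly or reconstructed only if at least d distinct replica indexes are reachable. The number of live DNs in the cluster is therefore not the deciding factor; what matters is how many of this particular block's indexes answer. In the log above, every GetBlock call to quasar-tgmmij-1, -2 and -3 fails with UNAVAILABLE, and EC indexes 4 and 5 are excluded. The sketch below is a hypothetical illustration of that rule, not Ozone's client code: the class and method names are invented, and the rs-3-2 scheme used in main() is an assumption, since this issue does not state the bucket's EC replication config.

```java
import java.util.Set;

/**
 * Hypothetical sketch (not Ozone code): decides whether a Reed-Solomon EC block
 * can still be read, given the set of replica indexes whose datanodes answered.
 */
public final class EcReadabilityCheck {

  private EcReadabilityCheck() {
  }

  /**
   * Returns true if at least {@code dataUnits} distinct, valid replica indexes
   * are reachable. Indexes are 1-based, from 1 to dataUnits + parityUnits; any
   * dataUnits of them are enough to serve or reconstruct the data.
   */
  public static boolean canRead(int dataUnits, int parityUnits, Set<Integer> reachableIndexes) {
    int totalUnits = dataUnits + parityUnits;
    // Ignore out-of-range indexes so they cannot inflate the count.
    long usable = reachableIndexes.stream()
        .filter(index -> index >= 1 && index <= totalUnits)
        .count();
    return usable >= dataUnits;
  }

  public static void main(String[] args) {
    // Assumed rs-3-2 scheme (indexes 1..5). Hypothetically, indexes 1, 4 and 5
    // are unreachable, leaving only 2 and 3: below the 3 chunks required.
    System.out.println(canRead(3, 2, Set.of(2, 3))); // prints false
    // One more reachable index (here, parity index 5) makes the block readable.
    System.out.println(canRead(3, 2, Set.of(2, 3, 5))); // prints true
  }
}
```

      Under this rule, seven healthy EC DNs in the cluster still cannot satisfy a read if the specific DNs holding the block's replicas are unreachable from the client, which may be worth checking alongside the DN counts above.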

      Log files are attached.


            People

              Assignee: Unassigned
              Reporter: Arun Sarin (asarin)
              Votes: 0
              Watchers: 2
