  1. Hadoop HDFS
  2. HDFS-14768

EC: Busy DN replica should be considered in live replica check.


Details

    Description

      The EC policy is RS-6-3-1024K; the version is Hadoop 3.0.2.

      Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission the datanodes holding indices [3,4], and increase pendingReplicationWithoutTargets on the datanode holding index 6 until it is larger than replicationStreamsHardLimit (we set 14). Then, after the chooseSourceDatanodes method of BlockManager runs, liveBlockIndices is [0,1,2,3,4,5,7,8] and the block counter is Live: 7, Decommission: 2.

      In the scheduleReconstruction method of BlockManager, additionalReplRequired is 9 - 7 = 2. After the NameNode chooses two target datanodes, it assigns an erasure coding task to the target datanode.
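
The mismatch between the two counts can be checked with a few lines of arithmetic. The following is only a minimal sketch using the numbers from this report; the class and method names are hypothetical and are not BlockManager code.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Hadoop: compares the number of targets
// the NameNode requests with the number of indices missing from
// liveBlockIndices.
final class ReconstructionCountSketch {

  /** Targets requested: total internal blocks minus replicas counted as live. */
  static int additionalReplRequired(int totalBlocks, int liveReplicas) {
    return totalBlocks - liveReplicas;
  }

  /** Indices that are absent from liveBlockIndices and could be rebuilt. */
  static int missingIndexCount(int totalBlocks, int[] liveBlockIndices) {
    BitSet live = new BitSet(totalBlocks);
    for (int idx : liveBlockIndices) {
      live.set(idx);
    }
    return totalBlocks - live.cardinality();
  }

  public static void main(String[] args) {
    int totalBlocks = 9;                               // RS-6-3: 6 data + 3 parity
    int[] liveBlockIndices = {0, 1, 2, 3, 4, 5, 7, 8}; // index 6 is on the busy datanode
    int requested = additionalReplRequired(totalBlocks, 7);
    int missing = missingIndexCount(totalBlocks, liveBlockIndices);
    System.out.println("targets requested = " + requested
        + ", missing indices = " + missing);
  }
}
```

With the values above, two targets are requested but only one index (6) is missing from liveBlockIndices, so one target has no index to rebuild.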

      When the datanode gets the task, it builds targetIndices from liveBlockIndices and the target length. The code is below.

      // Excerpt: the array is allocated outside this method and filled by initTargetIndices().
      targetIndices = new short[targets.length];

      private void initTargetIndices() {
        BitSet bitset = reconstructor.getLiveBitSet();
        int m = 0;
        hasValidTargets = false;
        for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
          // Only indices that are not live and have a non-zero length become targets.
          if (!bitset.get(i)) {
            if (reconstructor.getBlockLen(i) > 0) {
              if (m < targets.length) {
                targetIndices[m++] = (short) i;
                hasValidTargets = true;
              }
            }
          }
        }
      }
      

      As a result, targetIndices[0] = 6, and targetIndices[1] always keeps its initial value of 0.
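
The effect of the loop above can be reproduced outside Hadoop. This is a simplified, standalone sketch: the class name is hypothetical, and the block length check is dropped on the assumption that every internal block is non-empty.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical standalone model of the target index loop, not Hadoop code.
final class TargetIndicesSketch {

  static short[] initTargetIndices(BitSet liveBitSet, int totalBlocks, int numTargets) {
    short[] targetIndices = new short[numTargets]; // Java zero-initializes every slot
    int m = 0;
    for (int i = 0; i < totalBlocks; i++) {
      // Assumption: all block lengths are > 0, so only the live check remains.
      if (!liveBitSet.get(i) && m < numTargets) {
        targetIndices[m++] = (short) i;
      }
    }
    return targetIndices; // slots beyond m still hold 0
  }

  public static void main(String[] args) {
    BitSet live = new BitSet(9);
    for (int idx : new int[] {0, 1, 2, 3, 4, 5, 7, 8}) {
      live.set(idx);
    }
    // Two targets were chosen by the NameNode, but only index 6 is missing.
    System.out.println(Arrays.toString(initTargetIndices(live, 9, 2))); // prints [6, 0]
  }
}
```

The second slot is never written, so it is indistinguishable from a genuine request to rebuild index 0.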

      The StripedReader always creates its readers from the first 6 block indices, which here are [0,1,2,3,4,5].

      Using indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L bug: the data of block index 6 is corrupted (all zeros).
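
One way to spot this condition is to check whether any target index is also used as a source index. The check below is a hypothetical diagnostic sketch only; it is not the fix proposed in the attached patches and does not exist in Hadoop.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.stream.IntStream;

// Hypothetical diagnostic, not Hadoop code: reports target indices that
// also appear among the source indices used for decoding.
final class AliasedTargetCheck {

  static int[] aliasedTargets(short[] targetIndices, int[] sourceIndices) {
    BitSet sources = new BitSet();
    for (int s : sourceIndices) {
      sources.set(s);
    }
    return IntStream.range(0, targetIndices.length)
        .map(k -> targetIndices[k])
        .filter(sources::get)
        .toArray();
  }

  public static void main(String[] args) {
    short[] targetIndices = {6, 0};            // as built in this report
    int[] sourceIndices = {0, 1, 2, 3, 4, 5};  // readers chosen by StripedReader
    System.out.println(Arrays.toString(aliasedTargets(targetIndices, sourceIndices))); // prints [0]
  }
}
```

A non-empty result means the decoder is being asked to rebuild a block it is simultaneously reading as input, which is the situation described above.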

      I wrote a unit test that reproduces the problem reliably.

      private int replicationStreamsHardLimit = DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
      
      numDNs = dataBlocks + parityBlocks + 10;
      
      @Test(timeout = 240000)
      public void testFileDecommission() throws Exception {
        LOG.info("Starting test testFileDecommission");
        final Path ecFile = new Path(ecDir, "testFileDecommission");
        int writeBytes = cellSize * dataBlocks;
        writeStripedFile(dfs, ecFile, writeBytes);
        Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
        FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
      
        final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
            .getINode4Write(ecFile.toString()).asFile();
      
        LocatedBlocks locatedBlocks =
            StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
      
        LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
            .get(0);
      
        DatanodeInfo[] dnLocs = lb.getLocations();
      
        LocatedStripedBlock lastBlock =
            (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
      
        DatanodeInfo[] storageInfos = lastBlock.getLocations();
      
        // the datanode holding internal block index 6; it is made busy below
        DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
            .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
      
        BlockInfo firstBlock = fileNode.getBlocks()[0];
        DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
      
        // the first heartbeat will consume 3 replica tasks
        for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
          BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new Block(i),
              new DatanodeStorageInfo[]{dStorageInfos[0]});
        }
        assertEquals(dataBlocks + parityBlocks, dnLocs.length);
        int[] decommNodeIndex = {3, 4};
      
        final List<DatanodeInfo> decommisionNodes = new ArrayList<DatanodeInfo>();
        // add the node which will be decommissioning
        decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
        decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
        decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
      
        assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
        bm.getDatanodeManager().removeDatanode(datanodeDescriptor);
        //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
      
        // Ensure decommissioned datanode is not automatically shutdown
        DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
        assertEquals("All datanodes must be alive", numDNs,
            client.datanodeReport(DatanodeReportType.LIVE).length);
      
        FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
        Assert.assertTrue("Checksum mismatches!",
            fileChecksum1.equals(fileChecksum2));
      
        StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
            null, blockGroupSize);
      }
      
      

       

      Attachments

        1. HDFS-14768-branch-3.1.patch
          16 kB
          Yuanbo Liu
        2. HDFS-14768-branch-3.2.patch
          16 kB
          Yuanbo Liu
        3. HDFS-14768.011.patch
          16 kB
          guojh
        4. HDFS-14768.010.patch
          17 kB
          guojh
        5. HDFS-14768.009.patch
          17 kB
          guojh
        6. HDFS-14768.008.patch
          21 kB
          guojh
        7. HDFS-14768.007.patch
          16 kB
          guojh
        8. HDFS-14768.006.patch
          16 kB
          guojh
        9. HDFS-14768.005.patch
          16 kB
          guojh
        10. HDFS-14768.004.patch
          18 kB
          guojh
        11. HDFS-14768.003.patch
          18 kB
          guojh
        12. 1568771471942.jpg
          173 kB
          guojh
        13. HDFS-14768.002.patch
          17 kB
          guojh
        14. HDFS-14768.jpg
          67 kB
          Zhao Yi Ming
        15. HDFS-14768.001.patch
          17 kB
          guojh
        16. 1568276338275.jpg
          129 kB
          guojh
        17. 1568275810244.jpg
          88 kB
          guojh
        18. zhaoyiming_UT_after_deomission.txt
          4 kB
          Zhao Yi Ming
        19. zhaoyiming_UT_beofre_deomission.txt
          3 kB
          Zhao Yi Ming
        20. guojh_UT_after_deomission.txt
          4 kB
          Zhao Yi Ming
        21. guojh_UT_before_deomission.txt
          3 kB
          Zhao Yi Ming
        22. HDFS-14768.000.patch
          20 kB
          guojh


            People

              gjhkael guojh
              gjhkael guojh
              Votes: 0
              Watchers: 19
