Description
In class TableSnapshotInputFormat (or TableSnapshotInputFormatImpl),
in the function
public static void setInput(Job job, String snapshotName, Path restoreDir) throws IOException {
we set restoreDir (the temporary root) as the table directory:
conf.set(TABLE_DIR_KEY, restoreDir.toString());
This parameter is later used to compute the InputSplits, in particular
to calculate the favored hosts for each split:
Path tableDir = new Path(conf.get(TABLE_DIR_KEY));
List<String> hosts = getBestLocations(conf, HRegion.computeHDFSBlocksDistribution(conf, htd, hri, tableDir));
This returns an empty HDFSBlocksDistribution, because the restored root
directory contains no directory named after the region name from hri.
As a result, the tasks are scheduled as non-local.
The fix is simple: obtain the table directory by calling
FSUtils.getTableDir(rootDir, tableDesc.getTableName())
in the getSplits function.
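The mismatch can be illustrated without HBase on the classpath. The following is a minimal sketch, not the actual patch: it assumes the usual <root>/data/<namespace>/<table>/<region> layout, and the class name, the helper methods, the table name "t1" and the region name are all hypothetical stand-ins for FSUtils.getTableDir and the region-directory lookup. It only shows that a region directory looked up directly under a root is not found, while the same lookup under the derived table directory succeeds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SnapshotLocalitySketch {

    // Hypothetical stand-in for FSUtils.getTableDir(rootDir, tableName).
    // Assumes the <root>/data/<namespace>/<table> layout.
    static Path tableDir(Path rootDir, String namespace, String table) {
        return rootDir.resolve("data").resolve(namespace).resolve(table);
    }

    // Hypothetical stand-in for the lookup done while computing block
    // locality: the region's files are only reachable if a directory
    // named after the region exists under the given table directory.
    static boolean regionDirExists(Path tableDir, String regionName) {
        return Files.isDirectory(tableDir.resolve(regionName));
    }

    // Builds a temporary root with one region and runs both lookups.
    // Returns { found when the root is used as the table directory,
    //           found when the derived table directory is used }.
    static boolean[] demo() {
        try {
            Path rootDir = Files.createTempDirectory("sketch-root");
            String regionName = "0123456789abcdef"; // made-up region name
            Files.createDirectories(
                tableDir(rootDir, "default", "t1").resolve(regionName));

            // Behaviour described in the report: the root itself is
            // treated as the table directory.
            boolean viaRoot = regionDirExists(rootDir, regionName);

            // Proposed behaviour: derive the table directory first.
            boolean viaTableDir =
                regionDirExists(tableDir(rootDir, "default", "t1"), regionName);

            return new boolean[] { viaRoot, viaTableDir };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("region found under root: " + result[0]);
        System.out.println("region found under table dir: " + result[1]);
    }
}
```

In this sketch the first lookup fails, which corresponds to the empty HDFSBlocksDistribution and the loss of locality described above.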
More discussion is in the comments below.