Description
There's a potential problem with storing the guideposts as a VARBINARY ARRAY, as pointed out by PHOENIX-1329. We'd run into this issue when collecting stats for a table with a trailing VARBINARY row key column whose values contain embedded null bytes. Because of this, we're better off storing the guideposts as a single VARBINARY and serializing/deserializing them in the following manner:
<byte length as vint><bytes><byte length as vint><bytes>...
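As a rough illustration of the layout above, here is a minimal sketch of the write side. The class name GuidePostEncoder and the base-128 varint used here are assumptions made to keep the example self-contained; the actual patch would more likely reuse an existing vint encoder (for example Hadoop's WritableUtils), whose byte layout differs from this one.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical helper, not part of Phoenix: illustrates the
// <byte length as vint><bytes>... layout only.
public class GuidePostEncoder {

    // Writes a non-negative int as a base-128 varint, low 7 bits first.
    // Assumption: this is a stand-in and not Hadoop's vint layout.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative length: " + value);
        }
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
    }

    // Concatenates every guidepost as <length><bytes>. Because each entry
    // carries its own length, embedded null bytes need no escaping.
    static byte[] encode(List<byte[]> guidePosts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] guidePost : guidePosts) {
            writeVInt(out, guidePost.length);
            out.write(guidePost, 0, guidePost.length);
        }
        return out.toByteArray();
    }
}
```

With this sketch, the two guideposts {1, 0, 2} and {0} encode to the six bytes 3, 1, 0, 2, 1, 0. The zero bytes inside the keys are carried through untouched, which is the property the VARBINARY ARRAY representation lacks.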
We should also store the total number of guideposts as a separate KeyValue column. The schema of SYSTEM.STATS would then look like this instead:
public static final String CREATE_STATS_TABLE_METADATA =
        "CREATE TABLE " + SYSTEM_CATALOG_SCHEMA + ".\"" + SYSTEM_STATS_TABLE + "\"(\n" +
        // PK columns
        PHYSICAL_NAME + " VARCHAR NOT NULL," +
        COLUMN_FAMILY + " VARCHAR," +
        REGION_NAME + " VARCHAR," +
        GUIDE_POSTS + " VARBINARY," +
        GUIDE_POSTS_COUNT + " SMALLINT," +
        MIN_KEY + " VARBINARY," +
        MAX_KEY + " VARBINARY," +
        LAST_STATS_UPDATE_TIME + " DATE, " +
        "CONSTRAINT " + SYSTEM_TABLE_PK_NAME + " PRIMARY KEY ("
        + PHYSICAL_NAME + ","
        + COLUMN_FAMILY + "," + REGION_NAME + "))\n" +
        // TODO: should we support versioned stats?
        // Install split policy to prevent a physical table's stats from being split across regions.
        HTableDescriptor.SPLIT_POLICY + "='" + MetaDataSplitPolicy.class.getName() + "'\n";
Then the serialization code in StatisticsTable.addStats() would need to change to populate the GUIDE_POSTS_COUNT and serialize the GUIDE_POSTS in the new format.
The deserialization code is isolated to StatisticsUtil.readStatistics(). It would need to read the GUIDE_POSTS_COUNT first for estimated sizing, and then deserialize the GUIDE_POSTS in the new format.
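The read side could be sketched roughly as follows. The class name GuidePostDecoder is hypothetical, and the varint reader assumes the same illustrative base-128 layout (low 7 bits first, high bit as a continuation flag) rather than whatever vint encoding the real patch settles on. The count is used only to presize the list, in line with the "estimated sizing" idea above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Phoenix: reads back the
// <byte length as vint><bytes>... layout.
public class GuidePostDecoder {

    // 'count' stands in for the GUIDE_POSTS_COUNT value and is treated as a
    // sizing hint only; the buffer itself determines how many entries exist.
    static List<byte[]> decode(byte[] buf, int count) {
        List<byte[]> guidePosts = new ArrayList<>(Math.max(count, 0));
        int pos = 0;
        while (pos < buf.length) {
            // Read the varint length (assumed layout: low 7 bits first).
            int len = 0;
            int shift = 0;
            int b;
            do {
                if (pos >= buf.length || shift > 28) {
                    throw new IllegalArgumentException("malformed length at offset " + pos);
                }
                b = buf[pos++] & 0xFF;
                len |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);

            // Reject lengths that run past the end of the buffer.
            if (len < 0 || len > buf.length - pos) {
                throw new IllegalArgumentException("truncated guidepost at offset " + pos);
            }
            guidePosts.add(Arrays.copyOfRange(buf, pos, pos + len));
            pos += len;
        }
        return guidePosts;
    }
}
```

Treating the count as a hint rather than a loop bound keeps the decoder correct even if the two columns were ever written inconsistently; a stricter implementation might instead compare the decoded size against the count and fail on a mismatch.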