  1. HBase
  2. HBASE-14030

HBase Backup/Restore Phase 1


Details

    • Umbrella
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • HBASE-7912
    • None
    • None
    • Reviewed
      This experimental feature allows users to perform backup and restore operations, including incremental ones, on a set of HBase tables.

      Key features and Use Cases

      A common practice for backup and restore in databases is to first take a full baseline backup and
      then periodically take incremental backups that capture the changes since the full baseline
      backup. An HBase cluster can store massive amounts of data, so we want to use full backups in
      combination with incremental backups for HBase as well.
      The following is a typical use case scenario for full and incremental backup:

      ● The user takes a full backup of a table or a set of tables in HBase.
      ● The user schedules periodic incremental backups to capture the changes since the full
      backup, or since the last incremental backup.
      ● The user needs to restore table data to a past point in time.
      ● The full backup is restored to the table(s) or to different table name(s). Then the
      incremental backups that are up to the desired point in time are applied on top of the full
      backup.
      We would support the following key features and capabilities.
      ● Backup to DFS FileSystem across clusters and possibly to other storage media or
      servers.
      ● Support single table or a set of tables backup and restore (full and incremental).
      ● Restore to different table names and to different clusters.
      ● Support adding tables to and removing tables from a backup set without interrupting the
      incremental backup schedule.
      ● Support merging incremental backups into larger incremental backups that cover longer
      periods, for easier storage and restore.
      ● Support scheduled backups.
      ● Unified command line interface for all the above.

      To illustrate these key capabilities, the following are two more detailed use case examples.

      Use case example 1:

      1. User takes a full backup of a set of tables (e.g. table1 and table2) in HBase.
      2. User takes incremental backups. The incremental backup will only track table1 and
      table2.
      3. User adds other tables (e.g. table3 and table4) in HBase, and an implicit full backup is
      executed during the add process.
      4. User continues to take incremental backups. The incremental backup data would cover
      table1, table2, table3 and table4.
      5. User wants to restore table3 and table4 to a past PIT (point-in-time).
      6. The full backup from step 3 is restored onto the HBase cluster. Then the incremental backups
      taken after that full backup are applied on top of the full restore, up to the PIT.
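
      The restore in steps 5 and 6 amounts to picking the most recent full backup at or before the PIT and then every later incremental backup up to the PIT. The sketch below illustrates that selection only; BackupImage and restore_chain are hypothetical names invented for this example and are not part of HBase, and the list is assumed to hold only the images that cover the tables being restored.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class BackupImage:
    """Hypothetical record of one backup image (not an HBase class)."""
    backup_id: str
    kind: str            # "full" or "incremental"
    completed: datetime  # when the image finished


def restore_chain(images: List[BackupImage], pit: datetime) -> List[BackupImage]:
    """Return the images to apply, oldest first, to reach the point in time `pit`."""
    # Only images completed at or before the requested point in time qualify.
    eligible = sorted((i for i in images if i.completed <= pit),
                      key=lambda i: i.completed)
    fulls = [i for i in eligible if i.kind == "full"]
    if not fulls:
        raise ValueError("no full backup exists at or before the requested time")
    base = fulls[-1]  # most recent full backup at or before the PIT
    # Every incremental image taken after the base is applied on top of it.
    increments = [i for i in eligible
                  if i.kind == "incremental" and i.completed > base.completed]
    return [base] + increments
```

      The first element of the returned list is the full backup to restore; the remaining elements are the incremental backups to apply in order.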

      Use case example 2:

      1. User takes a full backup of a set of tables in HBase.
      2. User takes daily incremental backups.
      3. User merges the daily incremental backups into weekly incremental backups.
      4. User combines/rolls up the weekly incremental backups into monthly incremental
      backups.
      5. User wants to restore the tables to a past PIT.
      6. The full backup is restored onto the HBase cluster.
      7. Monthly incremental backups before the desired PIT are applied.
      8. The closest daily backups up to the PIT are applied.
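
      The roll-up in steps 3 and 4 can be pictured as grouping incremental backups by calendar period and replacing each group with one image that spans the whole group. The following is a minimal sketch of that grouping only, not of how HBase merges backup data; Incremental, roll_up, iso_week and month are hypothetical names made up for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Hashable, List


@dataclass(frozen=True)
class Incremental:
    """Hypothetical incremental image covering the closed range [start, end]."""
    start: date
    end: date


def roll_up(images: List[Incremental],
            period: Callable[[date], Hashable]) -> List[Incremental]:
    """Merge images whose start dates fall in the same period into one image."""
    groups: Dict[Hashable, List[Incremental]] = defaultdict(list)
    for image in images:
        groups[period(image.start)].append(image)
    # Each merged image spans from the earliest start to the latest end in its group.
    merged = [Incremental(min(i.start for i in g), max(i.end for i in g))
              for g in groups.values()]
    return sorted(merged, key=lambda i: i.start)


def iso_week(d: date) -> Hashable:
    """Period key: (ISO year, ISO week number)."""
    return tuple(d.isocalendar()[:2])


def month(d: date) -> Hashable:
    """Period key: (year, month)."""
    return (d.year, d.month)
```

      Calling roll_up(daily, iso_week) yields weekly images, and passing that result to roll_up(..., month) yields monthly ones.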

      To create a full backup:

      HBASE_DIR/bin/hbase backup create full <backup_root_path> [tables]

      backup_root_path - path to backup root directory (file://, hdfs:// or any other Hadoop-compatible path)
      tables - list of tables, comma-separated. If no tables are specified, all tables are backed up.

      To create an incremental backup:

      HBASE_DIR/bin/hbase backup create incremental <backup_root_path> [tables]

      backup_root_path - path to backup root directory (file://, hdfs:// or any other Hadoop-compatible path)
      tables - list of tables, comma-separated. If no tables are specified, all tables are backed up.
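
      For scheduled backups it can help to assemble these command lines programmatically. The sketch below builds the argument list for the create commands shown above; backup_create_command is a hypothetical helper, the default bin/hbase path assumes the script runs from HBASE_DIR, and the paths and table names in any call are placeholders.

```python
from typing import List, Optional, Sequence


def backup_create_command(kind: str,
                          backup_root_path: str,
                          tables: Optional[Sequence[str]] = None,
                          hbase_bin: str = "bin/hbase") -> List[str]:
    """Build argv for `hbase backup create <full|incremental> <root> [tables]`."""
    if kind not in ("full", "incremental"):
        raise ValueError("kind must be 'full' or 'incremental'")
    argv = [hbase_bin, "backup", "create", kind, backup_root_path]
    if tables:
        # The tables argument is a single comma-separated list.
        argv.append(",".join(tables))
    return argv
```

      The returned list can be handed to subprocess.run(argv, check=True) from a scheduler script; omitting tables leaves the optional argument off, which per the description above backs up all tables.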

      To restore table(s):

      HBASE_DIR/bin/hbase backup restore <backup_root_path> <backup_id> [tables]

      backup_root_path - path to backup root directory (file://, hdfs:// or any other Hadoop-compatible path)
      backup_id - The id identifying the backup image.
      tables - list of tables, comma-separated.
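
      As an illustration of the restore syntax, the sketch below builds the argument list and renders it as a shell line for review before running it. backup_restore_command and as_shell_line are hypothetical helpers, and the backup id used in any call is a placeholder, since the id format is not described here.

```python
import shlex
from typing import List, Optional, Sequence


def backup_restore_command(backup_root_path: str,
                           backup_id: str,
                           tables: Optional[Sequence[str]] = None,
                           hbase_bin: str = "bin/hbase") -> List[str]:
    """Build argv for `hbase backup restore <root> <backup_id> [tables]`."""
    if not backup_id:
        raise ValueError("backup_id is required")
    argv = [hbase_bin, "backup", "restore", backup_root_path, backup_id]
    if tables:
        # The tables argument is a single comma-separated list.
        argv.append(",".join(tables))
    return argv


def as_shell_line(argv: Sequence[str]) -> str:
    """Render argv as one shell-quoted line, e.g. for logging or a dry run."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

      Printing as_shell_line(...) first gives an operator a chance to check the backup id and table list before the restore is actually executed.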

      FOR EXPERIENCED USERS only:

      To get the list of backup ids, you need to scan the hbase:backup table using the HBase shell or other means.
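
      One way to do that scan from a script is to pipe a scan statement into the HBase shell. The sketch below assumes the shell reads commands from standard input and that bin/hbase is reachable from the working directory; scan_statement and scan_backup_table are hypothetical helpers, and the raw output is returned unparsed because the row layout of hbase:backup is not described here.

```python
import subprocess


def scan_statement(table: str = "hbase:backup") -> str:
    """Return the HBase shell statement that scans the given table."""
    return f"scan '{table}'"


def scan_backup_table(hbase_bin: str = "bin/hbase") -> str:
    """Run the scan through the HBase shell and return its raw text output."""
    result = subprocess.run(
        [hbase_bin, "shell"],
        input=scan_statement() + "\n",  # the statement is fed on stdin
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the shell exits non-zero
    )
    return result.stdout
```

      The backup ids then have to be picked out of the returned text by hand or with a parser suited to the actual table layout.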




    Description

      This is the umbrella ticket for Backup/Restore Phase 1. See HBASE-7912 design doc for the phase description.

      Attachments

        1. HBASE-14030-v0.patch
          296 kB
          Vladimir Rodionov
        2. HBASE-14030-v1.patch
          313 kB
          Vladimir Rodionov
        3. HBASE-14030-v2.patch
          327 kB
          Vladimir Rodionov
        4. HBASE-14030-v3.patch
          336 kB
          Vladimir Rodionov
        5. HBASE-14030-v4.patch
          362 kB
          Vladimir Rodionov
        6. HBASE-14030-v5.patch
          361 kB
          Vladimir Rodionov
        7. HBASE-14030-v6.patch
          361 kB
          Vladimir Rodionov
        8. HBASE-14030-v7.patch
          363 kB
          Vladimir Rodionov
        9. HBASE-14030-v8.patch
          363 kB
          Vladimir Rodionov
        10. HBASE-14030-v10.patch
          356 kB
          Vladimir Rodionov
        11. HBASE-14030-v11.patch
          357 kB
          Vladimir Rodionov
        12. HBASE-14030-v12.patch
          359 kB
          Vladimir Rodionov
        13. HBASE-14030-v13.patch
          361 kB
          Vladimir Rodionov
        14. HBASE-14030-v14.patch
          361 kB
          Vladimir Rodionov
        15. HBASE-14030-v15.patch
          361 kB
          Vladimir Rodionov
        16. HBASE-14030-v17.patch
          363 kB
          Vladimir Rodionov
        17. HBASE-14030-v18.patch
          361 kB
          Vladimir Rodionov
        18. HBASE-14030-v20.patch
          404 kB
          Vladimir Rodionov
        19. HBASE-14030-v21.patch
          429 kB
          Vladimir Rodionov
        20. HBASE-14030-v22.patch
          418 kB
          Vladimir Rodionov
        21. HBASE-14030-v23.patch
          416 kB
          Vladimir Rodionov
        22. HBASE-14030-v24.patch
          416 kB
          Vladimir Rodionov
        23. HBASE-14030-v25.patch
          373 kB
          Vladimir Rodionov
        24. HBASE-14030-v26.patch
          373 kB
          Vladimir Rodionov
        25. HBASE-14030-v27.patch
          438 kB
          Vladimir Rodionov
        26. HBASE-14030-v28.patch
          438 kB
          Vladimir Rodionov
        27. HBASE-14030-v30.patch
          581 kB
          Vladimir Rodionov
        28. HBASE-14030-v35.patch
          826 kB
          Vladimir Rodionov
        29. hbase-14030_v36.patch
          712 kB
          Enis Soztutar
        30. HBASE-14030-v37.patch
          712 kB
          Vladimir Rodionov
        31. HBASE-14030.v38.patch
          712 kB
          Ted Yu
        32. HBASE-14030.v39.patch
          712 kB
          Ted Yu
        33. HBASE-14030.v40.patch
          715 kB
          Ted Yu
        34. HBASE-14030.v41.patch
          717 kB
          Ted Yu
        35. HBASE-14030.v42.patch
          716 kB
          Ted Yu
        36. HBASE-14030.v43.patch
          716 kB
          Ted Yu

            People

              vrodionov Vladimir Rodionov
              vrodionov Vladimir Rodionov