-
Notifications
You must be signed in to change notification settings - Fork 3.4k
[HBASE-29520] Utilize Backed-up Bulkloaded Files in Incremental Backup #7246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
0802f3a
eb775fa
f7e54d6
97b2464
aa9d04c
9d52dc6
4d89d38
d489d23
d9f3c95
12940d5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,9 +19,10 @@ | |
|
|
||
| import static org.apache.hadoop.hbase.backup.BackupRestoreConstants.CONF_CONTINUOUS_BACKUP_WAL_DIR; | ||
| import static org.apache.hadoop.hbase.backup.BackupRestoreConstants.JOB_NAME_CONF_KEY; | ||
| import static org.apache.hadoop.hbase.backup.replication.BackupFileSystemManager.BULKLOAD_FILES_DIR; | ||
| import static org.apache.hadoop.hbase.backup.replication.BackupFileSystemManager.WALS_DIR; | ||
| import static org.apache.hadoop.hbase.backup.replication.ContinuousBackupReplicationEndpoint.DATE_FORMAT; | ||
| import static org.apache.hadoop.hbase.backup.replication.ContinuousBackupReplicationEndpoint.ONE_DAY_IN_MILLISECONDS; | ||
| import static org.apache.hadoop.hbase.backup.util.BackupUtils.DATE_FORMAT; | ||
|
|
||
| import java.io.IOException; | ||
| import java.net.URI; | ||
|
|
@@ -166,6 +167,26 @@ protected List<BulkLoad> handleBulkLoad(List<TableName> tablesToBackup) throws I | |
| Path tblDir = CommonFSUtils.getTableDir(rootdir, srcTable); | ||
| Path p = new Path(tblDir, regionName + Path.SEPARATOR + fam + Path.SEPARATOR + filename); | ||
|
|
||
| // For continuous backup: bulkload files are copied from backup directory defined by | ||
| // CONF_CONTINUOUS_BACKUP_WAL_DIR instead of source cluster. | ||
| String backupRootDir = conf.get(CONF_CONTINUOUS_BACKUP_WAL_DIR); | ||
| if (backupInfo.isContinuousBackupEnabled() && !Strings.isNullOrEmpty(backupRootDir)) { | ||
| String dayDirectoryName = BackupUtils.formatToDateString(bulkLoad.getTimestamp()); | ||
| Path bulkLoadBackupPath = | ||
| new Path(backupRootDir, BULKLOAD_FILES_DIR + Path.SEPARATOR + dayDirectoryName); | ||
| Path bulkLoadDir = new Path(bulkLoadBackupPath, | ||
| srcTable.getNamespaceAsString() + Path.SEPARATOR + srcTable.getNameAsString()); | ||
| FileSystem backupFs = FileSystem.get(bulkLoadDir.toUri(), conf); | ||
| Path fullBulkLoadBackupPath = | ||
| new Path(bulkLoadDir, regionName + Path.SEPARATOR + fam + Path.SEPARATOR + filename); | ||
| if (backupFs.exists(fullBulkLoadBackupPath)) { | ||
| LOG.debug("Backup bulkload file found {}", fullBulkLoadBackupPath); | ||
| p = fullBulkLoadBackupPath; | ||
| } else { | ||
| LOG.warn("Backup bulkload file not found {}", fullBulkLoadBackupPath); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assumed we're hitting the following so, should we ask user to check if this is expected in the warning message?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In non-continuous incremental backup approach, bulkload files are copied directly from the source cluster to the backup location. In continuous backup approach, these files are instead copied from the bulkload backup location. I’ve added a warning here because if the required bulkload backup files are missing, they would be copied from the source cluster (as a fallback mechanism). Source cluster files are only deleted after a successful full or incremental backup (in both non-continuous and continuous) Ideally, with continuous backups, once a bulkload file has been backed up by the replication endpoint, it could be safely deleted from the source cluster. However, doing so would significantly complicate the checkpoint logic, since it would then depend on both WAL flushes and bulkload backups.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
[nit] right, my concerns are the action after the WARN message, and current message does not clearly tell what the user should do when he see this, do you think we should add some suggestion in this message ? we can improve it as a follow up JIRA or in the documentation, if user see this message , should they care about it? if so, what manual operation should they perform to get rid of it ? |
||
| } | ||
| } | ||
|
|
||
| String srcTableQualifier = srcTable.getQualifierAsString(); | ||
| String srcTableNs = srcTable.getNamespaceAsString(); | ||
| Path tgtFam = new Path(tgtRoot, srcTableNs + Path.SEPARATOR + srcTableQualifier | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] somehow,
hbase.backup.continuous.wal.diris become the root now for both WALs and Bulkloads, shouldn't we call it ashbase.backup.continuous.dir/hbase.backup.continuous.root.dirThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense. I will create a new JIRA to update this config from everywhere it is referenced