-
Notifications
You must be signed in to change notification settings - Fork 594
HDDS-13929. Modularise Snapshot Delta file computer (Efficient Diff) #9312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Change-Id: Iba47aeb21663dfa407ab71339cef02c0d74b49f2
Change-Id: Ifd2feca1fddb144e4955db025f0b15a2ab1f3bfe
Change-Id: I34536ff06efb7d5a4942853f0fd83942ab398b5f
…otLocalDataManager Change-Id: I34536ff06efb7d5a4942853f0fd83942ab398b5f
Change-Id: I32bcaf2a1fb290f1790c02872a0230cd65586636
Change-Id: I105a2e8178c0444d52de41b99801f4ceb6d57ffd # Conflicts: # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/Checksum.java # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/ObjectSerializer.java # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/YamlSerializer.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java # hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestOmSnapshotLocalDataYaml.java
Change-Id: I985170e38fb8beeb784048e85a08a4c79e1aec97
Change-Id: I33e6e6e825bf23c323ad7ed593d800a11720fa4f
Change-Id: If30b2c766db82adde72145c8ecd3e590ef54cc2d
Change-Id: Id3f2c49050bc3476b9e0f5f51dacb6d9acc4c2f7
Change-Id: I432960725b4c6c55aa906b5780cc3027e41e10db
Change-Id: I3c5514e5bbd251a2b5297d8f074cfde5c71fa543
Change-Id: Ib5a9e6c91bdccba17820263c47eaf2c8400e930d
Change-Id: Ica36e0615c7bc6aa9b6a7f6fafafd0f830d4bafb
Change-Id: I26b66f266bb7677e4b1078f5fcd9f2ce3a651a70 # Conflicts: # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Change-Id: I1d93dbc048a42cc55ff1f8ffa420e52f967527b8
Change-Id: I34202928a7a367dd0a1e57219317ff34de352b78
Change-Id: Iad6f26cb71ec921c51ee2d138745df1a2663533f
Change-Id: Ic5f7e249cfb9cb3973cbcd4abd36b22a6ff8f5aa
…calDataProvider Change-Id: I3a004b4b435075a4348960aeed642e8da71e7e72
Change-Id: I06990bc9ab8fc7e1eb7bec255646a650bd8c35fe
Change-Id: I4c6c61c83aa9fadab8ecef854b99dcc0a89a2208
Change-Id: I0e476322372a302572f1fe79cbf2e874bfeac2ed
Change-Id: I31004e0c95dad64411c6fe848501a82f2f773cba
Change-Id: Id317c8b56e8b25c122b68eaf96599b9690d08f79
Change-Id: I3849387d064e093634e69cdaf870d27c1934cda5 # Conflicts: # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/ObjectSerializer.java # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/YamlSerializer.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalData.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Change-Id: Ie5e5f3dab4324103e8855dd15619d7755f0422e6
Change-Id: I55bd5c3ef7fc32910a9111328638de2edffcd541
Change-Id: I880997d3eebdf378f14c203c61c2d63b2d17552e
Change-Id: I13ba8e2fd012a3c964d657e83496c93a4f55a3be
Change-Id: I47fca4c2ca1738b533992851fa54291cb109cc39
Change-Id: Id07b78b650ce58fcc71bbac478d085b139123b02
Change-Id: Iacf5dd1fd2013d6203ea7003645cb79433186519
Change-Id: I22a212c84a3dbcd66a8e61a940d4df239b73133f
Change-Id: I4ab5c3d469b3729271ccccdf926699ec94b96e21
Change-Id: I857831bebe3a733d28998606cbf3f1b0676b2ba5
Change-Id: Idc5174ef66392fe16693aa8799aa51db7277a0e0 # Conflicts: # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdb/util/RdbUtil.java # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdb/util/SstFileInfo.java # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDiffUtils.java
Change-Id: I4c023aeca7771fe6f65cb09f62a6253583c0d278 # Conflicts: # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDiffUtils.java
Change-Id: I6bedaf83ebfc65f46a7c3fd298fa7dc5f26d7fd1
Change-Id: Ib313080b2ffa2f6e7545ae817f5e456cabc5dba7
Change-Id: I41308fd260de2e7ecdd8e8318e2155096212b687
Change-Id: Ie464f0ed154e8911024dbe7802338e11a83f502a
| snapProvider.getSnapshotLocalData()); | ||
| final Map<Integer, Integer> versionMap = snapProvider.getSnapshotLocalData().getVersionSstFileInfos().entrySet() | ||
| .stream().collect(toMap(Map.Entry::getKey, entry -> entry.getValue().getPreviousSnapshotVersion())); | ||
| synchronized (differ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this necessary? getSSTDiffListWithFullPath() is a synchronized method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is required since we are creating the hardlink in this method and not inside rocksdb checkpoint differ. The files could get deleted by then from the dag
| tablePrefixInfo).orElse(null); | ||
| } | ||
| } catch (Exception e) { | ||
| LOG.error("Falling back to full diff.", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be a WARN if this operation can continue with the fallback approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
Also please rebase the PR |
Change-Id: I3e885ef24c95280ea0acccaf44787bd295aedbe4 # Conflicts: # hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/IOUtils.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/diff/delta/DeltaFileComputer.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/diff/delta/FileLinkDeltaFileComputer.java
Change-Id: I6ae5c5897f9ca8580ba68b17ad77dffa35dfafb6
Change-Id: Ib6089c9c7d8c787edd75721234d94d1ea0eb88c9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request modularizes snapshot delta file computation by introducing a compositional architecture for computing SST file differences between two Ozone snapshots. The refactoring extracts the monolithic diff computation logic from SnapshotDiffManager into dedicated, reusable components.
- Creates
CompositeDeltaDiffComputerthat orchestrates between efficient RocksDB DAG-walk differ and fallback full-diff strategies - Introduces
RDBDifferComputerfor efficient SST file delta computation using RocksDB's checkpoint differ - Extracts
FullDiffComputeras a fallback mechanism for exhaustive file-by-file comparison - Updates method signatures to use
Pathinstead ofStringfor file references, improving type safety
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
CompositeDeltaDiffComputer.java |
New orchestrator class implementing composite pattern for delta file computation with RDBDiffer and FullDiff strategies |
RDBDifferComputer.java |
New implementation for efficient SST file delta computation using RocksDB checkpoint differ |
FileLinkDeltaFileComputer.java |
Updated abstract base class with renamed field tmpDeltaFileLinkDir and improved documentation |
FullDiffComputer.java |
Updated diff computer with enhanced documentation explaining delta file computation semantics |
TestRDBDifferComputer.java |
Comprehensive test suite for RDBDifferComputer covering various scenarios including version mapping and error handling |
TestCompositeDeltaDiffComputer.java |
Test suite for composite pattern with extensive mockConstruction usage to test fallback scenarios |
SnapshotDiffManager.java |
Refactored to delegate delta file computation to CompositeDeltaDiffComputer, removing 200+ lines of inline logic |
OmSnapshotManager.java |
Simplified constructor by removing direct differ dependency |
RocksDBCheckpointDiffer.java |
Modified getSSTDiffListWithFullPath to return Map<Path, SstFileInfo> instead of List<String>, removing link creation responsibility |
TestSstFileSetReader.java |
Updated to use Path instead of String for SST file references |
SstFileSetReader.java |
Updated constructor and internal methods to use Path instead of String |
RocksDiffUtils.java |
Removed unnecessary generic type parameter from filterRelevantSstFiles method signature |
TestSnapshotDiffManager.java |
Removed obsolete test methods for getDeltaFiles that are now handled by new computer classes |
findbugsExcludeFile.xml |
Added exclusion for test class to suppress false positive warnings |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * differing SST files as hard links in the configured delta directory. | ||
| * | ||
| * <p>This class uses {@link RocksDBCheckpointDiffer} to obtain the list of SST | ||
| * files that differ between a \"from\" and a \"to\" snapshot. It opens local |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The escaped quote in the javadoc comment should be a regular quote. Change \"from\" and \"to\" to just "from" and "to" as javadoc doesn't require escaping quotes.
| * files that differ between a \"from\" and a \"to\" snapshot. It opens local | |
| * files that differ between a "from" and a "to" snapshot. It opens local |
|
|
||
|
|
||
|
|
||
|
|
||
|
|
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Extra blank lines at the end of the file. Remove lines 532-535 to leave only a single blank line at the end of the file, following common Java style conventions.
| * responsible for computing delta files (a list of files if read completely would be able to completely | ||
| * compute all the key changes between two snapshots). Hard links to the | ||
| * relevant source files in a specified delta directory, enabling a compact | ||
| * representation of changes between snapshots. |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation is grammatically incomplete. Lines 52-55 form a sentence fragment. Consider rewording to: "responsible for computing delta files (a list of files that, if read completely, would enable computation of all key changes between two snapshots) and creating hard links to the relevant source files in a specified delta directory, enabling a compact representation of changes between snapshots."
| * responsible for computing delta files (a list of files if read completely would be able to completely | |
| * compute all the key changes between two snapshots). Hard links to the | |
| * relevant source files in a specified delta directory, enabling a compact | |
| * representation of changes between snapshots. | |
| * responsible for computing delta files (a list of files that, if read completely, would enable computation of all key changes between two snapshots) and creating hard links to the relevant source files in a specified delta directory, enabling a compact representation of changes between snapshots. |
| * @return A list of SST files without extension. | ||
| * e.g. ["/path/to/sstBackupDir/000050.sst", | ||
| * "/path/to/sstBackupDir/000060.sst"] |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return description in the javadoc is outdated. The method now returns Optional<Map<Path, SstFileInfo>> but the documentation still says "A list of SST files without extension." Update this to accurately describe the return type as a Map of SST file paths to their corresponding SstFileInfo metadata.
| } | ||
| // Workaround to handle deletes if native rocksDb tool for reading | ||
| // tombstone is not loaded. | ||
| // When performing non native diff, input files of from snapshot needs to be added. |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar error in comment. Change "input files of from snapshot needs to be added" to "input files from the from snapshot need to be added" (or "from the source snapshot need to be added" for clarity).
| // When performing non native diff, input files of from snapshot needs to be added. | |
| // When performing non native diff, input files from the 'from' snapshot need to be added. |
| OmSnapshot fromSnap = org.mockito.Mockito.mock(OmSnapshot.class); | ||
| OMMetadataManager fromMetaMgr = org.mockito.Mockito.mock(OMMetadataManager.class); | ||
| RDBStore fromRdbStore = org.mockito.Mockito.mock(RDBStore.class); | ||
| when(fromSnap.getMetadataManager()).thenReturn(fromMetaMgr); | ||
| when(fromMetaMgr.getStore()).thenReturn(fromRdbStore); | ||
| when(fromRdbStore.getDbLocation()).thenReturn(fromDbPath.toFile()); | ||
|
|
||
| @SuppressWarnings("unchecked") | ||
| UncheckedAutoCloseableSupplier<OmSnapshot> fromSnapSupplier = | ||
| (UncheckedAutoCloseableSupplier<OmSnapshot>) org.mockito.Mockito.mock(UncheckedAutoCloseableSupplier.class); |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent use of mock() method. Lines 605-607 and 614 use the fully qualified org.mockito.Mockito.mock() instead of the already imported static method mock() (visible from the static imports at the top). Use mock(OmSnapshot.class) instead of org.mockito.Mockito.mock(OmSnapshot.class) for consistency with the rest of the test file.
| OmSnapshot fromSnap = org.mockito.Mockito.mock(OmSnapshot.class); | |
| OMMetadataManager fromMetaMgr = org.mockito.Mockito.mock(OMMetadataManager.class); | |
| RDBStore fromRdbStore = org.mockito.Mockito.mock(RDBStore.class); | |
| when(fromSnap.getMetadataManager()).thenReturn(fromMetaMgr); | |
| when(fromMetaMgr.getStore()).thenReturn(fromRdbStore); | |
| when(fromRdbStore.getDbLocation()).thenReturn(fromDbPath.toFile()); | |
| @SuppressWarnings("unchecked") | |
| UncheckedAutoCloseableSupplier<OmSnapshot> fromSnapSupplier = | |
| (UncheckedAutoCloseableSupplier<OmSnapshot>) org.mockito.Mockito.mock(UncheckedAutoCloseableSupplier.class); | |
| OmSnapshot fromSnap = mock(OmSnapshot.class); | |
| OMMetadataManager fromMetaMgr = mock(OMMetadataManager.class); | |
| RDBStore fromRdbStore = mock(RDBStore.class); | |
| when(fromSnap.getMetadataManager()).thenReturn(fromMetaMgr); | |
| when(fromMetaMgr.getStore()).thenReturn(fromRdbStore); | |
| when(fromRdbStore.getDbLocation()).thenReturn(fromDbPath.toFile()); | |
| @SuppressWarnings("unchecked") | |
| UncheckedAutoCloseableSupplier<OmSnapshot> fromSnapSupplier = | |
| (UncheckedAutoCloseableSupplier<OmSnapshot>) mock(UncheckedAutoCloseableSupplier.class); |
| Path fromSnapshotPath = fromSnapshot.get().getMetadataManager().getStore().getDbLocation() | ||
| .getAbsoluteFile().toPath(); |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Access of element annotated with VisibleForTesting found in production code.
| Path fromSnapshotPath = fromSnapshot.get().getMetadataManager().getStore().getDbLocation() | |
| .getAbsoluteFile().toPath(); | |
| Path fromSnapshotPath = fromSnapshot.get().getSnapshotPath(); |
| Path fromSnapshotPath = fromSnapshot.get().getMetadataManager().getStore().getDbLocation() | ||
| .getAbsoluteFile().toPath(); |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Access of element annotated with VisibleForTesting found in production code.
| Path fromSnapshotPath = fromSnapshot.get().getMetadataManager().getStore().getDbLocation() | |
| .getAbsoluteFile().toPath(); | |
| Path fromSnapshotPath = fromSnapshot.get().getDbLocation().getAbsoluteFile().toPath(); |
| Path deltaDirPath, Consumer<SubStatus> activityReporter) throws IOException { | ||
| super(omSnapshotManager, activeMetadataManager, deltaDirPath, activityReporter); | ||
| this.differ = activeMetadataManager.getStore().getRocksDBCheckpointDiffer(); |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Access of element annotated with VisibleForTesting found in production code.
| Path deltaDirPath, Consumer<SubStatus> activityReporter) throws IOException { | |
| super(omSnapshotManager, activeMetadataManager, deltaDirPath, activityReporter); | |
| this.differ = activeMetadataManager.getStore().getRocksDBCheckpointDiffer(); | |
| Path deltaDirPath, Consumer<SubStatus> activityReporter, RocksDBCheckpointDiffer differ) throws IOException { | |
| super(omSnapshotManager, activeMetadataManager, deltaDirPath, activityReporter); | |
| this.differ = differ; |
jojochuang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. just code style suggestions. Once the CI comes back green feel free to merge and let's move on to the next one.
|
|
||
| @VisibleForTesting | ||
| @SuppressWarnings("checkstyle:ParameterNumber") | ||
| Set<String> getDeltaFiles(OmSnapshot fromSnapshot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed because not used.
| /** | ||
| * Convert from SnapshotInfo to DifferSnapshotInfo. | ||
| */ | ||
| private static DifferSnapshotInfo getDSIFromSI(OMMetadataManager activeOmMetadataManager, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine but semantically speaking I'd suggest to make it toDifferSnapshotInfo() in SnapshotInfo.
|
Will address the review comments in the next patch |
What changes were proposed in this pull request?
Modularise SnapshotDiff computation for figuring out delta files between 2 snapshots for efficient diff. Create a CompositeDiffComputer which would be responsible for computing delta files between 2 snapshots.
This is built on top of #9283
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13929
How was this patch tested?
Added unit test