Skip to content

Conversation

@swamirishi
Copy link
Contributor

@swamirishi swamirishi commented Nov 17, 2025

What changes were proposed in this pull request?

Currently dbTxSequenceNumber is computed in the validateAndUpdateCache method which can fetch wrong result and this needs to be computed in double buffer flush phase instead. Moreover this info should be persisted in the localData yaml file since OM rocksdb content should not have any divergence.
This won't cause any issue in snapshot diff output but the only consequence would be that the snapshot diff output can potentially return more delta entries in case of a pruned DAG and the snapshot diff could end up falling back to full diff.

This is built on #9312

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13949

How was this patch tested?

Added unit test

Change-Id: Iba47aeb21663dfa407ab71339cef02c0d74b49f2
Change-Id: Ifd2feca1fddb144e4955db025f0b15a2ab1f3bfe
Change-Id: I34536ff06efb7d5a4942853f0fd83942ab398b5f
…otLocalDataManager

Change-Id: I34536ff06efb7d5a4942853f0fd83942ab398b5f
Change-Id: I32bcaf2a1fb290f1790c02872a0230cd65586636
Change-Id: I105a2e8178c0444d52de41b99801f4ceb6d57ffd

# Conflicts:
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/Checksum.java
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/ObjectSerializer.java
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/YamlSerializer.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
#	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestOmSnapshotLocalDataYaml.java
Change-Id: I985170e38fb8beeb784048e85a08a4c79e1aec97
Change-Id: I33e6e6e825bf23c323ad7ed593d800a11720fa4f
Change-Id: If30b2c766db82adde72145c8ecd3e590ef54cc2d
Change-Id: Id3f2c49050bc3476b9e0f5f51dacb6d9acc4c2f7
Change-Id: I432960725b4c6c55aa906b5780cc3027e41e10db
Change-Id: I3c5514e5bbd251a2b5297d8f074cfde5c71fa543
Change-Id: Ib5a9e6c91bdccba17820263c47eaf2c8400e930d
Change-Id: Ica36e0615c7bc6aa9b6a7f6fafafd0f830d4bafb
Change-Id: I26b66f266bb7677e4b1078f5fcd9f2ce3a651a70

# Conflicts:
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Change-Id: I1d93dbc048a42cc55ff1f8ffa420e52f967527b8
Change-Id: I34202928a7a367dd0a1e57219317ff34de352b78
Change-Id: Iad6f26cb71ec921c51ee2d138745df1a2663533f
Change-Id: Ic5f7e249cfb9cb3973cbcd4abd36b22a6ff8f5aa
…calDataProvider

Change-Id: I3a004b4b435075a4348960aeed642e8da71e7e72
Change-Id: I06990bc9ab8fc7e1eb7bec255646a650bd8c35fe
Change-Id: I4c6c61c83aa9fadab8ecef854b99dcc0a89a2208
Change-Id: I0e476322372a302572f1fe79cbf2e874bfeac2ed
Change-Id: I31004e0c95dad64411c6fe848501a82f2f773cba
Change-Id: Id317c8b56e8b25c122b68eaf96599b9690d08f79
Change-Id: I3849387d064e093634e69cdaf870d27c1934cda5

# Conflicts:
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/ObjectSerializer.java
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/YamlSerializer.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalData.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Change-Id: Ie5e5f3dab4324103e8855dd15619d7755f0422e6
Change-Id: I55bd5c3ef7fc32910a9111328638de2edffcd541
Change-Id: I880997d3eebdf378f14c203c61c2d63b2d17552e
Change-Id: I13ba8e2fd012a3c964d657e83496c93a4f55a3be
Change-Id: I63db69c634d96de05aaad865b467f1eee671440c
Change-Id: Ib313080b2ffa2f6e7545ae817f5e456cabc5dba7
Change-Id: I41308fd260de2e7ecdd8e8318e2155096212b687
Change-Id: Ie607809e088aee642940f72fdd12a258a06cc96d

# Conflicts:
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
@swamirishi
Copy link
Contributor Author

swamirishi commented Nov 18, 2025

Having local YAML for every snapshot have below concern,

  1. for any update of yaml, need ensure, update done in new file, existing to be moved to backup and rename new one, delete backup.
  2. Any failure in above need a recovery during startup
  3. Reading 64k yaml file is slower for doing verification.
  4. A scheduler to verify the consistency of file or during startup

My suggestion,

  • Can have another common local-rocksdb or snapshot-db for this, which handle all above implicitly. So this do not need scan of each file and its transaction safe implicitly present.

related to redability,

  • value can be yaml content OR ozone ldb can support same.

I don't think we need to be worried about the scale aspect here. Reading one yaml file would be order of 100 microseconds, so even if we have to read 64000 snapshots we are talking about few seconds at best.
We can create a backburner jira for this and move this into a rocksdb later on. Once we are done with the defrag implementation.
Our rational behind having a yaml file was to be able to debug things easily when things go wrong since these files are human readable.

Change-Id: Ie464f0ed154e8911024dbe7802338e11a83f502a
Change-Id: I8abb7a9db2e595f0cf2a0d257cb38caaacec6e1f
Change-Id: I09301ff17cf3c8abab5785d25336c7633f8ac293
Change-Id: I3e885ef24c95280ea0acccaf44787bd295aedbe4

# Conflicts:
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/IOUtils.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/diff/delta/DeltaFileComputer.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/diff/delta/FileLinkDeltaFileComputer.java
Change-Id: I6ae5c5897f9ca8580ba68b17ad77dffa35dfafb6
Change-Id: Ib6089c9c7d8c787edd75721234d94d1ea0eb88c9
Change-Id: Idbf1d64b72ff4a8c80b2da8515639e430cef895c
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks straightforward. Please rebase after merging #9312

Change-Id: Ie968331ba23db3ae33ca421599a75ead35076d48

# Conflicts:
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/diff/delta/RDBDifferComputer.java
#	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/diff/delta/TestCompositeDeltaDiffComputer.java
#	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/diff/delta/TestRDBDifferComputer.java
Change-Id: I227cf7471e5235204926fb5f48c754516222fc30
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New client won't work properly with an old OM because dbTxSequenceNumber will not be set by client. Looks like this is only going to affect snapshot create request .

@smengcl
Copy link
Contributor

smengcl commented Nov 20, 2025

Should this be marked ready?

optional string snapshotPath = 10;
optional string checkpointDir = 11 [deprecated = true];
optional int64 dbTxSequenceNumber = 12;
optional int64 dbTxSequenceNumber = 12 [deprecated = true];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be safe to just remove this field later, so that when SnapshotInfo is serialized again this field will be ignored while maintaining compatibility, as long as we don't use field number 12 ever again for this message.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR moves the dbTxSequenceNumber field from SnapshotInfo to OmSnapshotLocalData (persisted in YAML files), addressing an architectural issue where this value was being computed incorrectly during the validateAndUpdateCache phase. The field now gets computed during the double buffer flush phase from SST file metadata, ensuring consistency and preventing potential issues with pruned DAGs causing incorrect snapshot diff outputs.

Key changes:

  • Removed dbTxSequenceNumber from SnapshotInfo class and deprecated it in the protobuf schema
  • Added dbTxSequenceNumber to OmSnapshotLocalData with computation from SST file largestSeqno values
  • Updated all usages to fetch dbTxSequenceNumber from local data instead of snapshot info

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/SnapshotInfo.java Removed dbTxSequenceNumber field, getter, setter, and builder methods from SnapshotInfo class
hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto Marked dbTxSequenceNumber field as deprecated in SnapshotInfo protobuf message
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalData.java Added dbTxSequenceNumber field with getter and copy constructor support
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java Added YAML serialization/deserialization support for dbTxSequenceNumber
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConsts.java Added OM_SLD_DB_TXN_SEQ_NUMBER constant for YAML field name
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java Implemented computation of dbTxSequenceNumber from SST file metadata during snapshot creation
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotCreateRequest.java Removed incorrect computation of dbTxSequenceNumber from validateAndUpdateCache
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/diff/delta/RDBDifferComputer.java Updated to fetch dbTxSequenceNumber from OmSnapshotLocalData instead of SnapshotInfo
hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java Fixed documentation comment for return type
Test files Updated all tests to remove dbTxSequenceNumber from mock SnapshotInfo objects and added validation for the new field in OmSnapshotLocalData

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -363,7 +363,7 @@ public void testComputeDeltaFilesWithVersionMapping() throws IOException {
* Tests that getDSIFromSI throws exception when no versions found.
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment still references the old method name 'getDSIFromSI'. It should be updated to 'toDifferSnapshotInfo' to match the renamed method on line 366.

Suggested change
* Tests that getDSIFromSI throws exception when no versions found.
* Tests that toDifferSnapshotInfo throws exception when no versions found.

Copilot uses AI. Check for mistakes.
final String prevSnapIdStr = (String) nodes.get(OzoneConsts.OM_SLD_PREV_SNAP_ID);
UUID prevSnapId = prevSnapIdStr != null ? UUID.fromString(prevSnapIdStr) : null;
final String purgeTxInfoStr = (String) nodes.get(OzoneConsts.OM_SLD_TXN_INFO);
final long dbTxnSeqNumber = ((Number)nodes.get(OzoneConsts.OM_SLD_DB_TXN_SEQ_NUMBER)).longValue();
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential NullPointerException when reading existing YAML files that don't have the 'dbTxSequenceNumber' field. The code assumes the field exists and directly calls .longValue() without null-checking. Consider using getOrDefault() with a default value of 0L, similar to how 'lastDefragTime' is handled on lines 188-195.

Suggested change
final long dbTxnSeqNumber = ((Number)nodes.get(OzoneConsts.OM_SLD_DB_TXN_SEQ_NUMBER)).longValue();
Object dbTxnSeqNumberObj = nodes.getOrDefault(OzoneConsts.OM_SLD_DB_TXN_SEQ_NUMBER, 0L);
long dbTxnSeqNumber;
if (dbTxnSeqNumberObj instanceof Number) {
dbTxnSeqNumber = ((Number) dbTxnSeqNumberObj).longValue();
} else {
throw new IllegalArgumentException("Invalid type for dbTxSequenceNumber: " +
dbTxnSeqNumberObj.getClass().getName() + ". Expected Number type.");
}

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should never happen

// Stores the transactionInfo corresponding to OM when the snaphot is purged.
private TransactionInfo transactionInfo;

// Stores the rocksDB's transaction sequence number at the time of snapshot creation.'
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: trailing apostrophe should be removed. The period at the end is sufficient.

Suggested change
// Stores the rocksDB's transaction sequence number at the time of snapshot creation.'
// Stores the rocksDB's transaction sequence number at the time of snapshot creation.

Copilot uses AI. Check for mistakes.
() -> {
List<LiveFileMetaData> lfms = getLiveSSTFilesForCFs(snapshotStore.getDb().getManagedRocksDb(),
COLUMN_FAMILIES_TO_TRACK_IN_SNAPSHOT);
long dbTxnSeqNumber = lfms.stream().mapToLong(LiveFileMetaData::largestSeqno).max().orElse(0L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still want to fall back to the dbTxSeqNumber stored in SnapshotInfo proto?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this should be ok. We are basically talking about acquiring an empty rocksdb if there are no sst files for the tables.

Copy link
Contributor

@smengcl smengcl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Pls check Copilot's comment
  2. Pls double check for any upgrade concerns (e.g. the one noted by Wei-Chiu above)
  3. I have a few minor comments inline

Other than that, I am good with this. Thanks @swamirishi

@swamirishi
Copy link
Contributor Author

swamirishi commented Nov 20, 2025

New client won't work properly with an old OM because dbTxSequenceNumber will not be set by client. Looks like this is only going to affect snapshot create request .

I don't think we expose the dbTxn number. So this should be ok. The value captured earilier was also smaller than the snapshot's value so it should be ok if we start showing a 0. The worst that would happen is the DAG traversal will go all the way to the top. We should never expose this value to clients ever since this is specific to one OM and the value returned to clients could be different depending on the OM serviing the request.

Change-Id: If0400fa3ac799e8487888dd1149914fe093a5e6f
@swamirishi swamirishi marked this pull request as ready for review November 20, 2025 17:04
@swamirishi
Copy link
Contributor Author

Thank you @jojochuang and @smengcl for reviewing the patch

@swamirishi swamirishi merged commit 0381646 into apache:master Nov 20, 2025
129 of 137 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants