Merged
50 commits
848d52b
HDDS-12607. Parallelize recon tasks to speed up OM rocksdb reading ta…
ArafatKhan2198 Nov 4, 2025
b9100a6
Reverted the changes for NSSummaryTask
ArafatKhan2198 Nov 5, 2025
1b39a9a
Removed unnecessary changes
ArafatKhan2198 Nov 5, 2025
266138a
Refactored and changed the code a bit
ArafatKhan2198 Nov 7, 2025
7cfc000
Removed the commented code
ArafatKhan2198 Nov 10, 2025
e67b310
Made code review changes: added RejectionPolicy, re-added flush logic…
ArafatKhan2198 Nov 10, 2025
a9855b0
Added a flag isFlushingInProgress to avoid race condition while flushing
ArafatKhan2198 Nov 10, 2025
56a2e40
Made final code changes for testing
ArafatKhan2198 Nov 10, 2025
8424170
Prevent FSO/OBS container count overwrite by using a shared Concurren…
ArafatKhan2198 Nov 12, 2025
b93511e
Added write lock in the writeCountsToDB method
ArafatKhan2198 Nov 13, 2025
e37bb11
Removed extra locking and improved exception log messages
ArafatKhan2198 Nov 14, 2025
c02718e
Defined proper caller-based locking for writeCountsToDB
ArafatKhan2198 Nov 14, 2025
648e94c
Some more improvements
ArafatKhan2198 Nov 17, 2025
0c0c09a
Added changes to the parallel Iterator class
ArafatKhan2198 Nov 24, 2025
f7e0cf3
Consolidated all the locks in a single lock for simplicity
ArafatKhan2198 Nov 24, 2025
f4ddf9e
Fixed memory leak by clearing the shared map
ArafatKhan2198 Nov 24, 2025
28719bf
Refactored the code
ArafatKhan2198 Nov 24, 2025
d3361f6
Made the lock for the file size count task local rather than global
ArafatKhan2198 Nov 24, 2025
ff3c6e9
Removed unnecessary code from OzoneManagerServiceProviderImpl
ArafatKhan2198 Nov 24, 2025
ce8a0bd
Added parallelization for the OmTableInsightTask
ArafatKhan2198 Nov 25, 2025
8924135
Small Refactors
ArafatKhan2198 Nov 26, 2025
1787b65
Remove the flag isFlushingInProgress
ArafatKhan2198 Nov 26, 2025
37390f7
Refactored changes 1
ArafatKhan2198 Dec 1, 2025
ce0d63b
Removed the complex locking in ContainerKeyMapper task
ArafatKhan2198 Dec 1, 2025
03b2dc2
Added a configurable flush threshold
ArafatKhan2198 Dec 1, 2025
99098d9
Removed unnecessary code from service provider impl
ArafatKhan2198 Dec 1, 2025
e6a4552
Removed unnecessary commits
ArafatKhan2198 Dec 1, 2025
c47a4ba
Removed unnecessary code
ArafatKhan2198 Dec 1, 2025
e795738
Removed the unwanted change
ArafatKhan2198 Dec 1, 2025
34a8abc
Refactored the code and reduced the number of log messages
ArafatKhan2198 Dec 2, 2025
eccd370
Fixed final review comments
ArafatKhan2198 Dec 4, 2025
940b271
Removed unnecessary changes
ArafatKhan2198 Dec 4, 2025
702e92b
Removed some extra changes
ArafatKhan2198 Dec 4, 2025
58bcccd
Reverted back to the old change
ArafatKhan2198 Dec 4, 2025
f55fe25
Fixed some more code
ArafatKhan2198 Dec 4, 2025
47eaf82
Removed testing code
ArafatKhan2198 Dec 4, 2025
980135d
Recon: Improve type safety in OmTableHandlers and fix ACTIVE_TASK_COU…
ArafatKhan2198 Dec 4, 2025
cff7278
Removed the suppress warning
ArafatKhan2198 Dec 8, 2025
04df207
Fixed checkstyle
ArafatKhan2198 Dec 9, 2025
bed2151
Fixed findbugs
ArafatKhan2198 Dec 9, 2025
1a26915
Removed testing code
ArafatKhan2198 Dec 9, 2025
73434a1
Merge branch 'master' into HDDS-12607-updated
ArafatKhan2198 Dec 9, 2025
0c9cd92
Fixed failing tests
ArafatKhan2198 Dec 9, 2025
60a197a
Fixed TestConfigurationFieldsBase
ArafatKhan2198 Dec 9, 2025
e670c93
Fixed container endpoint
ArafatKhan2198 Dec 9, 2025
a4d4f35
Fixed checkstyle issues
ArafatKhan2198 Dec 9, 2025
d60d4fe
Fixed failing tests
ArafatKhan2198 Dec 9, 2025
6addd6d
Fixed TestContainerEndpoint
ArafatKhan2198 Dec 10, 2025
1a1e838
Fixed checkstyle
ArafatKhan2198 Dec 10, 2025
ca379ff
Fixed TestReconAndAdminContainerCLI
ArafatKhan2198 Dec 10, 2025
37 changes: 37 additions & 0 deletions hadoop-hdds/common/src/main/resources/ozone-default.xml
@@ -4318,6 +4318,43 @@
recon rocks DB containerKeyTable
</description>
</property>

<property>
<name>ozone.recon.filesizecount.flush.db.max.threshold</name>
<value>200000</value>
<tag>OZONE, RECON, PERFORMANCE</tag>
<description>
Maximum threshold number of entries to hold in memory for File Size Count task in hashmap before flushing to
recon derby DB
</description>
</property>

<property>
<name>ozone.recon.task.reprocess.max.iterators</name>
<value>5</value>
<tag>OZONE, RECON, PERFORMANCE</tag>
<description>
Maximum number of iterator threads to use for parallel table iteration during reprocess
</description>
</property>

<property>
<name>ozone.recon.task.reprocess.max.workers</name>
<value>20</value>
<tag>OZONE, RECON, PERFORMANCE</tag>
<description>
Maximum number of worker threads to use for parallel table processing during reprocess
</description>
</property>

<property>
<name>ozone.recon.task.reprocess.max.keys.in.memory</name>
<value>2000</value>
<tag>OZONE, RECON, PERFORMANCE</tag>
<description>
Maximum number of keys to batch in memory before handing to worker threads during parallel reprocess
</description>
</property>

<property>
<name>ozone.recon.heatmap.provider</name>
@@ -457,6 +457,8 @@ private static void setupConfigKeys() {
1, SECONDS);
CONF.setTimeDuration(HddsConfigKeys.HDDS_SCM_WAIT_TIME_AFTER_SAFE_MODE_EXIT,
0, SECONDS);
// Configure multiple task threads for concurrent task execution
CONF.setInt("ozone.recon.task.thread.count", 6);
CONF.set(OzoneConfigKeys.OZONE_SCM_CLOSE_CONTAINER_WAIT_DURATION, "2s");
CONF.set(ScmConfigKeys.OZONE_SCM_PIPELINE_SCRUB_INTERVAL, "2s");
CONF.set(ScmConfigKeys.OZONE_SCM_PIPELINE_DESTROY_TIMEOUT, "5s");
@@ -65,6 +65,8 @@ public void init() throws Exception {
OzoneConfiguration conf = new OzoneConfiguration();
conf.set(OMConfigKeys.OZONE_DEFAULT_BUCKET_LAYOUT,
OMConfigKeys.OZONE_BUCKET_LAYOUT_FILE_SYSTEM_OPTIMIZED);
// Configure multiple task threads for concurrent task execution
conf.setInt("ozone.recon.task.thread.count", 6);
recon = new ReconService(conf);
cluster = MiniOzoneCluster.newBuilder(conf)
.setNumDatanodes(3)
@@ -93,7 +93,9 @@ public final class ReconConstants {
// For file-size count reprocessing: ensure only one task truncates the table.
public static final AtomicBoolean FILE_SIZE_COUNT_TABLE_TRUNCATED = new AtomicBoolean(false);

- public static final AtomicBoolean CONTAINER_KEY_TABLES_TRUNCATED = new AtomicBoolean(false);
+ // For container key mapper reprocessing: ensure only one task performs initialization
+ // (truncates tables + clears shared map)
+ public static final AtomicBoolean CONTAINER_KEY_MAPPER_INITIALIZED = new AtomicBoolean(false);

private ReconConstants() {
// Never Constructed
@@ -105,6 +107,6 @@ private ReconConstants() {
*/
public static void resetTableTruncatedFlags() {
FILE_SIZE_COUNT_TABLE_TRUNCATED.set(false);
- CONTAINER_KEY_TABLES_TRUNCATED.set(false);
+ CONTAINER_KEY_MAPPER_INITIALIZED.set(false);
}
}
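
Note on the `CONTAINER_KEY_MAPPER_INITIALIZED` flag in this file: its comment says it ensures that only one task performs initialization. How the tasks check the flag is not shown in these hunks. Below is a minimal sketch of one common way to get that guarantee, assuming a `compareAndSet` guard. `OneTimeInitSketch`, `initializeOnce`, `reset`, and `countInitRuns` are hypothetical names used for illustration and are not part of the PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a shared flag that lets exactly one caller run the init step.
final class OneTimeInitSketch {

  // Stand-in for a flag such as CONTAINER_KEY_MAPPER_INITIALIZED.
  static final AtomicBoolean INITIALIZED = new AtomicBoolean(false);

  private OneTimeInitSketch() { }

  // Runs init only for the single caller that flips the flag from false to true.
  static boolean initializeOnce(Runnable init) {
    if (INITIALIZED.compareAndSet(false, true)) {
      init.run();
      return true;
    }
    return false;
  }

  // Mirrors the idea of resetTableTruncatedFlags(): re-arm the flag for the next run.
  static void reset() {
    INITIALIZED.set(false);
  }

  // Starts several threads that all attempt the init step; returns how often it ran.
  static int countInitRuns(int threads) {
    AtomicInteger runs = new AtomicInteger();
    Thread[] pool = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      pool[i] = new Thread(() -> initializeOnce(runs::incrementAndGet));
      pool[i].start();
    }
    for (Thread t : pool) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return runs.get();
  }
}
```

In this sketch a thread that loses the race returns immediately and does not wait for the winner's init step to finish, so a task that depends on the truncated tables would need separate coordination.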
@@ -162,6 +162,28 @@ public final class ReconServerConfigKeys {
public static final long
OZONE_RECON_CONTAINER_KEY_FLUSH_TO_DB_MAX_THRESHOLD_DEFAULT = 150 * 1000L;

public static final String
OZONE_RECON_FILESIZECOUNT_FLUSH_TO_DB_MAX_THRESHOLD =
"ozone.recon.filesizecount.flush.db.max.threshold";

public static final long
OZONE_RECON_FILESIZECOUNT_FLUSH_TO_DB_MAX_THRESHOLD_DEFAULT = 200 * 1000L;

public static final String
OZONE_RECON_TASK_REPROCESS_MAX_ITERATORS = "ozone.recon.task.reprocess.max.iterators";

public static final int OZONE_RECON_TASK_REPROCESS_MAX_ITERATORS_DEFAULT = 5;

public static final String
OZONE_RECON_TASK_REPROCESS_MAX_WORKERS = "ozone.recon.task.reprocess.max.workers";

public static final int OZONE_RECON_TASK_REPROCESS_MAX_WORKERS_DEFAULT = 20;

public static final String
OZONE_RECON_TASK_REPROCESS_MAX_KEYS_IN_MEMORY = "ozone.recon.task.reprocess.max.keys.in.memory";

public static final int OZONE_RECON_TASK_REPROCESS_MAX_KEYS_IN_MEMORY_DEFAULT = 2000;

Review comment (Contributor):

Please add comments explaining when and how to tune these settings.

Also, please add guidance on memory consumption to the Javadoc here, with a worked example along these lines:

Resource calculation (with the defaults of 5 iterators, 20 workers, and 2000 keys in memory per task):

  • FSO + OBS tasks running concurrently = 10 iterators + 40 workers = 50 threads
  • Plus caller thread pools = potentially 100+ threads during reprocess
  • Memory: 2000 keys × ~500 bytes/key × 2 tasks = ~2 MB (reasonable)

The same tuning guidelines could also be added to the property descriptions in ozone-default.xml.
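
The arithmetic in the comment above can be checked with a small helper. This is only a sketch: `ReprocessResourceEstimate` and its methods are hypothetical names, the ~500 bytes/key figure is the reviewer's rough estimate rather than a measurement, and the sketch assumes that each concurrently running task starts its own iterators, workers, and key buffer.

```java
// Illustrative only: back-of-the-envelope estimate for parallel reprocess resources.
final class ReprocessResourceEstimate {

  private ReprocessResourceEstimate() { }

  // Threads used when each of `tasks` tasks starts its own iterators and workers.
  static int totalThreads(int tasks, int maxIterators, int maxWorkers) {
    return tasks * (maxIterators + maxWorkers);
  }

  // Bytes held in key buffers; avgKeyBytes is an assumed average, not a measured one.
  static long batchedBytes(int tasks, int maxKeysInMemory, long avgKeyBytes) {
    return (long) tasks * maxKeysInMemory * avgKeyBytes;
  }

  public static void main(String[] args) {
    int tasks = 2;              // FSO + OBS running concurrently
    int maxIterators = 5;       // default of ozone.recon.task.reprocess.max.iterators
    int maxWorkers = 20;        // default of ozone.recon.task.reprocess.max.workers
    int maxKeysInMemory = 2000; // default of ozone.recon.task.reprocess.max.keys.in.memory
    long avgKeyBytes = 500L;    // assumption taken from the review comment

    System.out.println("threads = " + totalThreads(tasks, maxIterators, maxWorkers)); // 50
    System.out.println("bytes   = " + batchedBytes(tasks, maxKeysInMemory, avgKeyBytes)); // 2000000
  }
}
```

Caller thread pools are left out of the sketch because the comment gives no concrete numbers for them.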

public static final String OZONE_RECON_SCM_SNAPSHOT_TASK_INTERVAL_DELAY =
"ozone.recon.scm.snapshot.task.interval.delay";

@@ -753,11 +753,11 @@ public boolean syncDataFromOM() {
fullSnapshotReconTaskUpdater.updateDetails();
// Update the current OM metadata manager in task controller
reconTaskController.updateOMMetadataManager(omMetadataManager);

// Pass on DB update events to tasks that are listening.
reconTaskController.consumeOMEvents(new OMUpdateEventBatch(
omdbUpdatesHandler.getEvents(), omdbUpdatesHandler.getLatestSequenceNumber()), omMetadataManager);

// Check if task reinitialization is needed due to buffer overflow or task failures
boolean bufferOverflowed = reconTaskController.hasEventBufferOverflowed();
boolean tasksFailed = reconTaskController.hasTasksFailed();