Skip to content

Conversation

@prateekm
Copy link
Contributor

@prateekm prateekm commented Mar 6, 2023

Part 1 of 2. Follow up PR to restore side input stores using Blob Store backups coming soon.

Symptoms:

  1. Side input stores are uploaded twice when using Blob Store State Backend.
  2. Store-level Gauges (but not Timers) in BlobStoreBackupManagerMetrics are broken for side input stores.
  3. Task level Gauges in BlobStoreBackupManagerMetrics have incorrect value (count twice for side input stores).

Cause:

  1. StorageConfig#getStoreNames() returns side input stores twice in the list.
  2. BlobStoreBackupManager does not dedup storesToBackup list.
  3. PR SAMZA-2397: Updating gauge-val function on newGauge on same metric name #1223 makes the duplicate-registration behavior between Gauges and Timers inconsistent.

Changes:

  1. Fixed StorageConfig#getStoreNames() to dedup store names.
  2. Added defensive dedup in BlobStoreBackupManager.
  3. Changed store level metrics initialization in BlobStoreBackupManagerMetrics to computeIfAbsent instead of putIfAbsent to avoid overwriting-yet-returning-old-Gauges in case of duplicate store names.

Tests:
Added unit tests for StorageConfig to verify deduping.

@prateekm
Copy link
Contributor Author

prateekm commented Mar 6, 2023

cc @shekhars-li for review.

Copy link
Contributor

@shekhars-li shekhars-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for fixing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants