@ChenSammi ChenSammi commented Jul 4, 2025

What changes were proposed in this pull request?

After HDDS-8829, delegation tokens (dt) are signed and verified with a secret key (sk) that is generated and managed by SCM.

The secret key has the following configurations and default values:

hdds.secret.key.expiry.duration 7d
hdds.secret.key.rotate.duration 1d
hdds.secret.key.rotate.check.duration 10m

Delegation tokens have the following configurations and default values:

ozone.manager.delegation.token.max-lifetime 7d
ozone.manager.delegation.token.renew-interval 1d
ozone.manager.delegation.remover.scan.interval 1h

A dt may be created just before the secret key is rotated (the rotation interval is 1 day), in which case the dt can stay valid much longer than the secret key that signed it.

Say a secret key is created on June 1st at 00:00.
A dt is created using this secret key on June 1st at 23:59.

The secret key expires on June 8th at 00:00 and is removed from SCM memory by June 8th 00:10.
The dt, last renewed on June 7th at 23:59, expires on June 8th at 23:59 and is removed by June 9th 00:59. So if OM restarts between June 8th 00:01 and June 9th 00:59, the secret key is no longer available to calculate the password of this dt, which is still valid.
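The timeline above can be replayed with plain date arithmetic. This is only a sketch: the year 2025 and all variable names are assumptions made for illustration, and the durations are the defaults quoted in this description.

```python
from datetime import datetime, timedelta

# Defaults quoted in the description (before this change).
SK_EXPIRY = timedelta(days=7)           # hdds.secret.key.expiry.duration
SK_CHECK = timedelta(minutes=10)        # hdds.secret.key.rotate.check.duration
DT_MAX_LIFETIME = timedelta(days=7)     # ozone.manager.delegation.token.max-lifetime
DT_RENEW_INTERVAL = timedelta(days=1)   # ozone.manager.delegation.token.renew-interval
DT_REMOVER_SCAN = timedelta(hours=1)    # ozone.manager.delegation.remover.scan.interval

# The year is an assumption; the description only gives month and day.
sk_created = datetime(2025, 6, 1, 0, 0)
dt_created = datetime(2025, 6, 1, 23, 59)

sk_expires = sk_created + SK_EXPIRY           # June 8th 00:00
sk_removed_by = sk_expires + SK_CHECK         # June 8th 00:10

dt_expires = dt_created + DT_MAX_LIFETIME     # June 8th 23:59
dt_last_renewal = dt_expires - DT_RENEW_INTERVAL  # June 7th 23:59
dt_removed_by = dt_expires + DT_REMOVER_SCAN  # June 9th 00:59

# Time during which the dt may still exist although its key has expired.
gap = dt_removed_by - sk_expires
print("secret key expires:", sk_expires)
print("dt expires:        ", dt_expires)
print("dt removed by:     ", dt_removed_by)
print("uncovered window:  ", gap)
```

With these defaults the uncovered window is a little under 25 hours, which matches the interval described above.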

So a secret key must not expire and be removed from SCM before every dt it may have signed has expired. To achieve this, the dt configurations must depend on the secret key configurations, for example,

  1. SCM 1d(rotate), 7d(max), 10m(check); OM 1d(renew), 7d(max), 1h(check)
    secret key desired max lifetime = 7d(dt max) + 1d(sk rotate) + 1h(dt check) > 7d(sk max) -> will cause issue
  2. SCM 1h(rotate), 3d(max), 10m(check); OM 1d(renew), 14d(max), 10m(check)
    secret key desired max lifetime = 14d(dt max) + 1h(sk rotate) + 10m(dt check) > 3d(sk max) -> will cause issue
  3. SCM 1d(rotate), 7d(max), 10m(check); OM 1d(renew), 3d(max), 1h(check)
    secret key desired max lifetime = 3d(dt max) + 1d(sk rotate) + 1h(dt check) < 7d(sk max) -> will not cause issue

So overall, the formula to follow is
(dt max + sk rotate + dt check) < sk max
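The rule can be checked mechanically against the three example configurations. A minimal sketch follows; the function name is hypothetical and not part of Ozone, and the second term is taken to be the secret key rotation interval, as in the three examples.

```python
from datetime import timedelta

def secret_key_outlives_tokens(sk_max, sk_rotate, dt_max, dt_check):
    """Return True when (dt max + sk rotate + dt check) < sk max."""
    return dt_max + sk_rotate + dt_check < sk_max

d = lambda n: timedelta(days=n)
h = lambda n: timedelta(hours=n)
m = lambda n: timedelta(minutes=n)

# (sk max, sk rotate, dt max, dt check) for examples 1 to 3 above.
examples = [
    (d(7), d(1), d(7), h(1)),
    (d(3), h(1), d(14), m(10)),
    (d(7), d(1), d(3), h(1)),
]

results = [secret_key_outlives_tokens(*cfg) for cfg in examples]
for number, ok in enumerate(results, start=1):
    print(f"example {number}: {'will not cause issue' if ok else 'will cause issue'}")
```

Only the third configuration satisfies the inequality, in line with the examples.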

BTW, block tokens and container tokens don't have this problem by default: their lifetime is
"hdds.block.token.expiry.time 1d", which is far shorter than the sk max of 7d, and these tokens are neither persisted nor renewed.

The default value of hdds.secret.key.expiry.duration is changed from 7d to 9d, while ozone.manager.delegation.token.max-lifetime keeps its 7d default. This stays compatible with the HDFS delegation token default, since Hadoop users are used to a 7d delegation token lifetime, whereas the secret key lifetime is internal to Ozone.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13343

How was this patch tested?

New unit test and existing unit tests.

@adoroszlai adoroszlai changed the title from "HDDS-13343. dt configuration dependency on secert key configuration" to "HDDS-13343. Consider delegation token lifetime for secret key expiry" Jul 5, 2025
@kerneltime

Can the delegation token renew or creation be set to expire when the underlying secret key is set to expire?

@ChenSammi

> Can the delegation token renew or creation be set to expire when the underlying secret key is set to expire?

A delegation token is never created with an expired secret key, and the secret key has not yet expired whenever the token is renewed. The problem occurs after the last renewal of the delegation token and before the token expires.

@kerneltime

What is the duration for the delegation token to expire? Is it something the server can choose? Let's say the secret key is set to expire in 10 hours. Then, can the delegation token being generated also be set to expire in 10 hours, even if requested for 24 hours?
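The idea in this question amounts to capping a token's expiry at the expiry of its signing key. The sketch below illustrates only that idea; `capped_token_expiry` is a hypothetical helper, the timestamp is arbitrary, and nothing here describes how OM actually issues tokens.

```python
from datetime import datetime, timedelta

def capped_token_expiry(now, requested_lifetime, key_expiry):
    """Hypothetical helper: a token never outlives the key that signs it."""
    return min(now + requested_lifetime, key_expiry)

now = datetime(2025, 7, 1, 12, 0)        # arbitrary example time
key_expiry = now + timedelta(hours=10)   # secret key expires in 10 hours

# A 24 hour lifetime is requested, but the key only lasts 10 more hours.
expiry = capped_token_expiry(now, timedelta(hours=24), key_expiry)
print(expiry - now)  # 10:00:00
```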

@jojochuang

Ozone delegation token life time (ozone.manager.delegation.token.renew-interval): 1 day. Max life time (ozone.manager.delegation.token.max-lifetime): 7 days. These are determined by OM.

Secret key expires: hdds.secret.key.expiry.duration: 7 days.
hdds.secret.key.rotate.duration: 1 day.

So hdds.secret.key.expiry.duration should be at least ozone.manager.delegation.token.max-lifetime + hdds.secret.key.rotate.duration = 8 days.

@ChenSammi

> What is the duration for the delegation token to expire? Is it something the server can choose? Let's say the secret key is set to expire in 10 hours. Then, can the delegation token being generated also be set to expire in 10 hours, even if requested for 24 hours?

Both the secret key expiry and the delegation token expiry are configurable. This patch adjusts the default secret key expiry to mitigate the problem of the secret key expiring before the delegation token does. If a user sets the secret key to expire in 10 hours, the user should also set the delegation token lifetime to a value less than 10 hours (see the formula in the PR description above).
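Rearranging the formula from the description gives an upper bound for the delegation token max lifetime once the secret key settings are fixed. In this sketch the 10 hour expiry comes from the question, while the 1 hour rotation and 10 minute scan interval are assumed values chosen only for illustration; the function name is hypothetical.

```python
from datetime import timedelta

def dt_max_lifetime_bound(sk_max, sk_rotate, dt_check):
    """Exclusive upper bound: dt max < sk max - sk rotate - dt check."""
    return sk_max - sk_rotate - dt_check

bound = dt_max_lifetime_bound(
    sk_max=timedelta(hours=10),     # secret key expiry from the question
    sk_rotate=timedelta(hours=1),   # assumed rotation interval
    dt_check=timedelta(minutes=10), # assumed dt remover scan interval
)
print(bound)  # 8:50:00
```

Under these assumed values the token max lifetime would have to stay below 8 hours 50 minutes, which is noticeably less than the 10 hour key expiry.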

The default value of hdds.secret.key.expiry.duration is changed from 7d to 9d, while ozone.manager.delegation.token.max-lifetime keeps its 7d default. This stays compatible with the HDFS delegation token default, since Hadoop users are used to a 7d delegation token lifetime, whereas the secret key lifetime is internal to Ozone.

@ChenSammi

> Ozone delegation token life time (ozone.manager.delegation.token.renew-interval): 1 day. Max life time (ozone.manager.delegation.token.max-lifetime): 7 days. These are determined by OM.
>
> Secret key expires: hdds.secret.key.expiry.duration: 7 days. hdds.secret.key.rotate.duration: 1 day.
>
> So hdds.secret.key.expiry.duration should be at least ozone.manager.delegation.token.max-lifetime + hdds.secret.key.rotate.duration = 8 days.

The exact hdds.secret.key.expiry.duration required is 8 days and 1 hour. 9d is used to keep the default value concise.
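The arithmetic behind these numbers, using the default values, can be written out as follows. Rounding up to the next whole day is just one way to arrive at a concise value and is an assumption of this sketch, not a stated rule.

```python
from datetime import timedelta

dt_max = timedelta(days=7)     # ozone.manager.delegation.token.max-lifetime
sk_rotate = timedelta(days=1)  # hdds.secret.key.rotate.duration
dt_check = timedelta(hours=1)  # ozone.manager.delegation.remover.scan.interval

# The secret key expiry has to exceed this sum.
required = dt_max + sk_rotate + dt_check

# Next whole number of days strictly greater than the required duration.
whole_days = required // timedelta(days=1) + 1

print(required)    # 8 days, 1:00:00
print(whole_days)  # 9
```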

- This default value, in combination with hdds.secret.key.rotate.duration=1d, results in 7 secret keys (for the last 7 days) are kept valid at any point of time.
+ This default value, in combination with hdds.secret.key.rotate.duration=1d, results in 9 secret keys (for the last 9 days) are kept valid at any point of time.

I think we should add that hdds.secret.key.expiry.duration must be at least hdds.secret.key.rotate.duration + ozone.manager.delegation.token.max-lifetime


@jojochuang jojochuang left a comment


IMO this is more of a hack than a proper solution. I don't have a good idea of what a proper fix would be, but if we want to proceed with this approach, please add a note to the configuration property so that anyone who wants to tweak it doesn't accidentally break the system. At least, with HDDS-13234 it shouldn't crash OM.

@ChenSammi

ChenSammi commented Jul 10, 2025

> IMO this is more of a hack than a proper solution. I don't have a good idea of what a proper fix would be, but if we want to proceed with this approach, please add a note to the configuration property so that anyone who wants to tweak it doesn't accidentally break the system. At least, with HDDS-13234 it shouldn't crash OM.

With HDDS-13234, OM will not crash during restart, but with the current default configuration values a renewed and still valid dt cannot be verified, which is a functional problem that needs to be fixed.

@ChenSammi

@jojochuang , would you like to take another look?

@jojochuang jojochuang merged commit 8651aa4 into apache:master Jul 11, 2025
81 of 82 checks passed
swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025
… key expiry (apache#8742)

Change-Id: I94ca9b1fddd31ca751a816d8981878d09e77606c
swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025
swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025
… key expiry (apache#8742)

(cherry picked from commit 8651aa4)

 Conflicts:
	hadoop-ozone/dist/src/main/compose/common/security.conf
	hadoop-ozone/dist/src/main/compose/ozonesecure-ha/docker-config
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestBlockTokens.java
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestSecureOzoneCluster.java
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/checksum/TestContainerCommandReconciliation.java
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestSecretKeySnapshot.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java

Change-Id: I6b81de426c8b4c4626d8bbc51384f29b634af857