@Jennifer88huang-zz
Contributor

Fixes #7372

Motivation

The retention policy is not clearly documented.

Modifications

Add detailed info on retention policy.
Add default value for "defaultRetentionTimeInMinutes"

Contributor

@Huanli-Meng Huanli-Meng left a comment

I left my comments. PTAL.
BTW, the original issue seems to occur in Pulsar 2.6.0. Could you please confirm whether the docs for previous releases should be updated too? Thanks.

- If you want to disable retention policy, set the values of time limit and size limit to `0`. Retention policy is disabled by default.

The retention settings apply to all messages on topics that do not have any subscriptions, or if there are subscriptions, to messages that have been acked by all subscriptions. The retention policy settings do not affect unacknowledged messages on topics with subscriptions -- these are instead controlled by the backlog quota (see below).
When you set a size limit of, for example, 10 gigabytes, and set the time limit to `-1`, then the acknowledged messages in all topics in the namespace are retained until the topic size reaches the size limit(10 gigabytes). If you set a time limit of, for example, 1 day, and set the size limit to `-1`, then the acknowledged messages for all topics in the namespace are retained for 1 day.
Contributor

Suggested change
When you set a size limit of, for example, 10 gigabytes, and set the time limit to `-1`, then the acknowledged messages in all topics in the namespace are retained until the topic size reaches the size limit(10 gigabytes). If you set a time limit of, for example, 1 day, and set the size limit to `-1`, then the acknowledged messages for all topics in the namespace are retained for 1 day.
When you set a size limit of, for example, 10 GB, and set the time limit to `-1`, then the acknowledged messages in all topics in the namespace are retained until the topic size reaches the size limit (10 GB). If you set a time limit of, for example, 1 day, and set the size limit to `-1`, then the acknowledged messages for all topics in the namespace are retained for 1 day.

The retention settings apply to all messages on topics that do not have any subscriptions, or if there are subscriptions, to messages that have been acked by all subscriptions. The retention policy settings do not affect unacknowledged messages on topics with subscriptions -- these are instead controlled by the backlog quota (see below).
When you set a size limit of, for example, 10 gigabytes, and set the time limit to `-1`, then the acknowledged messages in all topics in the namespace are retained until the topic size reaches the size limit(10 gigabytes). If you set a time limit of, for example, 1 day, and set the size limit to `-1`, then the acknowledged messages for all topics in the namespace are retained for 1 day.

The retention settings apply to all messages on topics that do not have any subscriptions, or to messages that have been acked by all subscriptions. The retention policy settings do not affect unacknowledged messages on topics with subscriptions. The unacknowledged messages are controlled by the backlog quota.
Contributor

Suggested change
The retention settings apply to all messages on topics that do not have any subscriptions, or to messages that have been acked by all subscriptions. The retention policy settings do not affect unacknowledged messages on topics with subscriptions. The unacknowledged messages are controlled by the backlog quota.
The retention settings apply to all messages on topics that do not have any subscriptions, or to messages that have been acknowledged by all subscriptions. The retention policy settings do not affect unacknowledged messages on topics with subscriptions. The unacknowledged messages are controlled by the backlog quota.

### Defaults

There are two configuration parameters that you can use to set [instance](reference-terminology.md#instance)-wide defaults for message retention: [`defaultRetentionTimeInMinutes=0`](reference-configuration.md#broker-defaultRetentionTimeInMinutes) and [`defaultRetentionSizeInMB=0`](reference-configuration.md#broker-defaultRetentionSizeInMB).
You can use the following two configuration parameters to set instance-wide defaults for message retention: `defaultRetentionTimeInMinutes` and `defaultRetentionSizeInMB`. Both parameters are set to `0` by default.
Contributor

Change "instance-wide" to "instance-level"?

You can use the [`set-retention`](reference-pulsar-admin.md#namespaces-set-retention) subcommand and specify a namespace, a size limit using the `-s`/`--size` flag, and a time limit using the `-t`/`--time` flag.

To set a size limit of 10 gigabytes and a time limit of 3 hours for the `my-tenant/my-ns` namespace:
In the following example, size limit is set to 10 gigabytes and time limit is set to 3 hours for the `my-tenant/my-ns` namespace:
Contributor

Suggested change
In the following example, size limit is set to 10 gigabytes and time limit is set to 3 hours for the `my-tenant/my-ns` namespace:
In the following example, the size limit is set to 10 GB and time limit is set to 3 hours for the `my-tenant/my-ns` namespace:

```

To set retention where time limit is ignored and the size limit of 1 terabyte determines retention:
In the following example, time is not limited and size limit is set to 1 terabyte. The size limit determines the retention result.
Contributor

Suggested change
In the following example, time is not limited and size limit is set to 1 terabyte. The size limit determines the retention result.
In the following example, the time is not limited while the size limit is set to 1 TB. The size limit determines the retention result.

```

To set retention where size limit is ignored and the time limit of 3 hours determines retention:
In the following example, size is not limited and time limit is set to 3 hours. The time limit determines the retention result.
Contributor

Suggested change
In the following example, size is not limited and time limit is set to 3 hours. The time limit determines the retention result.
In the following example, the size is not limited and the time limit is set to 3 hours. The time limit determines the retention result.


@ErikJansenIRefact ErikJansenIRefact left a comment

Hi,

In general, these are good improvements in describing the behaviour of the retention settings.
However, I would suggest describing the behaviour when both settings (time and size) have a value > 0. Are the messages removed if one of the thresholds is exceeded, or only if both of them are exceeded? In other words, are the settings combined with a "logical and" or a "logical or"?

Is the 0 value treated as a special value? For example, if size = 10 GB and time = 0, are messages removed once size > 10 GB?
And if size = 10 GB and time = 1d, are messages removed if either the size or the time is exceeded?

@lhotari
Member

lhotari commented Oct 28, 2020

> Is the 0 value treated as a special value?

@ErikJansenIRefact
Yes, 0 is a special value. Setting either limit to 0 will disable retention. That's the catch.
PR #8358 introduced new behavior that prevents setting just one of the values to 0 via the Admin API. To disable the retention policy, both values have to be set to 0 when using the Admin API.
The "setting either limit to 0 will disable retention" behavior still applies to existing configurations (after upgrading to 2.7.0 when it's out) and also for defaultRetentionTimeInMinutes and defaultRetentionSizeInMB settings. I wonder how to get this documented?
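The special values discussed in this comment can be summarized in a small sketch. It models only the pre-2.7 behavior described here (a `0` in either position disables retention) together with the `-1` "no limit" value from the doc text under review. The class, enum, and method names are hypothetical and are not part of Pulsar. Treating every negative value like `-1` is an assumption.

```java
// Hypothetical helper, not a Pulsar API: classifies a (time, size) retention pair
// according to the rules stated in this thread.
final class RetentionSettingSketch {

    enum Mode { DISABLED, INFINITE, SIZE_ONLY, TIME_ONLY, TIME_OR_SIZE }

    static Mode classify(long timeLimitMinutes, long sizeLimitMB) {
        // Per the comment above: a 0 in either position disables retention.
        if (timeLimitMinutes == 0 || sizeLimitMB == 0) {
            return Mode.DISABLED;
        }
        // Assumption: any negative value is treated like -1 ("no limit").
        boolean timeUnlimited = timeLimitMinutes < 0;
        boolean sizeUnlimited = sizeLimitMB < 0;
        if (timeUnlimited && sizeUnlimited) {
            return Mode.INFINITE;
        }
        if (timeUnlimited) {
            return Mode.SIZE_ONLY;   // only the size limit decides
        }
        if (sizeUnlimited) {
            return Mode.TIME_ONLY;   // only the time limit decides
        }
        return Mode.TIME_OR_SIZE;    // both limits are active
    }
}
```

For example, `classify(0, 10240)` yields `DISABLED`, which is the catch described above. After PR #8358, the Admin API is said to reject such a pair instead of accepting it.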

@lhotari
Member

lhotari commented Oct 28, 2020

> However, I would suggest describing the behaviour when both settings (time and size) have a value > 0. Are the messages removed if one of the thresholds is exceeded, or only if both of them are exceeded? In other words, are the settings combined with a "logical and" or a "logical or"?

Good points. Explaining this would be useful. I believe it's described somewhere, but I'm actually unsure of the answer off the top of my head...

@lhotari
Member

lhotari commented Oct 28, 2020

The code tells the truth:

    // skip ledger if retention constraint met
    for (LedgerInfo ls : ledgers.headMap(slowestReaderLedgerId, false).values()) {
        boolean expired = hasLedgerRetentionExpired(ls.getTimestamp());
        boolean overRetentionQuota = isLedgerRetentionOverSizeQuota();
        if (log.isDebugEnabled()) {
            log.debug(
                    "[{}] Checking ledger {} -- time-old: {} sec -- "
                            + "expired: {} -- over-quota: {} -- current-ledger: {}",
                    name, ls.getLedgerId(), (clock.millis() - ls.getTimestamp()) / 1000.0, expired,
                    overRetentionQuota, currentLedger.getId());
        }
        if (ls.getLedgerId() == currentLedger.getId()) {
            log.debug("[{}] Ledger {} skipped for deletion as it is currently being written to", name,
                    ls.getLedgerId());
            break;
        } else if (expired) {
            log.debug("[{}] Ledger {} has expired, ts {}", name, ls.getLedgerId(), ls.getTimestamp());
            ledgersToDelete.add(ls);
        } else if (overRetentionQuota) {
            log.debug("[{}] Ledger {} is over quota", name, ls.getLedgerId());
            ledgersToDelete.add(ls);
        } else {
            log.debug("[{}] Ledger {} not deleted. Neither expired nor over-quota", name, ls.getLedgerId());
            break;
        }
    }

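A ledger is retained only while both conditions hold, that is, while it has not expired and the retention size quota is not exceeded. It is deleted as soon as either limit is reached.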

@jennifer88huang please check if this is clearly documented.
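Read as a decision rule, the loop quoted above deletes a consumed ledger when it has expired or when the size quota is exceeded, and it stops at the first ledger that is within both limits. The following is a minimal, self-contained sketch of that rule and not the broker's implementation. All names are hypothetical. It assumes that negative limits mean "unlimited", and it treats the total size as constant during the scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the trimming loop quoted above.
// None of these names are Pulsar APIs.
final class RetentionTrimSketch {

    /** Stand-in for a ledger: an id plus its timestamp in milliseconds. */
    static final class Ledger {
        final long id;
        final long timestampMs;

        Ledger(long id, long timestampMs) {
            this.id = id;
            this.timestampMs = timestampMs;
        }
    }

    /** Assumption: a negative time limit means "never expires". */
    static boolean hasExpired(long nowMs, long ledgerTimestampMs, long retentionTimeMs) {
        return retentionTimeMs >= 0 && (nowMs - ledgerTimestampMs) > retentionTimeMs;
    }

    /** Assumption: a negative size limit means "no size quota". */
    static boolean isOverSizeQuota(long totalSizeBytes, long retentionSizeBytes) {
        return retentionSizeBytes >= 0 && totalSizeBytes > retentionSizeBytes;
    }

    /** Returns the ids of fully consumed ledgers (oldest first) that would be deleted. */
    static List<Long> ledgersToDelete(List<Ledger> consumedOldestFirst, long currentLedgerId,
                                      long nowMs, long totalSizeBytes,
                                      long retentionTimeMs, long retentionSizeBytes) {
        List<Long> toDelete = new ArrayList<>();
        // Simplification: the quota check does not change while scanning.
        boolean overQuota = isOverSizeQuota(totalSizeBytes, retentionSizeBytes);
        for (Ledger ledger : consumedOldestFirst) {
            if (ledger.id == currentLedgerId) {
                break; // never delete the ledger that is still being written to
            }
            boolean expired = hasExpired(nowMs, ledger.timestampMs, retentionTimeMs);
            if (expired || overQuota) {
                toDelete.add(ledger.id); // either condition alone is enough
            } else {
                break; // first ledger within both limits ends the scan
            }
        }
        return toDelete;
    }
}
```

Under these assumptions, a limit of `0` makes every consumed ledger count as expired (or over quota) immediately, which is consistent with `0` disabling retention.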

@Jennifer88huang-zz
Contributor Author

@ErikJansenIRefact

> Hi,
>
> In general, these are good improvements in describing the behaviour of the retention settings.
> However, I would suggest describing the behaviour when both settings (time and size) have a value > 0. Are the messages removed if one of the thresholds is exceeded, or only if both of them are exceeded? In other words, are the settings combined with a "logical and" or a "logical or"?
>
> Is the 0 value treated as a special value? For example, if size = 10 GB and time = 0, are messages removed once size > 10 GB?
> And if size = 10 GB and time = 1d, are messages removed if either the size or the time is exceeded?

Good question. It's a "logical or": messages are removed when either limit is reached. However, if you set either value to 0, retention is disabled.
@lhotari Thank you for helping to answer the question and for providing the related doc.
I'll add more details here to make it clear.

@Jennifer88huang-zz
Contributor Author

@ErikJansenIRefact I've listed the value settings in a table. Is this much clearer?

@Jennifer88huang-zz
Contributor Author

@lhotari According to your comment:

  • In 2.6.x, the settings are something like this:
    [image]

  • In 2.7, the settings are something like this:
    [image]

Did I understand you correctly? I've adjusted the docs for both 2.7 and 2.6.1 accordingly (added the above table and examples). If you see any issues, feel free to comment.

@lhotari
Member

lhotari commented Oct 29, 2020

> Did I understand you correctly? I've adjusted the docs for both 2.7 and 2.6.1 accordingly (added the above table and examples). If you see any issues, feel free to comment.

Yes, perfect. Good work @jennifer88huang

@lhotari
Member

lhotari commented Oct 29, 2020

@jennifer88huang Just one minor comment about "Messages are deleted when either time or size reaches the limit". This might lead to the misunderstanding that deletion applies to all messages if someone looks at this table without reading the other parts where it's described. I'm not sure how to convey that this only applies to messages that have been acknowledged or that are on topics without any active subscription.

@Jennifer88huang-zz
Contributor Author

@lhotari Great, thanks for the reminder. How about replacing "Messages are deleted when either time or size reaches the limit" with one of the following?

  • "Acknowledged messages or messages with no active subscription will be deleted"
  • "Acknowledged messages or messages with no active subscription will not be retained"

@lhotari
Member

lhotari commented Oct 29, 2020

"Acknowledged messages or messages with no active subscription will not be retained"

+1

@Jennifer88huang-zz
Contributor Author

@lhotari thanks for your quick response. Updated, PTAL again.

@lhotari
Member

lhotari commented Oct 29, 2020

@jennifer88huang I guess the "when either..." part is needed too. Something like "Acknowledged messages or messages with no active subscription will not be retained when either time or size reaches the limit."?

@Jennifer88huang-zz
Contributor Author

@lhotari Oops, my mistake. I'm sorry, I forgot to add the condition...
Thank you very much for the reminder.

@lhotari
Member

lhotari commented Oct 29, 2020

I read the document once more and it seems after all that it would be better to simplify the table. One suggestion is to replace "Acknowledged messages or messages with no active subscription will not be retained when either time or size reaches the limit." in the table with "Based on time and size limits" since it would be more consistent with the other rows in the same table.
WDYT?

@Jennifer88huang-zz
Contributor Author

@lhotari Thanks for your feedback.
If we use "Based on time and size limits", I'm not sure whether users will understand what it means.
@ErikJansenIRefact Do you have an opinion on this? Which is clearer to you?

@Jennifer88huang-zz Jennifer88huang-zz merged commit 2d1b86f into apache:master Nov 4, 2020
flowchartsman pushed a commit to flowchartsman/pulsar that referenced this pull request Nov 17, 2020
* add retention policy

* fix the default vaule for retention

* add detailed retention info

* update

* update

Labels

doc Your PR contains doc changes, no matter whether the changes are in markdown or code files.

Development

Successfully merging this pull request may close these issues.

Messages on topic are deleted on restart of standalone cluster

5 participants