From e6879d82a729fafa19f677b05f20d42186dab83f Mon Sep 17 00:00:00 2001
From: Khurram Baig
Date: Thu, 28 Aug 2025 17:01:39 +0530
Subject: [PATCH] TEP-0157 Retention Policy for Tekton Results

---
 teps/0157-retention-policy-results.md | 286 ++++++++++++++++++++++++++
 teps/README.md                        |   1 +
 2 files changed, 287 insertions(+)
 create mode 100644 teps/0157-retention-policy-results.md

diff --git a/teps/0157-retention-policy-results.md b/teps/0157-retention-policy-results.md
new file mode 100644
index 000000000..8762e8e95
--- /dev/null
+++ b/teps/0157-retention-policy-results.md
@@ -0,0 +1,286 @@
+---
+status: proposed
+title: Retention Policy for Tekton Results
+creation-date: '2024-07-17'
+last-updated: '2024-07-17'
+authors:
+- '@khrm'
+collaborators: []
+---
+
+# TEP-0157: Tekton Results: Retention Policy for older Results and Records
+
+
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+  - [Use Cases](#use-cases)
+  - [Requirements](#requirements)
+- [Proposal](#proposal)
+  - [Notes and Caveats](#notes-and-caveats)
+- [Design Details](#design-details)
+- [Design Evaluation](#design-evaluation)
+  - [Reusability](#reusability)
+  - [Simplicity](#simplicity)
+  - [Flexibility](#flexibility)
+  - [User Experience](#user-experience)
+  - [Performance](#performance)
+  - [Risks and Mitigations](#risks-and-mitigations)
+  - [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Implementation Plan](#implementation-plan)
+  - [Test Plan](#test-plan)
+  - [Infrastructure Needed](#infrastructure-needed)
+  - [Upgrade and Migration Strategy](#upgrade-and-migration-strategy)
+  - [Implementation Pull Requests](#implementation-pull-requests)
+- [References](#references)
+
+
+## Summary
+Tekton Results stores PipelineRuns, TaskRuns, Events, and Logs indefinitely.
+This TEP proposes adding a retention policy feature that removes older Results and their associated Records,
+along with a request to delete the associated logs.
+
+## Motivation
+Storing older Results and Records indefinitely wastes storage resources
+and degrades DB performance. At the same time, some PipelineRuns should not be deleted from
+the archive.
+
+### Goals
+- Ability to define a retention period for Results at the cluster level. All Records and Results past that period should be deleted.
+- Ability to filter PipelineRuns when setting a retention policy.
+- A way to also delete the associated logs from S3 buckets, GCS buckets, or PVCs.
+
+### Non-Goals
+
+
+
+### Use Cases
+- A user can specify a global policy for all Results. All Records and logs belonging to Results that satisfy the pruning condition will be deleted.
+- A user can filter Results based on a CEL expression and a Result summary expression. All the associated Records will be deleted.
+
+### Requirements
+- For all Results satisfying the delete conditions, the following need to be deleted:
+  - Results
+  - Records for PipelineRun and TaskRun
+  - Records for EventLog
+  - Associated logs in the S3 bucket, GCS bucket, or PVC. EventLog Records should also be deleted.
+
+## Proposal
+A pruner will run and spin up a job at the specified interval, driven by the TTL and CEL expressions given in the `config-results-retention-policy` ConfigMap.
+
+### Enhanced Policy-Based Retention
+
+To provide more granular control over data retention, we propose enhancing the retention policy mechanism. This new approach will allow users to define specific retention rules based on `PipelineRun` metadata, including labels, annotations, and completion status, while maintaining backward compatibility with the existing global retention setting.
+
+The core of this proposal is to introduce an optional list of ordered policies to the `config-results-retention-policy` ConfigMap. The retention job will evaluate these policies in order, and the first policy that matches a `PipelineRun` will determine its retention period.
+If no specific policy matches, a default retention period will be applied.
+
+This policy-based approach gives users the flexibility to, for example, retain successful production deployment `PipelineRuns` for a long time while quickly pruning ephemeral builds from pull requests.
+
+### Notes and Caveats
+
+
+## Design Details
+
+The `config-results-retention-policy` ConfigMap will be extended to support both the existing `defaultRetention` key, for backward compatibility, and a new `policies` key for granular control.
+
+The `defaultRetention` field will serve as the **fallback** retention period for any `PipelineRun` that does not match a rule in the `policies` list. This value does **not** override the retention period of a matching policy; it only applies when no policies match a given Result.
+
+The `policies` field will contain a YAML-formatted string representing a list of rules. Rules are evaluated in order, and the first match wins. A rule consists of:
+- `name`: A descriptive name for the policy.
+- `selector`: Defines the criteria for matching Results. All conditions within a selector are combined with **AND** logic: a Result must meet all specified criteria (`matchNamespaces`, `matchLabels`, `matchAnnotations`, `matchStatuses`) for the policy to apply. If a particular selector type (e.g., `matchLabels`) is omitted from a policy, it will match all Results for that criterion.
+  - `matchNamespaces`: A list of namespaces to match against. A `PipelineRun` must be in one of the specified namespaces. **OR** logic is applied to the values in the list.
+  - `matchLabels`: A map where keys are label names and values are lists of strings. A `PipelineRun` must have all the specified label keys, and for each key, its value must be in the provided list. **OR** logic is applied to the values within a single key's list.
+  - `matchAnnotations`: A map where keys are annotation names and values are lists of strings. This works the same way as `matchLabels`.
+  - `matchStatuses`: A list of completion statuses to match against. A `PipelineRun`'s status must be in the list. **OR** logic is applied to the values in the list. The status is determined by the `reason` field of the primary `Succeeded` condition in the `PipelineRun` or `TaskRun` status. For a list of possible status reasons, refer to the [Tekton documentation on execution status](https://tekton.dev/docs/pipelines/pipelineruns/#monitoring-execution-status).
+- `retention`: The retention period for matching `PipelineRuns`, specified as a duration string (e.g., "730d", "90d", "24h").
+
+#### Example Configuration:
+
+```yaml
+# config/base/config-results-retention-policy.yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: config-results-retention-policy
+data:
+  # runAt determines when to run the pruning job.
+  runAt: "5 5 * * 0"
+  # defaultRetention is the fallback retention period.
+  # This is used if no specific policy matches.
+  defaultRetention: "30d"
+  # policies is an optional list of retention policies, evaluated in order.
+  # The first match wins, so more specific policies are listed before broader ones.
+  policies: |
+    - name: "signed-prod-deployments"
+      selector:
+        matchNamespaces: ["prod"]
+        matchLabels:
+          'tekton.dev/pipeline': ['deploy-to-prod']
+        matchAnnotations:
+          'chains.tekton.dev/signed': ['true']
+        matchStatuses: ["Succeeded"]
+      retention: "730d" # 2 years
+    - name: "prod-namespace-deployments"
+      selector:
+        matchNamespaces: ["prod", "staging"]
+        matchStatuses: ["Succeeded"]
+      retention: "365d" # 1 year
+    - name: "all-terminated-runs"
+      selector:
+        matchStatuses: ["Failed", "Cancelled", "PipelineRunTimeout"]
+      retention: "90d" # 90 days
+    - name: "git-event-builds"
+      selector:
+        matchLabels:
+          'tekton.dev/event-type': ["pull_request", "push"]
+      retention: "14d" # 2 weeks
+```
+
+### Database Interaction
+
+No database schema changes are required. The retention job will leverage the existing `records` table, which stores `PipelineRun` data in a `jsonb` column and the namespace in the `parent` column.
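The selector rules described under Design Details (first matching policy wins, AND across selector types, OR within each value list, omitted criteria match everything) can be sketched in a few lines. This is only an illustrative model, not the Tekton Results implementation: Python is used for brevity, and the names `Selector`, `Policy`, `Run`, `selector_matches`, `retention_for`, and `EXAMPLE_POLICIES` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Selector:
    # Hypothetical model: an empty list or dict means the criterion was
    # omitted from the policy, so it matches every run.
    match_namespaces: List[str] = field(default_factory=list)
    match_labels: Dict[str, List[str]] = field(default_factory=dict)
    match_annotations: Dict[str, List[str]] = field(default_factory=dict)
    match_statuses: List[str] = field(default_factory=list)


@dataclass
class Policy:
    name: str
    selector: Selector
    retention: str


@dataclass
class Run:
    namespace: str
    status: str  # the `reason` of the `Succeeded` condition
    labels: Dict[str, str] = field(default_factory=dict)
    annotations: Dict[str, str] = field(default_factory=dict)


def _map_matches(required: Dict[str, List[str]], actual: Dict[str, str]) -> bool:
    # Every required key must be present (AND across keys) and its value
    # must be one of the allowed values (OR within a key's list).
    return all(actual.get(key) in allowed for key, allowed in required.items())


def selector_matches(sel: Selector, run: Run) -> bool:
    # All specified criteria must hold (AND across selector types).
    if sel.match_namespaces and run.namespace not in sel.match_namespaces:
        return False
    if sel.match_statuses and run.status not in sel.match_statuses:
        return False
    return (_map_matches(sel.match_labels, run.labels)
            and _map_matches(sel.match_annotations, run.annotations))


def retention_for(run: Run, policies: List[Policy], default_retention: str) -> str:
    # Policies are evaluated in order; the first match decides the retention.
    for policy in policies:
        if selector_matches(policy.selector, run):
            return policy.retention
    # No policy matched, so fall back to the default retention.
    return default_retention


# Illustrative policies only; the values are assumptions for this sketch.
EXAMPLE_POLICIES = [
    Policy(
        name="signed-prod-deployments",
        selector=Selector(
            match_namespaces=["prod"],
            match_labels={"tekton.dev/pipeline": ["deploy-to-prod"]},
            match_annotations={"chains.tekton.dev/signed": ["true"]},
            match_statuses=["Succeeded"],
        ),
        retention="730d",
    ),
    Policy(
        name="all-terminated-runs",
        selector=Selector(match_statuses=["Failed", "Cancelled", "PipelineRunTimeout"]),
        retention="90d",
    ),
]
```

Because evaluation stops at the first match, a broad policy placed ahead of a narrower one shadows it, so the order of the list is part of the policy's meaning.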
+
+The job will dynamically construct a single SQL `DELETE` query containing a `CASE` expression. The `CASE` expression will have one branch per configured policy, in the configured order, so that the retention period of the first matching selector is applied. The `jsonb` querying capabilities of PostgreSQL will be used to match the selectors against the `PipelineRun` metadata stored in the `data` column.
+
+The `ON DELETE CASCADE` foreign key constraint between the `results` and `records` tables ensures that deleting a `Result` will automatically delete all associated `Records`, including `PipelineRun` and `TaskRun` data.
+
+
+## Design Evaluation
+
+
+### Reusability
+
+
+
+### Simplicity
+
+
+
+### Flexibility
+
+
+
+### Conformance
+
+
+
+### User Experience
+
+
+
+### Performance
+This improves DB performance by deleting superfluous Results and their associated data.
+
+### Risks and Mitigations
+
+
+
+### Drawbacks
+
+
+
+## Alternatives
+
+
+## Implementation Plan
+
+
+
+
+### Test Plan
+
+- We will add integration tests like the ones we have for logging in GCS storage and other scenarios.
+
+### Infrastructure Needed
+
+
+
+### Upgrade and Migration Strategy
+
+
+
+### Implementation Pull Requests
+
+
+## References
+
+
diff --git a/teps/README.md b/teps/README.md
index b8f511ecc..7ff27c1f3 100644
--- a/teps/README.md
+++ b/teps/README.md
@@ -146,6 +146,7 @@ This is the complete list of Tekton TEPs:
 |[TEP-0154](0154-concise-remote-resolver-syntax.md) | Concise Remote Resolver Syntax | implementable | 2024-03-21 |
 |[TEP-0155](0155-store-pipeline-events-in-db.md) | Store Pipeline Events in Tekton Results | proposed | 2024-04-19 |
 |[TEP-0156](0156-whenexpressions-in-step.md) | WhenExpressions in Steps | implemented | 2024-07-22 |
+|[TEP-0157](0157-retention-policy-results.md) | Retention Policy for Tekton Results | proposed | 2024-07-17 |
 |[TEP-0160](0160-enhance-results-cli.md) | Enhance Tekton Results CLI | proposed | 2025-03-13 |
 |[TEP-0161](0161-resolver-caching.md) | Resolver Caching for Task and Pipeline Resolution | proposed | 2024-06-15 |
 |[TEP-0162](0162-event-based-pruning-of-tekton-resources.md) | event based pruning of tekton resources | proposed | 2025-06-18 |