[FEATURE] Add configurable retention policies for log entries (filesystem and database storage) #34

@guibranco

Description

Introduce configurable retention policies to automatically delete log entries when they meet defined criteria. This feature should support both storage backends currently available in the server:

  • filesystem storage (structured .jsonl files)
  • database storage (currently MySQL / MariaDB)

Retention policies should be flexible and allow filtering by multiple dimensions so administrators can control storage growth and comply with data lifecycle requirements. 🧹📦

Motivation

As log volume increases over time, storage usage grows quickly. Currently, there is no automatic cleanup mechanism. This can lead to:

  • excessive disk usage
  • reduced performance over time
  • compliance concerns (e.g., personal data retention)
  • operational maintenance overhead

Adding retention policies enables automated lifecycle management and makes the server suitable for long-running production environments.

Proposed retention criteria

Retention policies should support filtering by one or more of the following fields:

  • application key (app_key)

  • application id / environment (app_id)

  • timestamp / date range

  • log level (level)

  • category (category)

  • message content pattern

    • glob support
    • regex support
  • optional context filters (future extension)

Retention rules may include combinations of these filters.

Example ideas:

  • delete logs older than 30 days
  • delete debug level logs after 7 days
  • keep error logs for 180 days
  • delete logs matching a specific regex pattern
  • delete logs from a specific application id
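
A minimal sketch of how such rule combinations could be evaluated against a single entry. It assumes that every filter set on a policy must match (AND semantics), which this issue does not specify. The `RetentionPolicy` class, the `matches` function, and the `message_glob` field name are hypothetical; the remaining field names follow the fields listed above.

```python
from __future__ import annotations

import fnmatch
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy shape; every field that is set must match (assumed AND)."""
    name: str
    app_key: str | None = None
    app_id: str | None = None
    level: str | None = None
    category: str | None = None
    message_glob: str | None = None
    message_regex: str | None = None
    older_than_days: int | None = None


def matches(policy: RetentionPolicy, entry: dict, now: datetime) -> bool:
    """Return True when the entry qualifies for deletion under this policy."""
    # Exact-match filters on plain string fields.
    for field in ("app_key", "app_id", "level", "category"):
        expected = getattr(policy, field)
        if expected is not None and entry.get(field) != expected:
            return False
    message = entry.get("message", "")
    # Glob must cover the whole message; regex may match anywhere in it.
    if policy.message_glob is not None and not fnmatch.fnmatchcase(message, policy.message_glob):
        return False
    if policy.message_regex is not None and re.search(policy.message_regex, message) is None:
        return False
    if policy.older_than_days is not None:
        # Entry timestamps carry a UTC offset, so `now` must be timezone-aware too.
        timestamp = datetime.fromisoformat(entry["timestamp"])
        if timestamp >= now - timedelta(days=policy.older_than_days):
            return False
    return True


# Usage: a debug entry written ten days before `now` falls under a 7-day debug policy.
now = datetime(2026, 4, 27, tzinfo=timezone.utc)
old_debug = {"level": "debug", "message": "cache miss", "timestamp": "2026-04-17T16:45:44.888+00:00"}
is_expired = matches(RetentionPolicy(name="remove-old-debug", level="debug", older_than_days=7), old_debug, now)
```

A "keep error logs for 180 days" rule fits the same shape when expressed as "delete error logs older than 180 days".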

Storage-specific implementation considerations

Filesystem storage

Current structure:

storage/logs/{year}/{month}/{day}.jsonl

Example:

storage/logs/2026/04/17.jsonl

Retention implementation strategies could include:

  • file-level pruning when retention applies to full files
  • entry-level pruning when filters are partial
  • periodic cleanup worker / scheduled task
  • optional compaction rewrite when removing matching entries

Because filesystem storage uses append-only JSONL files, removing only some of the entries in a file requires rewriting that file.
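
The file-level and entry-level strategies can be sketched together as one function. This is an illustration under stated assumptions, not the server's implementation: `prune_file` and its `should_delete` callback are hypothetical names, and the sketch does not coordinate with a writer that may be appending to the same file, so a real implementation would need locking or would have to skip the current day's file.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def prune_file(path: Path, should_delete: Callable[[dict], bool]) -> int:
    """Drop matching entries from one .jsonl file and return how many were removed."""
    removed = kept = 0
    # Write survivors to a temp file in the same directory so the final swap is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as target, \
                path.open("r", encoding="utf-8") as source:
            for line in source:
                if not line.strip():
                    continue  # skip blank lines
                if should_delete(json.loads(line)):
                    removed += 1
                else:
                    target.write(line.rstrip("\n") + "\n")
                    kept += 1
        if removed and kept:
            os.replace(tmp_name, path)  # entry-level pruning: swap in the compacted file
        elif removed:
            path.unlink()  # file-level pruning: every entry matched
    finally:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)  # nothing was replaced, or an error occurred
    return removed
```

Streaming line by line keeps memory use flat for large daily files, and a file with no matching entries is left untouched.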

Recommendation: add filesystem indexing support

To improve retention performance for filesystem storage, introduce an optional lightweight indexing layer.

Possible indexing strategies:

  • per-day metadata index
  • per-app_key index
  • per-level index
  • timestamp range index
  • category index

Example approaches:

  • SQLite index database alongside filesystem logs
  • structured metadata sidecar files
  • in-memory index with persisted snapshots

Benefits:

  • faster retention evaluation
  • faster filtering and searching
  • reduced IO during cleanup jobs
  • improved scalability for large installations 🚀

This indexing system could be optional and enabled through configuration.
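
As one possible shape for the SQLite option, the sketch below keeps a per-file summary of which `app_key`, `app_id`, `level`, and `category` combinations each daily file contains, so a cleanup job can skip files that cannot match a policy. The table name `file_index`, its columns, and the helper functions are assumptions made for illustration only.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per (file, app_key, app_id, level, category) combination.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_index (
    file_path TEXT NOT NULL,
    app_key   TEXT NOT NULL,
    app_id    TEXT NOT NULL,
    level     TEXT NOT NULL,
    category  TEXT NOT NULL,
    entries   INTEGER NOT NULL,
    PRIMARY KEY (file_path, app_key, app_id, level, category)
)
"""


def index_file(db: sqlite3.Connection, path: Path) -> None:
    """Rebuild the index rows for one .jsonl file."""
    counts: dict = {}
    with path.open(encoding="utf-8") as source:
        for line in source:
            if not line.strip():
                continue
            entry = json.loads(line)
            key = tuple(entry.get(f) or "" for f in ("app_key", "app_id", "level", "category"))
            counts[key] = counts.get(key, 0) + 1
    with db:  # one transaction: replace the old rows for this file
        db.execute("DELETE FROM file_index WHERE file_path = ?", (str(path),))
        db.executemany(
            "INSERT INTO file_index VALUES (?, ?, ?, ?, ?, ?)",
            [(str(path), *key, n) for key, n in counts.items()],
        )


def files_with_level(db: sqlite3.Connection, level: str) -> list:
    """Return the files that contain at least one entry with the given level."""
    rows = db.execute(
        "SELECT DISTINCT file_path FROM file_index WHERE level = ? ORDER BY file_path",
        (level,),
    )
    return [row[0] for row in rows]
```

A retention run for a level-based policy would then open only the files returned by `files_with_level` instead of scanning every day. The index has to be rebuilt for a file after that file is pruned.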

Database storage

For MySQL / MariaDB storage, retention policies can be implemented using:

  • scheduled cleanup jobs
  • indexed delete queries
  • partition-based cleanup (future improvement)
  • TTL-style pruning strategy

Retention execution should remain consistent with filesystem behavior whenever possible.
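
A scheduled cleanup job could translate each policy into a batched, parameterized delete. The sketch below only builds the statement and its parameters. The table name `logs`, the column names, the `build_delete` helper, and the UTC `DATETIME` storage format are assumptions, since the issue does not describe the schema; `%s` is the placeholder style used by common Python MySQL drivers.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def build_delete(policy: dict, now: datetime, batch_size: int = 1000) -> tuple[str, list]:
    """Build a parameterized DELETE for one policy (hypothetical `logs` table)."""
    clauses: list[str] = []
    params: list = []
    # Column names come from this fixed tuple, never from user input.
    for column in ("app_key", "app_id", "level", "category"):
        if column in policy:
            clauses.append(f"{column} = %s")
            params.append(policy[column])
    if "message_regex" in policy:
        clauses.append("message REGEXP %s")
        params.append(policy["message_regex"])
    if "older_than_days" in policy:
        cutoff = now - timedelta(days=policy["older_than_days"])
        clauses.append("`timestamp` < %s")
        # Assumes timestamps are stored as UTC DATETIME values.
        params.append(cutoff.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"))
    if not clauses:
        raise ValueError("refusing to build a DELETE without any filter")
    # LIMIT keeps each statement short; the job repeats it until no rows are affected.
    sql = f"DELETE FROM logs WHERE {' AND '.join(clauses)} LIMIT %s"
    params.append(batch_size)
    return sql, params
```

Two caveats for consistency with the filesystem backend: the regex dialect of `REGEXP` differs between MySQL and MariaDB and from the dialect used by the server's own language, and glob patterns would need to be translated (for example to `LIKE` or to a regex) before they can run in SQL.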

Example log file content

Example .jsonl file currently stored by the filesystem backend:

{"id":"RH2B5EPK10Y25T5RW100000000","trace_id":"436c9c52-5270-43c5-aae4-8f7ea298e4f9","batch_id":null,"app_key":"test-app","app_id":"production","user_agent":"PostmanRuntime/7.53.0","level":"info","category":"manual-test","message":"Hallo from Postman! Manuel_Stark98@hotmail.com","context":{"sent_by":"postman","note":"first test message"},"timestamp":"2026-04-17T16:45:44.888+00:00","created_at":"2026-04-17T16:45:44.888+00:00"}
{"id":"X5EB5EPK10JFKPH0M600000000","trace_id":"56bb020c-0190-493f-be69-f5be52c6a05e","batch_id":null,"app_key":"test-app","app_id":"production","user_agent":"PostmanRuntime/7.53.0","level":"info","category":"manual-test","message":"Hallo from Postman! Hans_Adams@hotmail.com","context":{"sent_by":"postman","note":"first test message"},"timestamp":"2026-04-17T16:45:56.797+00:00","created_at":"2026-04-17T16:45:56.797+00:00"}
{"id":"4VKB5EPK10BJBQ517600000000","trace_id":"c835dad9-9e43-4dba-a54a-86a35d72f1aa","batch_id":null,"app_key":"test-app","app_id":"production","user_agent":"PostmanRuntime/7.53.0","level":"info","category":"manual-test","message":"Hallo from Postman! Julia_Effertz@gmail.com","context":{"sent_by":"postman","note":"first test message"},"timestamp":"2026-04-17T16:46:02.596+00:00","created_at":"2026-04-17T16:46:02.596+00:00"}
{"id":"GP0C5EPK10CSCE8HEF00000000","trace_id":"7458a8a4-739f-448c-bf6c-2eddf9efd2a5","batch_id":null,"app_key":"test-app","app_id":"production","user_agent":"PostmanRuntime/7.53.0","level":"info","category":"manual-test","message":"Hallo from Postman! Michel_Schowalter@hotmail.com","context":{"sent_by":"postman","note":"first test message"},"timestamp":"2026-04-17T16:46:15.760+00:00","created_at":"2026-04-17T16:46:15.760+00:00"}

Retention logic should support filtering based on fields present in entries like these.

Suggested configuration approach

Example of a possible configuration structure:

retention:
  enabled: true
  policies:
    - name: remove-old-debug
      level: debug
      older_than_days: 7

    - name: remove-production-old
      app_id: production
      older_than_days: 30

    - name: remove-pattern
      message_regex: ".*@hotmail\\.com"
      older_than_days: 1

Configuration format may evolve depending on implementation decisions.

Acceptance criteria

  • retention policies can be configured via server configuration
  • retention policies support filesystem storage backend
  • retention policies support database storage backend
  • retention filtering supports timestamp-based cleanup
  • retention filtering supports app_key filtering
  • retention filtering supports app_id filtering
  • retention filtering supports category filtering
  • retention filtering supports level filtering
  • retention filtering supports regex message filtering
  • retention filtering supports glob message filtering
  • cleanup job runs automatically on schedule
  • cleanup job can run manually via command or endpoint
  • filesystem cleanup supports partial-file pruning when required
  • indexing recommendation evaluated for filesystem optimization
  • retention execution is documented in project documentation
  • configuration examples added to documentation
  • unit tests added for retention logic
  • integration tests added for both storage providers

Metadata

    Labels

      • Documentation
      • Tests
      • enhancement: New feature or request
      • good first issue: Good for newcomers
      • hacktoberfest: Participation in the Hacktoberfest event
      • help wanted: Extra attention is needed
      • ✨ feature: New feature requests or implementations
      • 🎲 database: Database-related operations
      • 👷🏼 infrastructure: Infrastructure-related tasks or issues
      • 📊 dashboard: Features or changes related to UI dashboards and data displays
      • 📝 documentation: Tasks related to writing or updating documentation
      • 🕕 very high effort: A task that can be completed in a few weeks
      • 🚀 performance: Performance optimizations or regressions
