Skip to content

[codex] Reuse legacy certificate detail cache#23

Merged
ethanolivertroy merged 1 commit into
mainfrom
codex/reuse-legacy-detail-cache
May 14, 2026
Merged

[codex] Reuse legacy certificate detail cache#23
ethanolivertroy merged 1 commit into
mainfrom
codex/reuse-legacy-detail-cache

Conversation

@ethanolivertroy
Copy link
Copy Markdown
Member

Summary

  • Treat cached certificate detail version fields as migratable instead of requiring them before reuse.
  • Add missing software_versions, hardware_versions, and firmware_versions keys when reusing older detail payloads.
  • Add regression coverage for legacy detail migration and cached provenance preservation.

Why

The post-merge update run 25846462127 timed out after 2h in Run scraper. Logs showed the scraper loaded 5,249 cached detail records but reused none: it completed 1,086 active records, then only reached 600/4,170 historical records before timeout. A live comparison showed the summary fingerprints still matched for most cached records, but old detail files lacked the newly introduced version keys, so the reuse guard forced a full live refetch.

With this patch, the current live summary pages are expected to reuse 1,079/1,086 active records and 4,139/4,170 historical records, leaving only new or changed certificates for live refresh.

Validation

  • git diff --check
  • venv/bin/python -m py_compile scraper.py test_scraper.py validate_api.py
  • venv/bin/python test_scraper.py
  • venv/bin/python validate_api.py
  • Targeted live reuse comparison against current NIST active and historical summary pages

@ethanolivertroy ethanolivertroy marked this pull request as ready for review May 14, 2026 09:05
@ethanolivertroy ethanolivertroy merged commit 5d37870 into main May 14, 2026
1 check passed
@ethanolivertroy ethanolivertroy deleted the codex/reuse-legacy-detail-cache branch May 14, 2026 09:05
@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Reuse legacy certificate detail cache with field migration

✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Treat cached certificate version fields as migratable instead of required
• Add missing version keys when reusing older detail payloads
• Add regression coverage for legacy detail migration
• Improve cache reuse efficiency for historical records
Diagram
flowchart LR
  A["Cached Detail Payload"] -->|Check fingerprint match| B["Fingerprint Matches"]
  B -->|Yes| C["Prepare Reused Payload"]
  C -->|Add migrated fields| D["Enhanced Detail Payload"]
  D -->|Return cached| E["Reused Record"]
  B -->|No| F["Fetch Fresh HTML"]
Loading

Grey Divider

File Changes

1. scraper.py ✨ Enhancement +9/-0

Separate required and migratable detail schema fields

• Split required detail schema fields from newly migratable version fields
• Add DETAIL_SCHEMA_MIGRATED_FIELDS tuple for software_versions, hardware_versions,
 firmware_versions
• Update DETAIL_SCHEMA_REQUIRED_FIELDS to exclude version fields
• Populate missing migrated fields with None in prepare_reused_detail_payload()

scraper.py


2. test_scraper.py 🧪 Tests +105/-11

Add legacy detail migration and reuse test coverage

• Rename test from test_should_reuse_certificate_detail_requires_version_schema_fields to
 test_reused_certificate_detail_migrates_version_schema_fields
• Change test assertion to verify legacy payloads without version fields are reusable
• Add prepare_reused_detail_payload to test imports
• Add new comprehensive test
 test_process_certificate_record_reuses_legacy_detail_with_migrated_fields validating end-to-end
 legacy detail reuse
• Update existing test_process_certificate_record_applies_cached_algorithm_provenance with
 complete detail payload structure

test_scraper.py


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented May 14, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0)

Grey Divider


Advisory comments

1. Schema lists duplicated 🐞 Bug ⚙ Maintainability
Description
DETAIL_SCHEMA_REQUIRED_FIELDS/DETAIL_SCHEMA_MIGRATED_FIELDS in scraper.py duplicate similar
required-field lists in validate_api.py, so future schema edits can easily drift and make
reuse/validation rules inconsistent. That inconsistency can silently reduce cache reuse or cause
validator failures depending on which list was updated.
Code

scraper.py[R116-125]

+    "standard",
+    "status",
+    "related_files",
+    "validation_history",
+    "vendor",
+)
+DETAIL_SCHEMA_MIGRATED_FIELDS = (
    "software_versions",
    "hardware_versions",
    "firmware_versions",
Evidence
The scraper defines required/migrated detail schema fields, while the API validator defines its own
required/current-schema detail field lists; these represent overlapping contracts but are maintained
independently, creating a concrete drift risk.

scraper.py[115-126]
validate_api.py[24-45]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`scraper.py` introduces `DETAIL_SCHEMA_REQUIRED_FIELDS` and `DETAIL_SCHEMA_MIGRATED_FIELDS`, but `validate_api.py` separately maintains `DETAIL_REQUIRED_FIELDS` and `CURRENT_SCHEMA_DETAIL_FIELDS`. These lists represent the same conceptual contract (what a valid/detail payload must contain and what’s considered “current schema”), but they live in two places and can drift.

## Issue Context
- Scraper controls reuse gating and migration.
- Validator controls what output is acceptable.
- When these lists diverge, cache reuse decisions can become inconsistent with what validation expects.

## Fix Focus Areas
- scraper.py[115-126]
- validate_api.py[24-45]

## Suggested fix
- Create a small shared module (e.g., `schema_constants.py`) containing the shared field lists and import from both `scraper.py` and `validate_api.py`.
 - Alternatively, derive the required field list from `api/schemas/certificate-detail.schema.json` in one place (and share that derivation) if you want the schema JSON to be the single source of truth.
- Add a minimal unit test asserting the two lists stay aligned if you keep both for any reason.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant