kassyray left a comment:
I like the changes; they seem super useful.
Something I had in a past version was counting the total number of PDFs in a batch and validating that count against the number of clients that should be there.
Thoughts on whether to add this to `validate_pdfs` or elsewhere?
The JSON report is a nice addition. Either in the log (once implemented) or elsewhere, we can alert on the CLI when something goes wrong.
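The batch-count check could look roughly like the sketch below. It assumes the per-client notices follow the `<language>_notice_<sequence>_<client_id>.pdf` naming seen in the manifest's `pdf_path` entries. `check_batch_pdf_count` and its report keys are hypothetical names and are not part of the existing `validate_pdfs` code.

```python
import json
from pathlib import Path
from typing import Iterable, Union


def check_batch_pdf_count(
    paths: Iterable[Union[str, Path]], language: str, expected_clients: int
) -> dict:
    """Count per-client notice PDFs for one language and compare with the expected client count.

    Hypothetical helper: assumes files are named '<language>_notice_<sequence>_<client_id>.pdf'.
    """
    prefix = f"{language}_notice_"
    found = sum(
        1
        for p in map(Path, paths)
        if p.name.startswith(prefix) and p.suffix.lower() == ".pdf"
    )
    return {
        "language": language,
        "expected_clients": expected_clients,
        "found_pdfs": found,
        "passed": found == expected_clients,
    }


report = check_batch_pdf_count(
    ["fr_notice_00001_1009876545.pdf", "fr_notice_00002_1009876546.pdf", "en_notice_00001_1.pdf"],
    language="fr",
    expected_clients=2,
)
print(json.dumps(report))
# {"language": "fr", "expected_clients": 2, "found_pdfs": 2, "passed": true}
```

In practice the paths might come from `Path("pdf_individual").iterdir()`. A CLI wrapper could then exit with a non-zero status when `passed` is false, so a mismatch is visible on the command line.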
Super. I think this should exist in the manifests that get created. Example of the top of a batch manifest below:

```json
{
  "run_id": "20251030T135540",
  "language": "fr",
  "batch_type": "size_based",
  "batch_identifier": null,
  "batch_number": 1,
  "total_batches": 1,
  "batch_size": 100,
  "total_clients": 5,
  "total_pages": 15,
  "sha256": "530f879d3186cd97b4ca5e25425ec8da63d59a1358c129951e114648d5e40989",
  "output_pdf": "pdf_combined/fr_batch_001_of_001.pdf",
  "clients": [
    {
      "sequence": "00001",
      "client_id": "1009876545",
      "full_name": "Scurry Nutcracker",
      "school": "Burrow Public School",
      "board": "",
      "pdf_path": "pdf_individual/fr_notice_00001_1009876545.pdf",
      "artifact_path": "artifacts/preprocessed_clients_20251030T135540.json",
      "pages": 3
    },
```
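Because the manifest header already carries `total_clients`, `total_pages` and `sha256`, a consistency check could be run against it directly. The sketch below assumes the field names from the example above. `check_manifest` is a hypothetical helper, and its checks are illustrative rather than the pipeline's actual rules.

```python
import hashlib
from typing import Optional


def check_manifest(manifest: dict, combined_pdf: Optional[bytes] = None) -> list:
    """Return a list of problems in one batch manifest; an empty list means it is consistent.

    Hypothetical helper: field names follow the batch manifest example.
    """
    problems = []
    clients = manifest.get("clients", [])

    # Client count: the header total should match the number of listed clients.
    if manifest.get("total_clients") != len(clients):
        problems.append(
            f"total_clients={manifest.get('total_clients')} but {len(clients)} clients listed"
        )

    # Page count: the header total should equal the sum of per-client page counts.
    page_sum = sum(c.get("pages", 0) for c in clients)
    if manifest.get("total_pages") != page_sum:
        problems.append(f"total_pages={manifest.get('total_pages')} but clients sum to {page_sum}")

    # Integrity: if the combined PDF bytes are supplied, compare their SHA-256 digest.
    if combined_pdf is not None:
        if hashlib.sha256(combined_pdf).hexdigest() != manifest.get("sha256"):
            problems.append("sha256 does not match the combined PDF")

    return problems
```

To cover the `sha256` field, the combined PDF could be read with `Path(manifest["output_pdf"]).read_bytes()` and passed in. Leaving it out skips the integrity check.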
And the top of the validation output:

```json
{
  "language": "fr",
  "total_pdfs": 5,
  "passed_count": 0,
  "warning_count": 5,
  "page_count_distribution": {
    "3": 5
  },
  "warning_types": {
    "exactly_two_pages": 5,
    "signature_overflow": 5
  },
  "results": [
    {
      "filename": "fr_notice_00001_1009876545.pdf",
      "page_count": 3,
      "warnings": [
        "exactly_two_pages: 3 pages (expected 2)",
        "signature_overflow: Signature block found on page 2 (expected page 1)"
      ],
      "passed": false
    },
```
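The summary fields at the top of the report can be derived from the per-PDF `results`. The example shows `warning_count` as 5 while listing 10 individual warnings, so this sketch assumes `warning_count` counts PDFs with at least one warning. It also assumes each warning's type is the text before its first colon. `summarize_results` is a hypothetical name.

```python
from collections import Counter


def summarize_results(language: str, results: list) -> dict:
    """Build the top-level summary of a validation report from per-PDF results.

    Assumes each result looks like the 'results' entries in the example output:
    {'filename', 'page_count', 'warnings': ['<type>: <message>', ...], 'passed'}.
    """
    # JSON object keys are strings, so page counts are stringified ("3": 5).
    page_counts = Counter(str(r["page_count"]) for r in results)
    # Warning type is the text before the first colon, e.g. 'exactly_two_pages'.
    warning_types = Counter(w.split(":", 1)[0] for r in results for w in r["warnings"])
    return {
        "language": language,
        "total_pdfs": len(results),
        "passed_count": sum(1 for r in results if r["passed"]),
        # Assumption: number of PDFs with at least one warning, not total warnings.
        "warning_count": sum(1 for r in results if r["warnings"]),
        "page_count_distribution": dict(page_counts),
        "warning_types": dict(warning_types),
        "results": results,
    }
```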
I added an example that uses measurement-based validation (contact info inside the envelope window) as well as regex-based validation (client ID). I'm going to merge this back into the main PR.
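The two kinds of check can be sketched as follows, independent of how the PDF is parsed. The envelope window coordinates and the 10-digit client ID format (inferred from `"1009876545"` in the manifest) are hypothetical assumptions. The contact block's bounding box is assumed to come from some PDF text-extraction step and to use the same coordinate system as the window.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in PDF points: (x0, y0) one corner, (x1, y1) the opposite corner."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Rect") -> bool:
        # True when 'other' lies entirely inside this rectangle.
        return (
            self.x0 <= other.x0 and self.y0 <= other.y0
            and other.x1 <= self.x1 and other.y1 <= self.y1
        )


# Hypothetical envelope window position; real values depend on the envelope spec.
ENVELOPE_WINDOW = Rect(x0=50.0, y0=100.0, x1=300.0, y1=200.0)

# Hypothetical client ID format: a standalone run of exactly 10 digits.
CLIENT_ID_RE = re.compile(r"(?<!\d)\d{10}(?!\d)")


def check_contact_in_window(contact_box: Rect, window: Rect = ENVELOPE_WINDOW) -> bool:
    """Measurement-based check: the contact block's bounding box must fit inside the window."""
    return window.contains(contact_box)


def check_client_id(page_text: str, expected_id: str) -> bool:
    """Regex-based check: the expected client ID appears on the page as a standalone ID."""
    return any(m.group(0) == expected_id for m in CLIENT_ID_RE.finditer(page_text))
```

The lookarounds in `CLIENT_ID_RE` keep a 10-digit ID from matching inside a longer number, such as an 11-digit string.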
- Change PDF page counting into a more generic PDF validation step (retaining the page-count check)
- Example of using invisible markers in Typst to support more specific PDF validation
- Configuration of validation rules (disabled/warn/error) in `parameters.yaml`
- Documentation and test updates
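The disabled/warn/error configuration could map onto rule outcomes roughly as shown below. The rule names come from the example output. The `pdf_validation.rules` layout is a hypothetical shape for the parsed `parameters.yaml`, written as a Python dict to avoid a YAML dependency. Defaulting unlisted rules to `warn` is also an assumption.

```python
from enum import Enum


class Severity(str, Enum):
    DISABLED = "disabled"
    WARN = "warn"
    ERROR = "error"


# Hypothetical shape of the parsed parameters.yaml section (not the actual schema).
PARAMETERS = {
    "pdf_validation": {
        "rules": {
            "exactly_two_pages": "warn",
            "signature_overflow": "error",
            "client_id_present": "disabled",
        }
    }
}


def apply_severities(failed_rules: dict, params: dict = PARAMETERS) -> dict:
    """Sort failed rules into warnings and errors according to their configured severity.

    failed_rules maps rule name -> message for every check that failed on one PDF.
    Rules missing from the config default to 'warn'; an unknown severity string raises ValueError.
    """
    rules = params.get("pdf_validation", {}).get("rules", {})
    outcome = {"warnings": [], "errors": []}
    for rule, message in failed_rules.items():
        severity = Severity(rules.get(rule, "warn"))
        if severity is Severity.DISABLED:
            continue  # disabled rules are ignored entirely
        bucket = "errors" if severity is Severity.ERROR else "warnings"
        outcome[bucket].append(f"{rule}: {message}")
    # Matches the example report, where warnings alone already give "passed": false.
    outcome["passed"] = not outcome["errors"] and not outcome["warnings"]
    return outcome
```

Keeping errors separate from warnings lets a caller stop the run only on `errors` while still reporting warnings in the JSON output.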