Add per-sheet include/exclude filtering to encoder CLI/API with skip metadata#37

Merged: kingkillery merged 2 commits into main from copilot/add-cli-flags-for-sheet-encoding on May 13, 2026

Contributor

Copilot AI commented May 13, 2026

Some multi-sheet workbooks mix many small sheets with one or more very large sheets, making all-sheet encoding unnecessarily expensive for common workflows. This change adds explicit sheet-selection controls so users can encode only relevant sheets and defer costly ones while preserving traceability in output metadata.

  • Sheet selection controls (CLI + Python API)

    • Added exact-name filters:
      • CLI: --include-sheet, --exclude-sheet (repeatable)
      • API: include_sheets, exclude_sheets
    • Added pattern-based filters:
      • CLI: --include-sheet-glob, --exclude-sheet-glob, --include-sheet-regex, --exclude-sheet-regex
      • API: include_sheet_globs, exclude_sheet_globs, include_sheet_regexes, exclude_sheet_regexes
    • Applied the same selection behavior to both compressed encoding and vanilla encoding paths.
  • Metadata for omitted sheets and reasons

    • Extended sheet_processing with selection metadata:
      • configured filters
      • included_sheets
      • skipped_sheets with per-sheet reason
    • Added explicit per-sheet skip records in sheet_processing.sheets[...] for:
      • include-filter mismatch
      • explicit exclude matches
      • empty sheets
      • existing bounded-mode skips
  • Downstream tolerance for deliberate sheet omission

    • Hardened the QA evaluation path so it no longer assumes at least one encoded sheet exists when selection filters omit sheets, which prevents fallback failures in that scenario.
  • Coverage and docs

    • Added tests for:
      • encoding one named sheet from a multi-sheet workbook
      • excluding one sheet while encoding others
      • CLI filter behavior and skip-reason metadata
    • Updated README to document new filtering flags/parameters and resulting metadata shape.
    • Updated golden encoding snapshots to reflect new sheet_processing.selection content.
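
As a rough illustration of how the repeatable CLI flags listed above could map onto the API parameter names, here is a minimal `argparse` sketch. It is not the project's actual parser: the positional `workbook` argument and the `build_parser` helper are hypothetical, and treating the glob and regex flags as repeatable is an assumption (the description only states this for `--include-sheet` and `--exclude-sheet`).

```python
import argparse

# Flag -> destination pairs; the names on both sides come from the PR description.
SHEET_FILTER_FLAGS = [
    ("--include-sheet", "include_sheets"),
    ("--exclude-sheet", "exclude_sheets"),
    ("--include-sheet-glob", "include_sheet_globs"),
    ("--exclude-sheet-glob", "exclude_sheet_globs"),
    ("--include-sheet-regex", "include_sheet_regexes"),
    ("--exclude-sheet-regex", "exclude_sheet_regexes"),
]


def build_parser():
    """Hypothetical parser: every filter flag may be given more than once."""
    parser = argparse.ArgumentParser(description="Sketch of sheet-selection flags")
    parser.add_argument("workbook")  # assumed positional argument
    for flag, dest in SHEET_FILTER_FLAGS:
        # action="append" collects each occurrence of the flag into a list.
        parser.add_argument(flag, dest=dest, action="append", default=[])
    return parser


args = build_parser().parse_args([
    "workbook.xlsx",
    "--include-sheet", "Summary",
    "--exclude-sheet-glob", "Archive*",
    "--exclude-sheet-glob", "Tmp*",
])
print(args.include_sheets, args.exclude_sheet_globs)
```

Collecting each flag into a list keeps the CLI values in the same shape as the list-valued API parameters, so they could be passed through unchanged.
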
from Spreadsheet_LLM_Encoder import spreadsheet_llm_encode

encoding = spreadsheet_llm_encode(
    "workbook.xlsx",
    include_sheets=["Summary"],
    exclude_sheet_globs=["Archive*"],
    exclude_sheet_regexes=[r"^Raw_\d{4}$"],
)
# Sheets that were encoded and sheets that were skipped (with reasons):
# encoding["sheet_processing"]["selection"]["included_sheets"]
# encoding["sheet_processing"]["selection"]["skipped_sheets"]
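
To make the selection behavior and skip metadata more concrete, here is a minimal, self-contained sketch of how such filtering might work. It is not the encoder's implementation. The `select_sheets` helper, the precedence (exclude wins over include), the use of `re.search` and case-sensitive globbing, and the reason strings `"excluded"` and `"not_included"` are all assumptions made for illustration. Only the parameter names and the `included_sheets` / `skipped_sheets` keys come from the description above.

```python
import re
from fnmatch import fnmatchcase


def select_sheets(sheet_names, include_sheets=(), exclude_sheets=(),
                  include_sheet_globs=(), exclude_sheet_globs=(),
                  include_sheet_regexes=(), exclude_sheet_regexes=()):
    """Hypothetical helper: split sheet names into included and skipped."""

    def matches(name, exact, globs, regexes):
        # A sheet matches if any exact name, glob, or regex applies to it.
        return (
            name in exact
            or any(fnmatchcase(name, pattern) for pattern in globs)
            or any(re.search(pattern, name) for pattern in regexes)
        )

    has_include = bool(include_sheets or include_sheet_globs or include_sheet_regexes)
    included, skipped = [], []
    for name in sheet_names:
        if matches(name, exclude_sheets, exclude_sheet_globs, exclude_sheet_regexes):
            # Assumed precedence: an exclude match always wins.
            skipped.append({"sheet": name, "reason": "excluded"})
        elif has_include and not matches(
            name, include_sheets, include_sheet_globs, include_sheet_regexes
        ):
            # Include filters are configured, but none of them match this sheet.
            skipped.append({"sheet": name, "reason": "not_included"})
        else:
            included.append(name)
    return {"included_sheets": included, "skipped_sheets": skipped}


selection = select_sheets(
    ["Summary", "Archive_2019", "Raw_2024", "Notes"],
    include_sheets=["Summary"],
    exclude_sheet_globs=["Archive*"],
    exclude_sheet_regexes=[r"^Raw_\d{4}$"],
)
print(selection["included_sheets"])
print(selection["skipped_sheets"])
```

Recording a reason for every skipped sheet, rather than silently dropping it, is what lets downstream consumers tell a deliberate omission apart from a failure.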

@kingkillery
Owner

@copilot should we merge it?

Contributor Author

Copilot AI commented May 13, 2026

@copilot should we merge it?

I’d merge once the currently running workflow for this head commit finishes green. The code on 5a11ff9 has passed local full-suite validation (191 passed), and the recently completed workflow runs on this branch were successful.

Copilot AI requested a review from kingkillery May 13, 2026 07:14
kingkillery merged commit 6f7565d into main May 13, 2026
1 check passed
kingkillery deleted the copilot/add-cli-flags-for-sheet-encoding branch May 13, 2026 07:15