
Fix encoder runtime failures and CoS table selection flow #31

Merged
kingkillery merged 4 commits into main from copilot/validate-readme-claims on Apr 29, 2026

Copilot AI (Contributor) commented Apr 29, 2026

The encoder path described in the README had two critical runtime breaks: the main encoding pipeline called a missing aggregation function, and the Chain-of-Spreadsheet (CoS) flow conflated table ranges with sheet names. This change restores the broken execution paths and fixes two adjacent correctness issues surfaced by the same failure analysis.

  • Restore format-region aggregation in the encoder

    • Implement aggregate_regions_dfs(), which the existing encoding flow already called but which was never defined.
    • Aggregate connected cells by semantic key into stable rectangular ranges, so spreadsheet_llm_encode() completes instead of failing with a NameError.
    ```python
    aggregated_formats = aggregate_regions_dfs(sheet, type_nfs_map)
    numeric_ranges = aggregate_regions_dfs(sheet, numeric_map)
    ```
  • Fix CoS sheet/range handling

    • Split sheet selection from range selection in the example pipeline.
    • Keep identify_table() responsible for returning the detected table range, while find_relevant_sheet() resolves the sheet key used to read encoded data.
    • Add a single-sheet fallback so the CoS path still runs when token matching cannot discriminate.
    ```python
    sheet_name = find_relevant_sheet(encoding, query)
    table_range = identify_table(encoding, query)
    sheet_data = encoding["sheets"][sheet_name]
    ```
  • Correct boundary candidate filtering

    • Remove header-row exclusion from boundary candidate generation.
    • This allows table candidates that include headers to survive downstream filtering, matching the intended heuristics and restoring expected anchor detection behavior.
  • Tighten spreadsheet dataset loading

    • Ignore .xlsx/.json pairs that do not contain table annotations when building the table-detection dataset.
    • Prevent QA annotation files from being misclassified as table-detection samples.
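For readers who want to see the aggregation technique, here is a minimal sketch. It is not the code from this PR. The signature is simplified to a single hypothetical `cell_keys` dict that maps 1-based `(row, col)` coordinates to a semantic key; the PR's calls also pass `sheet`. Each group of 4-connected cells that share a key is collapsed to its bounding box, which is one assumed reading of "stable rectangular ranges". The helpers `col_letter` and `to_a1` are also hypothetical.

```python
from collections import defaultdict


def col_letter(col: int) -> str:
    """Hypothetical helper: convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def to_a1(r1: int, c1: int, r2: int, c2: int) -> str:
    """Hypothetical helper: format a bounding box as an A1 range, or a single cell if 1x1."""
    start, end = f"{col_letter(c1)}{r1}", f"{col_letter(c2)}{r2}"
    return start if start == end else f"{start}:{end}"


def aggregate_regions_dfs(cell_keys: dict[tuple[int, int], str]) -> dict[str, list[str]]:
    """Sketch (assumed signature): group 4-connected cells with equal keys via iterative DFS,
    then return key -> sorted list of A1 bounding-box ranges."""
    visited: set[tuple[int, int]] = set()
    boxes: dict[str, list[tuple[int, int, int, int]]] = defaultdict(list)
    for start in sorted(cell_keys):  # sorted start cells keep the traversal deterministic
        if start in visited:
            continue
        key = cell_keys[start]
        visited.add(start)
        stack = [start]
        min_r = max_r = start[0]
        min_c = max_c = start[1]
        while stack:
            r, c = stack.pop()
            min_r, max_r = min(min_r, r), max(max_r, r)
            min_c, max_c = min(min_c, c), max(max_c, c)
            # Explore the four orthogonal neighbours that carry the same key.
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb not in visited and cell_keys.get(nb) == key:
                    visited.add(nb)
                    stack.append(nb)
        boxes[key].append((min_r, min_c, max_r, max_c))
    # Sorting each key's boxes gives stable output regardless of dict insertion order.
    return {k: [to_a1(*b) for b in sorted(v)] for k, v in boxes.items()}
```

For example, a two-cell header row over a 2x2 integer block plus one stray integer cell yields `{"header": ["A1:B1"], "int": ["A2:B3", "A5"]}`. A bounding box can cover cells that have a different key when a region is not itself rectangular (for example an L-shape), so a real encoder may need a stricter rectangle split.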

kingkillery (Owner) commented:

@copilot can we close any of the issues?

Copilot AI (Contributor, Author) commented Apr 29, 2026

Yes: #28 can be closed by this PR. The encoder crash caused by the missing aggregate_regions_dfs() is fixed in this branch, whose current head is 3583c13. I would keep #30 open for now, since the placeholder/stub CoS and README-related items in that issue are not fully resolved by these commits.

kingkillery (Owner) commented:

Ok, please close #28 @copilot
That should do it.

Copilot AI (Contributor, Author) commented Apr 29, 2026

#28 is resolved by this PR and can be closed as of 3583c13. I don't have an issue-close operation available in this session. This PR contains no UI changes.

kingkillery marked this pull request as ready for review on April 29, 2026 06:19
kingkillery merged commit 8487420 into main on April 29, 2026
1 of 2 checks passed
kingkillery deleted the copilot/validate-readme-claims branch on April 29, 2026 06:19
Copilot AI restored the copilot/validate-readme-claims branch on April 29, 2026 06:20
Copilot stopped work on behalf of kingkillery due to an error on April 29, 2026 06:20
kingkillery deleted the copilot/validate-readme-claims branch on April 29, 2026 06:21