Fix encoder runtime failures and CoS table selection flow by Copilot · Pull Request #31 · kingkillery/Spreadsheet_LLM_Encoder

Copilot · 2026-04-29T06:07:07Z

The encoder path described in the README had two critical runtime breaks: the main encoding pipeline called a missing aggregation function, and the Chain-of-Spreadsheet (CoS) flow conflated table ranges with sheet names. This change restores the broken execution paths and tightens two adjacent correctness issues surfaced by the same failure analysis: boundary candidate filtering and table-detection dataset loading.

Restore format-region aggregation in the encoder
- Implement aggregate_regions_dfs() and wire it into the existing encoding flow where it was already being called.
- Aggregate connected cells by semantic key into stable rectangular ranges, so spreadsheet_llm_encode() can complete instead of failing with NameError.
```python
aggregated_formats = aggregate_regions_dfs(sheet, type_nfs_map)
numeric_ranges = aggregate_regions_dfs(sheet, numeric_map)
```
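
For reviewers who want the idea without reading the diff, here is a minimal sketch of DFS-based region aggregation. It is not the code in this PR: the name `aggregate_regions_sketch`, the `(row, col) -> key` input shape, and the choice to report each connected group as its bounding box are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Tuple

Cell = Tuple[int, int]            # (row, col), 1-based like spreadsheet coordinates
Box = Tuple[int, int, int, int]   # (min_row, min_col, max_row, max_col)


def aggregate_regions_sketch(cell_keys: Dict[Cell, Hashable]) -> List[Tuple[Hashable, Box]]:
    """Group 4-connected cells sharing a key; return one bounding box per group.

    Hypothetical helper: the real aggregate_regions_dfs(sheet, type_nfs_map)
    may use a different input shape and a different rectangle strategy.
    """
    visited = set()
    regions = []
    # Visit cells in sorted order so the output does not depend on dict ordering.
    for start in sorted(cell_keys):
        if start in visited:
            continue
        key = cell_keys[start]
        stack = [start]
        visited.add(start)
        min_r = max_r = start[0]
        min_c = max_c = start[1]
        # Iterative DFS over neighbors that carry the same semantic key.
        while stack:
            r, c = stack.pop()
            min_r, max_r = min(min_r, r), max(max_r, r)
            min_c, max_c = min(min_c, c), max(max_c, c)
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in cell_keys and nb not in visited and cell_keys[nb] == key:
                    visited.add(nb)
                    stack.append(nb)
        regions.append((key, (min_r, min_c, max_r, max_c)))
    return regions


cells = {
    (1, 1): "int", (1, 2): "int",
    (2, 1): "int", (2, 2): "int",
    (1, 4): "date",
    (4, 1): "int",
}
regions = aggregate_regions_sketch(cells)
```

Note that a bounding box can cover cells with a different key when a group is not rectangular (an L-shaped group, for example), so a production version may need to split such groups.
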
Fix CoS sheet/range handling
- Split sheet selection from range selection in the example pipeline.
- Keep identify_table() responsible for returning the detected table range, while find_relevant_sheet() resolves the sheet key used to read encoded data.
- Add a single-sheet fallback so the CoS path still runs when token matching cannot discriminate.
```python
sheet_name = find_relevant_sheet(encoding, query)
table_range = identify_table(encoding, query)
sheet_data = encoding["sheets"][sheet_name]
```
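
The sheet-selection step with its single-sheet fallback could look roughly like the sketch below. Only the `encoding["sheets"]` layout comes from this PR; the name `find_relevant_sheet_sketch`, the token-overlap scoring on sheet names, and returning `None` when several sheets all fail to match are assumptions.

```python
import re
from typing import Optional, Set


def _tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_relevant_sheet_sketch(encoding: dict, query: str) -> Optional[str]:
    """Pick the sheet whose name shares the most tokens with the query.

    Hypothetical stand-in for find_relevant_sheet(); the real scoring may
    also look at cell contents.
    """
    sheets = encoding.get("sheets", {})
    if not sheets:
        return None
    query_tokens = _tokens(query)
    scores = {name: len(query_tokens & _tokens(name)) for name in sheets}
    best = max(scores, key=scores.get)
    if scores[best] > 0:
        return best
    # Fallback: with exactly one sheet there is nothing to discriminate between.
    if len(sheets) == 1:
        return next(iter(sheets))
    return None


workbook = {"sheets": {"Sales 2023": {}, "Inventory": {}}}
single = {"sheets": {"Sheet1": {}}}
```

Keeping this separate from `identify_table()` means the returned value is always a key into `encoding["sheets"]`, never a cell range.
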
Correct boundary candidate filtering
- Remove header-row exclusion from boundary candidate generation.
- This allows table candidates that include headers to survive downstream filtering, matching the intended heuristics and restoring expected anchor detection behavior.
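
To make the effect of the change concrete, the sketch below contrasts candidate generation with and without a header-row exclusion. It is an assumed reconstruction, not the repository's code: `boundary_candidates_sketch`, its parameters, and the `exclude_header_rows` flag are hypothetical names used only to show the before and after behavior.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right)


def boundary_candidates_sketch(
    row_bounds: Iterable[int],
    col_bounds: Iterable[int],
    header_rows: Iterable[int] = (),
    exclude_header_rows: bool = False,
) -> List[Box]:
    """Pair up row and column boundaries into rectangular table candidates."""
    headers = set(header_rows)
    candidates = []
    for top, bottom in combinations(sorted(set(row_bounds)), 2):
        if exclude_header_rows and top in headers:
            # Assumed pre-fix behavior: a candidate starting on a header row is dropped.
            continue
        for left, right in combinations(sorted(set(col_bounds)), 2):
            candidates.append((top, left, bottom, right))
    return candidates


before = boundary_candidates_sketch([1, 5], [1, 3], header_rows=[1], exclude_header_rows=True)
after = boundary_candidates_sketch([1, 5], [1, 3], header_rows=[1])
```

With the exclusion on, the only candidate that starts at the header row is lost; with it off, the candidate reaches downstream filtering.
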
Tighten spreadsheet dataset loading
- Ignore .xlsx/.json pairs that do not contain table annotations when building the table-detection dataset.
- Prevent QA annotation files from being misclassified as table-detection samples.
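
A loader that applies this rule might look like the sketch below. The function name `collect_table_samples_sketch` and the `"tables"` annotation key are assumptions; the actual annotation schema in the repository may differ.

```python
import json
from pathlib import Path
from typing import List, Tuple


def collect_table_samples_sketch(
    data_dir: str, annotation_key: str = "tables"
) -> List[Tuple[Path, Path]]:
    """Return (.xlsx, .json) pairs whose JSON carries table annotations."""
    samples = []
    for xlsx_path in sorted(Path(data_dir).glob("*.xlsx")):
        json_path = xlsx_path.with_suffix(".json")
        if not json_path.is_file():
            continue  # no annotation file for this workbook
        try:
            annotation = json.loads(json_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # unreadable annotation, skip the pair
        # QA-style files have no (or an empty) table list, so they are skipped.
        if isinstance(annotation, dict) and annotation.get(annotation_key):
            samples.append((xlsx_path, json_path))
    return samples
```

Calling `collect_table_samples_sketch("data/")` on a directory that mixes table-detection and QA annotations would then return only the pairs with a non-empty table list.
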

Agent-Logs-Url: https://github.com/kingkillery/Spreadsheet_LLM_Encoder/sessions/aae2984d-792f-44b3-ae90-25e2814a99fd Co-authored-by: kingkillery <200727508+kingkillery@users.noreply.github.com>

kingkillery · 2026-04-29T06:07:23Z

@copilot can we close any of the issues?

Copilot · 2026-04-29T06:08:09Z

> @copilot can we close any of the issues?

Yes — #28 can be closed by this PR. The encoder crash from the missing aggregate_regions_dfs is fixed in this branch, and the current head is 3583c13. I would keep #30 open for now, since the placeholder/stub CoS and README-related items in that issue are not fully resolved by these commits.

kingkillery · 2026-04-29T06:17:46Z

Ok, please close #28 @copilot
That should do it.

Copilot · 2026-04-29T06:18:23Z

> Ok, please close #28 @copilot
> That should do it.

#28 is resolved by this PR and can be closed based on 3583c13. I don’t have an issue-close operation available from this session, but there are no UI changes in this PR.

Copilot AI and others added 4 commits April 29, 2026 04:07

Fix critical encoder and CoS runtime bugs

fb5741b

Fix boundary detection and dataset loader bugs

19cd774

Clean up CoS sheet selection API

ff72d74

Clarify CoS fallback documentation

3583c13

Copilot AI assigned Copilot and kingkillery Apr 29, 2026

Copilot created this pull request from a session on behalf of kingkillery April 29, 2026 06:07

Copilot started work on behalf of kingkillery April 29, 2026 06:07

Copilot finished work on behalf of kingkillery April 29, 2026 06:08

Copilot AI requested a review from kingkillery April 29, 2026 06:08

Copilot started work on behalf of kingkillery April 29, 2026 06:17

kingkillery marked this pull request as ready for review April 29, 2026 06:19

kingkillery merged commit 8487420 into main Apr 29, 2026
1 of 2 checks passed

kingkillery deleted the copilot/validate-readme-claims branch April 29, 2026 06:19

Copilot AI restored the copilot/validate-readme-claims branch April 29, 2026 06:20

Copilot stopped work on behalf of kingkillery due to an error April 29, 2026 06:20
The session was cancelled by the user.

kingkillery mentioned this pull request Apr 29, 2026

【Critical Bug】Master branch is completely broken: NameError: name 'aggregate_regions_dfs' is not defined #28

Closed

kingkillery deleted the copilot/validate-readme-claims branch April 29, 2026 06:21