Fix wrong line number for assertions exceeding flex buffer size#8755
Open
tautschnig wants to merge 2 commits intodiffblue:developfrom
Open
Fix wrong line number for assertions exceeding flex buffer size#8755tautschnig wants to merge 2 commits intodiffblue:developfrom
tautschnig wants to merge 2 commits intodiffblue:developfrom
Conversation
When a preprocessed source line exceeds the flex input buffer size (YY_READ_BUF_SIZE, typically 8192 bytes), the scanner's YY_INPUT macro would report an incorrect (off-by-one) line number for tokens on that line. This is because YY_INPUT increments line_no when reading a newline character and then breaks, so the increment happens at the END of a line. For lines fitting in the buffer, the newline is read before flex matches tokens, so line_no is correct. For longer lines, the newline is read in a later YY_INPUT call, after flex has already matched tokens with a line_no that is one too low. The fix overrides YY_INPUT in the ANSI-C scanner to defer the line number increment: instead of incrementing immediately when reading a newline, a flag is set, and the increment happens at the start of the next YY_INPUT call. This ensures line_no is correct when flex matches the first token on any line, regardless of line length. All flex scanners in the codebase were analysed for this bug: Scanner parser.h YY_INPUT Reports lines Affected ansi-c yes (now overridden) yes fixed here assembler yes no no crangler no (own YY_INPUT) no no json yes no no statement-list yes yes latent bug xmllang yes no no The statement-list scanner has the same latent bug but is not fixed in this commit because statement-list lines are unlikely to exceed 8192 bytes in practice. Fixes: diffblue#8257 Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
2718a72 to
86c0e4c
Compare
Move the deferred YY_INPUT line-number increment from the ANSI-C scanner override into parser.h so that all flex-based scanners benefit from correct line counting for lines exceeding the input buffer size. Changes: - parser.h: YY_INPUT now defers inc_line_no() to the start of the next call (via last_input_ended_with_newline flag); line_no initialised to 1 instead of 0; the line_no==0 -> 1 special case in source_location() is removed (no longer needed). - scanner.l: remove the per-scanner YY_INPUT override added in the previous commit (now redundant). - ansi_c_parser.h: remove the per-parser flag (now in parsert). - ansi_c_language.cpp: reset line_no to 1 (not 0) between parse passes so that .i files with no newline report line 1. - contracts_wrangler.cpp: drop redundant set_line_no(0) (constructor already initialises to 1). - statement_list_language.cpp: drop set_line_no(0) (same reason). Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
86c0e4c to
ed13b24
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #8755 +/- ##
========================================
Coverage 80.01% 80.01%
========================================
Files 1700 1700
Lines 188338 188335 -3
Branches 73 73
========================================
+ Hits 150695 150696 +1
+ Misses 37643 37639 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR fixes off-by-one source line reporting from flex-based scanners when token text spans multiple YY_INPUT reads (notably when a single preprocessed line exceeds YY_READ_BUF_SIZE), ensuring assertions and other diagnostics point to the correct line.
Changes:
- Update
parsertto use 1-based initial line numbering and simplifysource_location()line assignment. - Modify the shared
YY_INPUTmacro to deferinc_line_no()until the start of the nextYY_INPUTcall (tracking newline endings via new parser state). - Add a CBMC regression test covering very long assertion lines; remove now-redundant
set_line_no(0)initialisations.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/util/parser.h | Core fix: 1-based line numbers + deferred newline line increment in shared YY_INPUT. |
| src/statement-list/statement_list_language.cpp | Remove redundant set_line_no(0) now that parser starts at line 1. |
| src/goto-instrument/contracts/contracts_wrangler.cpp | Remove redundant set_line_no(0) for the temporary ANSI-C parser. |
| src/ansi-c/ansi_c_language.cpp | Reset parser line number to 1 before parsing the preprocessed translation unit. |
| regression/cbmc/long_assertion_line_number/* | New regression test validating correct line numbers for very long assertion expressions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
feliperodri
approved these changes
Mar 8, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When a preprocessed source line exceeds the flex input buffer size (
YY_READ_BUF_SIZE, typically 8192 bytes), the scanner'sYY_INPUTmacro would report an incorrect (off-by-one) line number for tokens on that line. This is becauseYY_INPUTincrements line_no when reading a newline character and then breaks, so the increment happens at the end of a line. For lines fitting in the buffer, the newline is read before flex matches tokens, soline_nois correct. For longer lines, the newline is read in a laterYY_INPUTcall, after flex has already matched tokens with aline_nothat is one too low.The first commit just fixes
ansi-c, and the second commit generalises this to all flex-based scanners. We may choose to just use the approach from the first commit - review feedback on that would be appreciated.Fixes: #8257