Fix wrong line number for assertions exceeding flex buffer size by tautschnig · Pull Request #8755 · diffblue/cbmc

tautschnig · 2025-11-30T03:06:43Z

When a preprocessed source line exceeds the flex input buffer size (YY_READ_BUF_SIZE, typically 8192 bytes), tokens on that line could be reported with an incorrect (off-by-one) line number because of how the scanner's YY_INPUT macro counts lines. YY_INPUT increments line_no when it reads a newline character and then stops filling the buffer, so the increment happens at the end of a line. For lines that fit in the buffer, the newline is read before flex matches the line's tokens, so line_no is correct. For longer lines, the newline is only read by a later YY_INPUT call, after flex has already matched tokens using a line_no that is one too low.

The first commit only fixes ansi-c, and the second commit generalises the fix to all flex-based scanners. We may choose to use only the approach from the first commit; review feedback on that would be appreciated.

Fixes: #8257

Each commit message has a non-empty body, explaining why the change was made.
n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
n/a My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

When a preprocessed source line exceeds the flex input buffer size (YY_READ_BUF_SIZE, typically 8192 bytes), the scanner's YY_INPUT macro would report an incorrect (off-by-one) line number for tokens on that line. This is because YY_INPUT increments line_no when reading a newline character and then breaks, so the increment happens at the END of a line. For lines fitting in the buffer, the newline is read before flex matches tokens, so line_no is correct. For longer lines, the newline is read in a later YY_INPUT call, after flex has already matched tokens with a line_no that is one too low.

The fix overrides YY_INPUT in the ANSI-C scanner to defer the line number increment: instead of incrementing immediately when reading a newline, a flag is set, and the increment happens at the start of the next YY_INPUT call. This ensures line_no is correct when flex matches the first token on any line, regardless of line length.

All flex scanners in the codebase were analysed for this bug:

| Scanner | parser.h YY_INPUT | Reports lines | Affected |
|---|---|---|---|
| ansi-c | yes (now overridden) | yes | fixed here |
| assembler | yes | no | no |
| crangler | no (own YY_INPUT) | no | no |
| json | yes | no | no |
| statement-list | yes | yes | latent bug |
| xmllang | yes | no | no |

The statement-list scanner has the same latent bug but is not fixed in this commit because statement-list lines are unlikely to exceed 8192 bytes in practice.

Fixes: diffblue#8257

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Move the deferred YY_INPUT line-number increment from the ANSI-C scanner override into parser.h so that all flex-based scanners benefit from correct line counting for lines exceeding the input buffer size.

Changes:

- parser.h: YY_INPUT now defers inc_line_no() to the start of the next call (via the last_input_ended_with_newline flag); line_no is initialised to 1 instead of 0; the line_no==0 -> 1 special case in source_location() is removed (no longer needed).
- scanner.l: remove the per-scanner YY_INPUT override added in the previous commit (now redundant).
- ansi_c_parser.h: remove the per-parser flag (now in parsert).
- ansi_c_language.cpp: reset line_no to 1 (not 0) between parse passes so that .i files with no newline report line 1.
- contracts_wrangler.cpp: drop redundant set_line_no(0) (the constructor already initialises line_no to 1).
- statement_list_language.cpp: drop set_line_no(0) (same reason).

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

codecov · 2026-03-07T00:11:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.01%. Comparing base (eaaf029) to head (ed13b24).

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           develop    #8755   +/-   ##
========================================
  Coverage    80.01%   80.01%           
========================================
  Files         1700     1700           
  Lines       188338   188335    -3     
  Branches        73       73           
========================================
+ Hits        150695   150696    +1     
+ Misses       37643    37639    -4
```

Copilot

Pull request overview

This PR fixes off-by-one source line reporting from flex-based scanners when a source line spans multiple YY_INPUT reads (notably when a single preprocessed line exceeds YY_READ_BUF_SIZE), ensuring that assertions and other diagnostics point to the correct line.

Changes:

- Update parsert to use 1-based initial line numbering and simplify source_location() line assignment.
- Modify the shared YY_INPUT macro to defer inc_line_no() until the start of the next YY_INPUT call (tracking newline endings via new parser state).
- Add a CBMC regression test covering very long assertion lines; remove now-redundant set_line_no(0) initialisations.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| `src/util/parser.h` | Core fix: 1-based line numbers + deferred newline line increment in shared `YY_INPUT`. |
| `src/statement-list/statement_list_language.cpp` | Remove redundant `set_line_no(0)` now that parser starts at line 1. |
| `src/goto-instrument/contracts/contracts_wrangler.cpp` | Remove redundant `set_line_no(0)` for the temporary ANSI-C parser. |
| `src/ansi-c/ansi_c_language.cpp` | Reset parser line number to 1 before parsing the preprocessed translation unit. |
| `regression/cbmc/long_assertion_line_number/*` | New regression test validating correct line numbers for very long assertion expressions. |

tautschnig self-assigned this Feb 24, 2026

tautschnig force-pushed the fix-8257-long-lines branch from 2718a72 to 86c0e4c March 6, 2026 23:16

tautschnig force-pushed the fix-8257-long-lines branch from 86c0e4c to ed13b24 March 6, 2026 23:22

tautschnig changed the title ~~Fix incorrect line number reporting for long assertions in error messages~~ Fix wrong line number for assertions exceeding flex buffer size Mar 8, 2026

tautschnig marked this pull request as ready for review March 8, 2026 20:18

Copilot AI review requested due to automatic review settings March 8, 2026 20:18

tautschnig requested review from feliperodri, kroening, peterschrammel and remi-delmas-3000 as code owners March 8, 2026 20:18

Copilot started reviewing on behalf of tautschnig March 8, 2026 20:19

Copilot AI reviewed Mar 8, 2026

tautschnig assigned kroening and unassigned tautschnig Mar 8, 2026

feliperodri approved these changes Mar 8, 2026

tautschnig wants to merge 2 commits into diffblue:develop from tautschnig:fix-8257-long-lines

tautschnig commented Nov 30, 2025 • edited

codecov Bot commented Mar 7, 2026

Copilot AI left a comment

4 participants
