Skip to content

Fix wrong line number for assertions exceeding flex buffer size#8755

Open
tautschnig wants to merge 2 commits intodiffblue:developfrom
tautschnig:fix-8257-long-lines
Open

Fix wrong line number for assertions exceeding flex buffer size#8755
tautschnig wants to merge 2 commits intodiffblue:developfrom
tautschnig:fix-8257-long-lines

Conversation

@tautschnig
Copy link
Copy Markdown
Collaborator

@tautschnig tautschnig commented Nov 30, 2025

When a preprocessed source line exceeds the flex input buffer size (YY_READ_BUF_SIZE, typically 8192 bytes), the scanner's YY_INPUT macro would report an incorrect (off-by-one) line number for tokens on that line. This is because YY_INPUT increments line_no when reading a newline character and then breaks, so the increment happens at the end of a line. For lines fitting in the buffer, the newline is read before flex matches tokens, so line_no is correct. For longer lines, the newline is read in a later YY_INPUT call, after flex has already matched tokens with a line_no that is one too low.

The first commit just fixes ansi-c, and the second commit generalises this to all flex-based scanners. We may choose to just use the approach from the first commit - review feedback on that would be appreciated.

Fixes: #8257

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Feb 24, 2026
When a preprocessed source line exceeds the flex input buffer size
(YY_READ_BUF_SIZE, typically 8192 bytes), the scanner's YY_INPUT macro
would report an incorrect (off-by-one) line number for tokens on that
line. This is because YY_INPUT increments line_no when reading a newline
character and then breaks, so the increment happens at the END of a
line. For lines fitting in the buffer, the newline is read before flex
matches tokens, so line_no is correct. For longer lines, the newline is
read in a later YY_INPUT call, after flex has already matched tokens
with a line_no that is one too low.

The fix overrides YY_INPUT in the ANSI-C scanner to defer the line
number increment: instead of incrementing immediately when reading a
newline, a flag is set, and the increment happens at the start of the
next YY_INPUT call. This ensures line_no is correct when flex matches
the first token on any line, regardless of line length.

All flex scanners in the codebase were analysed for this bug:

  Scanner        parser.h YY_INPUT  Reports lines  Affected
  ansi-c         yes (now overridden) yes           fixed here
  assembler      yes                  no            no
  crangler       no (own YY_INPUT)    no            no
  json           yes                  no            no
  statement-list yes                  yes           latent bug
  xmllang        yes                  no            no

The statement-list scanner has the same latent bug but is not fixed in
this commit because statement-list lines are unlikely to exceed 8192
bytes in practice.

Fixes: diffblue#8257

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig force-pushed the fix-8257-long-lines branch from 2718a72 to 86c0e4c Compare March 6, 2026 23:16
Move the deferred YY_INPUT line-number increment from the ANSI-C
scanner override into parser.h so that all flex-based scanners benefit
from correct line counting for lines exceeding the input buffer size.

Changes:
- parser.h: YY_INPUT now defers inc_line_no() to the start of the next
  call (via last_input_ended_with_newline flag); line_no initialised
  to 1 instead of 0; the line_no==0 -> 1 special case in
  source_location() is removed (no longer needed).
- scanner.l: remove the per-scanner YY_INPUT override added in the
  previous commit (now redundant).
- ansi_c_parser.h: remove the per-parser flag (now in parsert).
- ansi_c_language.cpp: reset line_no to 1 (not 0) between parse passes
  so that .i files with no newline report line 1.
- contracts_wrangler.cpp: drop redundant set_line_no(0) (constructor
  already initialises to 1).
- statement_list_language.cpp: drop set_line_no(0) (same reason).

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig force-pushed the fix-8257-long-lines branch from 86c0e4c to ed13b24 Compare March 6, 2026 23:22
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.01%. Comparing base (eaaf029) to head (ed13b24).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8755   +/-   ##
========================================
  Coverage    80.01%   80.01%           
========================================
  Files         1700     1700           
  Lines       188338   188335    -3     
  Branches        73       73           
========================================
+ Hits        150695   150696    +1     
+ Misses       37643    37639    -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tautschnig tautschnig changed the title Fix incorrect line number reporting for long assertions in error messages Fix wrong line number for assertions exceeding flex buffer size Mar 8, 2026
@tautschnig tautschnig marked this pull request as ready for review March 8, 2026 20:18
Copilot AI review requested due to automatic review settings March 8, 2026 20:18
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes off-by-one source line reporting from flex-based scanners when token text spans multiple YY_INPUT reads (notably when a single preprocessed line exceeds YY_READ_BUF_SIZE), ensuring assertions and other diagnostics point to the correct line.

Changes:

  • Update parsert to use 1-based initial line numbering and simplify source_location() line assignment.
  • Modify the shared YY_INPUT macro to defer inc_line_no() until the start of the next YY_INPUT call (tracking newline endings via new parser state).
  • Add a CBMC regression test covering very long assertion lines; remove now-redundant set_line_no(0) initialisations.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/util/parser.h Core fix: 1-based line numbers + deferred newline line increment in shared YY_INPUT.
src/statement-list/statement_list_language.cpp Remove redundant set_line_no(0) now that parser starts at line 1.
src/goto-instrument/contracts/contracts_wrangler.cpp Remove redundant set_line_no(0) for the temporary ANSI-C parser.
src/ansi-c/ansi_c_language.cpp Reset parser line number to 1 before parsing the preprocessed translation unit.
regression/cbmc/long_assertion_line_number/* New regression test validating correct line numbers for very long assertion expressions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@tautschnig tautschnig assigned kroening and unassigned tautschnig Mar 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CBMC 5.95.1 reporting wrong line number if assertion is long

4 participants