
Implement header fallback for HTML tables #55

Merged
leynos merged 2 commits into main from codex/add-last-resort-behavior-for-tables on Jul 13, 2025

Conversation

@leynos (Owner) commented Jul 13, 2025

Summary

  • treat the first row as the header when no header markup is found and the table has more than one row
  • adjust test expectations
  • document the fallback header behaviour

Testing

  • cargo clippy -- -D warnings
  • RUSTFLAGS="-D warnings" cargo test
  • markdownlint README.md docs/*.md
  • nixie README.md docs/*.md

https://chatgpt.com/codex/tasks/task_e_6873a8cc9208832293665fc8491f7b2a

Summary by Sourcery

Implement fallback header detection for HTML table conversion when no header tags are present.

Enhancements:

  • Treat the first row of a multi-row HTML table as the header if no <th> or <strong>/<b> tags are found.
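As a rough illustration of this rule, the sketch below checks the first row against a simplified cell model rather than the DOM `Handle` nodes the converter actually walks. `CellKind` and `first_row_has_header_markup` are hypothetical names, and requiring every cell of the first row to carry markup (rather than any single cell) is an assumption; the real check in src/html.rs may be looser.

```rust
/// Simplified, hypothetical model of a table cell; the real converter
/// inspects parsed DOM nodes instead.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CellKind {
    /// A `<th>` element.
    Header,
    /// A `<td>` whose content is wrapped in `<strong>` or `<b>`.
    Emphasised,
    /// An ordinary `<td>` with no header-like markup.
    Plain,
}

/// Returns true when the first row looks like explicit header markup.
/// Assumption: every cell must be `<th>` or emphasised for this to hold.
fn first_row_has_header_markup(rows: &[Vec<CellKind>]) -> bool {
    rows.first().is_some_and(|row| {
        !row.is_empty() && row.iter().all(|&kind| kind != CellKind::Plain)
    })
}
```

When this check returns false, the fallback then decides whether the first row is promoted anyway, based on the number of rows.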

Documentation:

  • Document the fallback header behavior in html-table-support.md.

Tests:

  • Update integration test to expect a separator line for tables without explicit headers.

If no header markup is detected, the converter now treats the first row of a multi-row table as the header. Updated docs and tests.

@sourcery-ai (Contributor) commented Jul 13, 2025

Reviewer's Guide

Adds logic to fall back to using the first row of a multi-row table as the header when no explicit header markup is found, and updates the documentation and tests to reflect this new behavior.

Class diagram for table header detection logic

classDiagram
    class TableConverter {
        +table_node_to_markdown(table: Handle) Vec<String>
        -first_header: bool
        -row_handles: Vec<Handle>
        -col_count: usize
    }
    TableConverter : +Detects header row by checking for <th>, <strong>, <b> in first row
    TableConverter : +If no header markup and multiple rows, fallback to first row as header

Flow diagram for table header fallback logic

flowchart TD
    A[Start table conversion] --> B{First row contains header markup?}
    B -- Yes --> C[Use first row as header]
    B -- No --> D{Table has multiple rows?}
    D -- Yes --> E[Fallback: Use first row as header]
    D -- No --> F[No header row]
    C --> G[Insert separator line in Markdown]
    E --> G
    F --> H[No separator line]
    G --> I[Continue conversion]
    H --> I
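The two terminal branches of this flow (separator or no separator) can be sketched as a small renderer over already-extracted cell text. `render_rows` is a hypothetical helper, not the real `table_node_to_markdown` signature, and it assumes every row has already been padded to the same column count.

```rust
/// Render rows as Markdown table lines. When `has_header` is true, a
/// `| --- |` separator is inserted directly after the first row.
fn render_rows(rows: &[Vec<String>], has_header: bool) -> Vec<String> {
    let mut lines = Vec::with_capacity(rows.len() + 1);
    for (i, row) in rows.iter().enumerate() {
        lines.push(format!("| {} |", row.join(" | ")));
        if i == 0 && has_header {
            // One `---` per column so the separator matches the header width.
            let dashes = vec!["---"; row.len()];
            lines.push(format!("| {} |", dashes.join(" | ")));
        }
    }
    lines
}
```

With `has_header` set, a two-row table of `A B` / `1 2` renders as the three lines the updated `test_convert_html_table_no_header` expects.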

File-Level Changes

Change: Implement header fallback logic for tables without explicit header markup
  • Detect tables with multiple rows lacking <th>, <strong> or <b> markers
  • Set the first_header flag when no header markup is present and the row count is greater than 1
  File: src/html.rs

Change: Document fallback header behavior in HTML table support guide
  • Describe detection of header markers and the last-resort first-row fallback
  • Explain insertion of a Markdown separator line for readability
  File: docs/html-table-support.md

Change: Adjust integration test to expect a header separator in no-header tables
  • Update expected vector to include a separator line for the header fallback
  File: tests/integration.rs

@coderabbitai (Contributor) commented Jul 13, 2025

Warning

Rate limit exceeded

@leynos has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 10 minutes and 55 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between a88c1ff and a2fc033.

📒 Files selected for processing (3)
• docs/html-table-support.md (1 hunk)
• src/html.rs (2 hunks)
• tests/integration.rs (2 hunks)

Summary by CodeRabbit

• Documentation

  • Clarified how the HTML-to-Markdown table converter determines and handles table headers, including fallback behaviour for simple tables without explicit header markup.
• Bug Fixes

  • Improved handling of tables without explicit headers by treating the first row as a header when multiple rows are present, ensuring proper Markdown formatting.
• Tests

  • Updated tests to reflect the addition of a Markdown header separator line for tables lacking explicit headers.

Walkthrough

Extend the HTML-to-Markdown table converter to always treat the first row as a header if no explicit header markup is found and the table has multiple rows. Update documentation to clarify header detection logic and modify the relevant test to expect a Markdown header separator line in such cases.

Changes

• docs/html-table-support.md: Clarified documentation on how header rows are detected in HTML-to-Markdown table conversion.
• src/html.rs: Modified table_node_to_markdown to treat the first row as a header if no explicit header is present and the table has more than one row.
• tests/integration.rs: Updated test expectation to include a Markdown header separator line for tables without explicit headers.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Converter
    participant Markdown

    User->>Converter: Provide HTML table
    Converter->>Converter: Inspect first row for <th> or strong formatting
    alt Header detected
        Converter->>Markdown: Output header row and separator
    else No header detected, multiple rows
        Converter->>Markdown: Treat first row as header, insert separator
    end
    Converter->>Markdown: Output remaining rows as data

Poem

Tables in Markdown, neat and bright,
Now gain their headers, set just right.
If HTML forgets to say,
The first row leads the way,
With dashes drawn beneath the head,
Your data’s clear, just as it’s said!
📝✨

@sourcery-ai (Contributor) left a comment

Hey @leynos, I've reviewed your changes and found some issues that need to be addressed.

• Consider refactoring the header detection and fallback logic so that you determine whether there’s a header (explicit or fallback) before you render any rows, which would make the code flow more straightforward.
• The first_header variable is really a boolean flag; renaming it to something like has_header or splitting it into explicit vs fallback flags could make the intent clearer and avoid confusion.
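Both suggestions can be combined in a sketch that classifies the header once, before any row is rendered, using an enum in place of a mutable `first_header` flag. `HeaderSource` and `classify_header` are hypothetical names, and the sketch assumes explicit-markup detection has already produced a boolean.

```rust
/// Where the table's header row, if any, comes from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HeaderSource {
    /// The first row carries <th>, <strong> or <b> markup.
    Explicit,
    /// No markup, but the table has several rows, so the first is promoted.
    Fallback,
    /// A single-row table without markup: no header and no separator.
    Absent,
}

impl HeaderSource {
    /// Whether a Markdown separator should follow the first row.
    fn has_header(self) -> bool {
        !matches!(self, HeaderSource::Absent)
    }
}

/// Decide the header source up front, before rendering any rows.
fn classify_header(explicit_markup: bool, row_count: usize) -> HeaderSource {
    if explicit_markup {
        HeaderSource::Explicit
    } else if row_count > 1 {
        HeaderSource::Fallback
    } else {
        HeaderSource::Absent
    }
}
```

Keeping `Explicit` and `Fallback` as distinct variants means rendering only needs `has_header()`, while any later behaviour specific to the fallback case would not require a second flag.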
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring the header detection and fallback logic so that you determine whether there’s a header (explicit or fallback) before you render any rows, which would make the code flow more straightforward.
- The `first_header` variable is really a boolean flag; renaming it to something like `has_header` or splitting it into explicit vs fallback flags could make the intent clearer and avoid confusion.

## Individual Comments

### Comment 1
<location> `tests/integration.rs:460` </location>
<code_context>
 #[test]
 fn test_convert_html_table_no_header() {
-    let expected = vec!["| A | B |", "| 1 | 2 |"];
+    let expected = vec!["| A | B |", "| --- | --- |", "| 1 | 2 |"];
     assert_eq!(convert_html_tables(&html_table_no_header()), expected);
 }
</code_context>

<issue_to_address>
Consider adding tests for edge cases where the first row is empty or contains only whitespace.

Please also add tests for cases where the first row is empty, contains only whitespace, or has inconsistent cell counts to ensure the fallback logic handles these scenarios correctly.
</issue_to_address>
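The edge cases raised here depend on how rows are normalised before the header decision. A sketch, assuming the converter trims cell text and pads ragged rows with empty cells up to the widest row (the `col_count` field in the class diagram hints at a fixed column count, but the real padding policy is not shown in this PR); `normalise_rows` and `is_blank_row` are hypothetical helpers that tests for these cases could target.

```rust
/// Trim every cell and pad short rows with empty cells so that all rows
/// share the widest row's column count (assumed policy, not confirmed).
fn normalise_rows(rows: &[Vec<&str>]) -> Vec<Vec<String>> {
    let col_count = rows.iter().map(Vec::len).max().unwrap_or(0);
    rows.iter()
        .map(|row| {
            let mut cells: Vec<String> = row.iter().map(|c| c.trim().to_string()).collect();
            cells.resize(col_count, String::new());
            cells
        })
        .collect()
}

/// True when a normalised row has no visible text, e.g. an empty or
/// whitespace-only first row that arguably should not be promoted.
fn is_blank_row(row: &[String]) -> bool {
    row.iter().all(|cell| cell.is_empty())
}
```

A whitespace-only first row above a three-cell data row, for instance, normalises to three empty cells and is reported as blank, which gives the fallback logic something concrete to test against.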

Comment thread: tests/integration.rs

@coderabbitai (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 662e512 and a88c1ff.

📒 Files selected for processing (3)
• docs/html-table-support.md (1 hunk)
• src/html.rs (1 hunk)
• tests/integration.rs (1 hunk)

🧰 Additional context used
📓 Path-based instructions (2)
**/*.md

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

**/*.rs

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

🪛 LanguageTool
docs/html-table-support.md

[uncategorized] ~23-~23: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...first row is still treated as the header so the Markdown output includes a separato...

(COMMA_COMPOUND_SENTENCE_2)

⏰ Context from checks skipped due to timeout of 240000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
• GitHub Check: build-test

🔇 Additional comments (2)
src/html.rs (1)

148-151: Approve the fallback header logic implementation.

The conditional logic correctly implements the fallback behaviour: treating the first row as a header when no explicit header markup exists and the table has multiple rows. The edge case handling for single-row tables is appropriate.

tests/integration.rs (1)

460-460: Approve the test expectation update.

The updated expected output correctly reflects the fallback header behaviour for tables without explicit header markup, ensuring the test validates the new implementation.

Comment thread: docs/html-table-support.md (Outdated)

@leynos merged commit e22e748 into main on Jul 13, 2025
2 checks passed
@leynos deleted the codex/add-last-resort-behavior-for-tables branch on July 13, 2025 at 13:11

Projects: None yet

1 participant