[Repo Assist] Fix CSV schema parsing: column names containing parentheses corrupt type annotation#1604

Merged
dsyme merged 2 commits into main from repo-assist/fix-csv-schema-paren-946-6367eea86dd341c8
Feb 22, 2026

@github-actions
Contributor

🤖 Repo Assist here — I'm an automated AI assistant for this repository.

Closes #946

Problem

When a CSV Schema parameter includes a column whose name contains parentheses, the type annotation is corrupted. For example:

// Schema="Na(  )me (int)"
// Expected: name = "Na(  )me", type = int
// Actual:   name = "Na",       type = "  )me (int"  ← WRONG

This was reported in #946 in 2016 and confirmed as reproducible.

Root Cause

CsvInference.fs line 39 defined nameAndTypeRegex with RegexOptions.RightToLeft:

lazy Regex(@"^(?<name>.+)\((?<type>.+)\)$", RegexOptions.Compiled ||| RegexOptions.RightToLeft)

In RightToLeft mode, the regex engine starts from the right end of the string and scans leftward. When the column name contains a ( character, the greedy .+ for the type group sweeps past the intended (type) boundary and instead anchors the \( separator on the first ( in the string — giving a completely wrong parse.
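
The behaviour described above can be reproduced in isolation. This is a minimal sketch rather than the code in CsvInference.fs: it assumes the named groups are `name` and `type`, omits RegexOptions.Compiled for brevity, and `parseRtl` is a hypothetical helper introduced only for illustration.

```fsharp
open System.Text.RegularExpressions

// Assumed pattern: the one quoted above, with named groups `name` and `type`.
let pattern = @"^(?<name>.+)\((?<type>.+)\)$"

// Hypothetical helper: match right-to-left, as the old code did.
let parseRtl (schema: string) =
    let m = Regex.Match(schema, pattern, RegexOptions.RightToLeft)
    m.Groups.["name"].Value, m.Groups.["type"].Value

let name, typ = parseRtl "Na(  )me (int)"
printfn "name = %A, type = %A" name typ
// prints: name = "Na", type = "  )me (int"
```

Because matching starts at the right, the `type` group is the first greedy `.+` to run. It expands leftward as far as it can and only gives characters back until a `(` sits immediately to its left, which is the leftmost `(` in the string.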

Fix

Remove RegexOptions.RightToLeft from nameAndTypeRegex. Without it, the standard left-to-right greedy engine correctly backtracks the name .+ until the trailing (type) pattern matches — always landing on the last (...) at the end of the string, which is the intended semantics.
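
A small sketch of the left-to-right behaviour, under the same assumptions about the group names. `parseLtr` is a hypothetical helper, and the `Trim()` calls are an assumption about how surrounding whitespace is handled; the real handling in CsvInference.fs is not shown here.

```fsharp
open System.Text.RegularExpressions

// Same assumed pattern, now with default (left-to-right) matching.
let nameAndType = Regex(@"^(?<name>.+)\((?<type>.+)\)$")

// Hypothetical helper: returns (name, type) when the schema entry
// ends in a parenthesised annotation, otherwise None.
let parseLtr (schema: string) =
    let m = nameAndType.Match schema
    if m.Success then
        // Trimming is an assumption: the raw name capture keeps the
        // space that precedes the final "(".
        Some (m.Groups.["name"].Value.Trim(), m.Groups.["type"].Value.Trim())
    else
        None

printfn "%A" (parseLtr "Na(  )me (int)")  // Some ("Na(  )me", "int")
printfn "%A" (parseLtr "A (b) (float)")   // Some ("A (b)", "float")
printfn "%A" (parseLtr "Name")            // None
```

Here the `name` group is the first greedy `.+`, so it gives characters back from the right and stops at the last `(` in the string, leaving only the final parenthesised group for `type`.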

Note: overrideByNameRegex retains RightToLeft intentionally, as that pattern must find the last -> or = separator when the column name may contain those characters.

Test

Added a regression test Column name with parentheses is parsed correctly in schema to InferenceTests.fs.

Test Status

  • Build passes (dotnet build src/FSharp.Data.Csv.Core/)
  • FSharp.Data.DesignTime.Tests pass — 49/49 inference tests pass (includes new regression test)
  • FSharp.Data.Tests could not be built: the WorldBank type provider requires network access to api.worldbank.org, which is blocked by the build environment's network proxy. This is unrelated to this change.

Generated by Repo Assist

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@4cb6855f0b3c0a719d7d5c3af44d1646450e63e9. View source at https://github.com/githubnext/agentics/tree/4cb6855f0b3c0a719d7d5c3af44d1646450e63e9/workflows/repo-assist.md.

Warning

⚠️ Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • api.worldbank.org

…pt type annotation

The nameAndTypeRegex used RegexOptions.RightToLeft, which caused incorrect
splits when a column name contained parentheses. For example, the schema
"Na(  )me (int)" was parsed as name="Na", type="  )me (int" instead of
the correct name="Na(  )me", type="int".

Without RightToLeft, the greedy .+ quantifier on the name group naturally
backtracks and matches the *last* "(type)" group at the end of the string,
which is the correct behaviour.

Fixes #946.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dsyme dsyme marked this pull request as ready for review February 22, 2026 00:48
@dsyme dsyme enabled auto-merge February 22, 2026 00:48
@dsyme dsyme merged commit cadaa09 into main Feb 22, 2026
2 checks passed
@dsyme dsyme deleted the repo-assist/fix-csv-schema-paren-946-6367eea86dd341c8 branch February 22, 2026 00:49