
Extract strGetDummies into a dedicated module#227

Open
Copilot wants to merge 3 commits into main from copilot/autoloopport-str-get-dummies


Copilot AI (Contributor) commented Apr 25, 2026

Autoloop iteration 281 intended to add strGetDummies (port of pandas.Series.str.get_dummies(sep)) as a fresh standalone function, but the symbol already lived inside the catch-all src/stats/string_ops.ts. The original auto-generated patch couldn't be applied (duplicate export, plus it used a non-existent positional new Series([...]) constructor). This PR delivers the iteration's intent: promote strGetDummies to its own module and tighten its semantics to the spec.

Implementation

  • New src/stats/str_get_dummies.ts. Splits each Series element by sep (default "|"), returns a DataFrame of 0/1 indicator columns sorted lexicographically, preserves the input index.
  • null, undefined, NaN and "" now produce all-zero rows. Previously NaN fell through to String(NaN), which synthesized a stray "NaN" column.
  • Columns are assembled via a Map, not a plain object, so integer-like tokens (e.g. "0") keep their lexicographic position rather than being hoisted to the front by JS object key ordering.
  • All-null input still threads series.index through DataFrame.fromColumns, preserving the row count when there are zero columns.
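The bullets above can be sketched as a standalone function over a plain array. This is a minimal illustration, not the PR's code: `getDummiesSketch`, the `DummiesResult` shape and the `index` parameter are hypothetical stand-ins for the library's `Series`/`DataFrame` types. It assumes pandas-like tokenization (empty tokens dropped, no whitespace stripping) and takes "lexicographic" to mean JavaScript's default UTF-16 code-unit sort.

```typescript
// Hypothetical sketch of the algorithm described above.
// Plain arrays and a Map stand in for the library's Series/DataFrame.

type Label = string | number;

interface DummiesResult {
  index: Label[]; // row labels, copied from the input
  columns: Map<string, number[]>; // token -> 0/1 indicator column
}

// null, undefined and NaN count as missing and produce no tokens.
function isMissing(v: unknown): boolean {
  return v === null || v === undefined || (typeof v === "number" && Number.isNaN(v));
}

function getDummiesSketch(
  values: readonly unknown[],
  index: readonly Label[] = values.map((_, i) => i),
  sep = "|",
): DummiesResult {
  // Tokenize each element. Missing values and "" yield an empty token set,
  // so String(NaN) is never called and no "NaN" column can appear.
  const tokenSets = values.map((v) =>
    isMissing(v) ? new Set<string>() : new Set(String(v).split(sep).filter((t) => t !== "")),
  );

  // Vocabulary sorted by UTF-16 code units (default Array#sort on strings).
  const vocab = new Set<string>();
  for (const tokens of tokenSets) for (const t of tokens) vocab.add(t);
  const sorted = [...vocab].sort();

  // A Map keeps insertion order for every key, including integer-like
  // ones such as "0", so the sorted order survives.
  const columns = new Map<string, number[]>();
  for (const tok of sorted) {
    columns.set(tok, tokenSets.map((tokens) => (tokens.has(tok) ? 1 : 0)));
  }

  // The index is carried through even with zero columns, so all-missing
  // input still reports the correct number of rows.
  return { index: [...index], columns };
}
```

For the input `["a|b", null, NaN, "b"]` this yields column "a" as `[1, 0, 0, 0]` and column "b" as `[1, 0, 0, 1]`, matching the "after" records in the behavioral diff.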

Wiring

  • Removed the duplicate strGetDummies and StrGetDummiesOptions from src/stats/string_ops.ts (and the now-unused DataFrame import / docstring entry).
  • Re-export from src/stats/index.ts and src/index.ts, including the previously-missing StrGetDummiesOptions type.

Tests & docs

  • Relocated the existing test block from tests/stats/string_ops.test.ts into a dedicated tests/stats/str_get_dummies.test.ts, expanded with explicit null / undefined / NaN / empty-string / index-preservation / integer-token cases, pandas-parity examples, and 5 fast-check properties (row count, 0-or-1 cells, index preservation, lexicographic columns, missing rows have row-sum 0).
  • New playground/str_get_dummies.html with 4 runnable examples.
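The five properties listed above can be sketched without the fast-check dependency. Here a hypothetical compact `getDummies` (a plain-array port, not the library's `strGetDummies`) is exercised on inputs drawn from a seeded PRNG. `mulberry32`, the token alphabet and the run count are illustrative choices. Unlike fast-check, this loop does not shrink failing inputs.

```typescript
// Dependency-free stand-in for the property tests listed above.
type Cell = string | number | null | undefined;

const isMissingCell = (v: Cell): boolean =>
  v === null || v === undefined || (typeof v === "number" && Number.isNaN(v));

// Hypothetical compact implementation under test.
function getDummies(values: Cell[], index: string[], sep = "|") {
  const sets = values.map((v) =>
    isMissingCell(v) ? new Set<string>() : new Set(String(v).split(sep).filter((t) => t !== "")),
  );
  const vocab = [...new Set(sets.flatMap((s) => [...s]))].sort();
  const columns = new Map<string, number[]>();
  for (const tok of vocab) columns.set(tok, sets.map((s) => (s.has(tok) ? 1 : 0)));
  return { index: [...index], columns };
}

// mulberry32: small deterministic PRNG so failures are reproducible.
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const TOKENS = ["a", "b", "0", "10", " "];
const MISSING: Cell[] = [null, undefined, Number.NaN, ""];

// About 30% missing cells; otherwise 1 to 3 tokens joined by "|".
function randomCell(rand: () => number): Cell {
  if (rand() < 0.3) return MISSING[Math.floor(rand() * MISSING.length)];
  const n = 1 + Math.floor(rand() * 3);
  return Array.from({ length: n }, () => TOKENS[Math.floor(rand() * TOKENS.length)]).join("|");
}

function checkProperties(runs = 200, seed = 42): void {
  const rand = mulberry32(seed);
  for (let run = 0; run < runs; run++) {
    const values = Array.from({ length: Math.floor(rand() * 8) }, () => randomCell(rand));
    const index = values.map((_, i) => `r${i}`);
    const { index: outIndex, columns } = getDummies(values, index);
    const keys = [...columns.keys()];
    const fail = (msg: string) => {
      throw new Error(`${msg} (run ${run}): ${JSON.stringify(values)}`);
    };

    // 1. Row count: every column has one entry per input element.
    for (const col of columns.values()) if (col.length !== values.length) fail("row count");
    // 2. Every cell is 0 or 1.
    for (const col of columns.values()) if (col.some((c) => c !== 0 && c !== 1)) fail("0/1 cells");
    // 3. The index is preserved verbatim.
    if (JSON.stringify(outIndex) !== JSON.stringify(index)) fail("index");
    // 4. Columns are in sorted (UTF-16 code-unit) order.
    if (JSON.stringify(keys) !== JSON.stringify([...keys].sort())) fail("column order");
    // 5. Missing rows (and "") have a row sum of 0.
    values.forEach((v, i) => {
      const rowSum = [...columns.values()].reduce((acc, col) => acc + (col[i] ?? 0), 0);
      if ((isMissingCell(v) || v === "") && rowSum !== 0) fail("missing row sum");
    });
  }
}

checkProperties();
```

Fixing the seed makes any failure reproducible; fast-check gets the same effect by reporting the seed and path of a failing run.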

Behavioral diff

const s = new Series({ data: ["a|b", null, NaN, "b"] });
strGetDummies(s).toRecords();
// before: [{a:1,b:1,NaN:0}, {a:0,b:0,NaN:0}, {a:0,b:0,NaN:1}, {a:0,b:1,NaN:0}]
// after:  [{a:1,b:1},       {a:0,b:0},       {a:0,b:0},       {a:0,b:1}]

const s2 = new Series({ data: [" |0"] });
strGetDummies(s2).columns.values;
// before: ["0", " "]   ← Object key reorder
// after:  [" ", "0"]   ← lexicographic
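Both "before" outputs in the diff trace back to standard JavaScript semantics. The following standalone snippet reproduces them without any library types. The variable names are illustrative only, and the token list is an assumed example rather than data from the PR.

```typescript
// 1. Stringifying NaN yields a real token, so a naive String(v).split(sep)
//    turns a missing value into a "NaN" column.
const naiveTokens = String(Number.NaN).split("|"); // ["NaN"]

// Tokens already in UTF-16 code-unit order: " " < "0" < "10" < "9".
const sortedTokens = ["9", "0", " ", "10"].sort(); // [" ", "0", "10", "9"]

// 2. Plain objects enumerate integer-like keys first, in ascending numeric
//    order, and only then other string keys in insertion order.
const asObject: Record<string, number[]> = {};
for (const tok of sortedTokens) asObject[tok] = [];
const objectOrder = Object.keys(asObject); // ["0", "9", "10", " "]

// 3. A Map enumerates every key in insertion order, so inserting the
//    sorted tokens preserves the lexicographic order.
const asMap = new Map<string, number[]>();
for (const tok of sortedTokens) asMap.set(tok, []);
const mapOrder = [...asMap.keys()]; // [" ", "0", "10", "9"]
```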

Copilot AI changed the title from "[WIP] Add strGetDummies function for series in migration" to "Extract strGetDummies into a dedicated module" on Apr 25, 2026
Copilot AI requested a review from mrjf April 25, 2026 15:10
Copilot finished work on behalf of mrjf April 25, 2026 15:10
mrjf marked this pull request as ready for review April 25, 2026 18:53
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions (Contributor)

Commit pushed: 917c7e6

Generated by Evergreen — PR Health Keeper

github-actions (Contributor)

Evergreen fix applied

The CI failure was a Biome formatter error in tests/stats/str_get_dummies.test.ts: three expressions were wrapped across multiple lines where Biome's formatter expects them on single lines. Running biome format --write fixed the formatting.

  • Root cause: format error — Biome expected collapsed expressions on single lines
  • Fix: Auto-formatted tests/stats/str_get_dummies.test.ts
  • Result: Lint now passes with 0 errors (537 warnings only — all pre-existing)




Development

Successfully merging this pull request may close these issues.

[Autoloop] [Autoloop: build-tsb-pandas-typescript-migration]
