From 416260a9bf9e94e75b40408ce57ce2eca2897484 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Sat, 11 Apr 2026 14:03:03 +0000
Subject: [PATCH 01/12] =?UTF-8?q?Iteration=20172:=20Add=20na=5Fops=20?=
 =?UTF-8?q?=E2=80=94=20isna/notna/ffill/bfill?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implements pandas missing-value utilities as standalone exported functions:

- `isna` / `notna` / `isnull` / `notnull`: detect missing values in scalars,
  Series, and DataFrames (mirrors pd.isna / pd.notna)
- `ffillSeries` / `bfillSeries`: forward/backward fill for Series with
  optional `limit` parameter
- `dataFrameFfill` / `dataFrameBfill`: column-wise or row-wise fill for
  DataFrames with optional `limit` and `axis` parameters

Metric: 28 → 29 pandas_features_ported

Run: https://github.com/githubnext/tsessebe/actions/runs/24263385922

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 playground/index.html      |  35 +++
 playground/na_ops.html     | 480 +++++++++++++++++++++++++++++++++++++
 src/index.ts               |  42 ++++
 src/stats/index.ts         |  40 ++++
 src/stats/na_ops.ts        | 336 ++++++++++++++++++++++++++
 tests/stats/na_ops.test.ts | 280 ++++++++++++++++++++++
 6 files changed, 1213 insertions(+)
 create mode 100644 playground/na_ops.html
 create mode 100644 src/stats/na_ops.ts
 create mode 100644 tests/stats/na_ops.test.ts

diff --git a/playground/index.html b/playground/index.html
index 48bfbcb9..fe626424 100644
--- a/playground/index.html
+++ b/playground/index.html
@@ -254,6 +254,11 @@
Detect and fill missing values. isna(), notna(), isnull(), notnull() for scalars/Series/DataFrame. ffillSeries(), bfillSeries(), dataFrameFfill(), dataFrameBfill() with optional limit and axis support.
+Count unique values. valueCounts() for Series and dataFrameValueCounts() for DataFrame with normalize, sort, ascending, and dropna options.
@@ -264,6 +269,36 @@Fractional change between elements. pctChangeSeries() and pctChangeDataFrame() with periods, fillMethod (pad/bfill), limit, and axis options.
+Return the index label of the minimum or maximum value. idxminSeries(), idxmaxSeries(), idxminDataFrame(), idxmaxDataFrame() with skipna support.
+Cast Series and DataFrame values to a different dtype. astypeSeries(), astype() with per-column mapping support and integer clamping.
+Substitute values in Series and DataFrame. Supports scalar, array (many→one, pair-wise), Record, and Map replacement specs.
+Conditional value selection. where keeps values where the condition is true; mask replaces them. Supports boolean arrays, Series, DataFrame, and callable conditions.
Discrete difference and value shifting for Series and DataFrame. diff computes element-wise differences; shift lags or leads values by a number of periods. Essential for time-series analysis.
+ isna / notna — detect missing values in scalars,
+ Series, and DataFrames.
+ ffill / bfill — propagate the last (or next) valid
+ value to fill gaps.
+ Mirrors pd.isna(), Series.ffill(), and
+ DataFrame.bfill() from pandas.
+
isna / notna on scalars
+ Returns true / false for individual values.
+ null, undefined, and NaN are all
+ considered "missing".
+
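The scalar rule above can be sketched in a few lines. `isMissingSketch` and `ScalarSketch` are hypothetical names written for this note; this is an illustration of the stated rule, not the library's `isna`.

```typescript
// Hypothetical stand-in for the scalar branch of isna; not the library's code.
type ScalarSketch = number | string | boolean | null | undefined;

function isMissingSketch(value: ScalarSketch): boolean {
  // null and undefined are missing; NaN is the only number treated as missing.
  return (
    value === null ||
    value === undefined ||
    (typeof value === "number" && Number.isNaN(value))
  );
}

const missingFlags = [null, undefined, NaN, 0, "", false].map(isMissingSketch);
console.log(missingFlags); // [true, true, true, false, false, false]
```

Note that falsy values such as `0`, `""`, and `false` are ordinary values, not missing ones.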
isna on a Series
+ When passed a Series, isna returns a boolean Series of the
+ same length — true where values are missing.
+
isna on a DataFrame
+ Returns a DataFrame of booleans with the same shape — one column per
+ original column, true where missing.
+
ffillSeries)
+ Propagates the last valid value forward to fill gaps. Leading
+ nulls that have no preceding value remain null.
+ Use the optional limit to cap consecutive fills.
+
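A minimal sketch of forward fill with a `limit`, working on a plain array rather than a Series. `ffillSketch` is a hypothetical helper, and it assumes `null` is the only missing marker and that `limit` counts consecutive fills the way pandas does.

```typescript
// Sketch only: forward fill over a plain array, null marks a missing value.
function ffillSketch<T>(
  values: readonly (T | null)[],
  limit: number | null = null,
): (T | null)[] {
  const out: (T | null)[] = [];
  let last: T | null = null; // last valid value seen so far
  let run = 0; // consecutive fills since that value
  for (const v of values) {
    if (v !== null) {
      last = v;
      run = 0;
      out.push(v);
    } else if (last !== null && (limit === null || run < limit)) {
      run += 1;
      out.push(last);
    } else {
      out.push(null); // leading gap, or limit reached
    }
  }
  return out;
}

console.log(ffillSketch([null, 1, null, null, null, 4])); // [null, 1, 1, 1, 1, 4]
console.log(ffillSketch([null, 1, null, null, null, 4], 2)); // [null, 1, 1, 1, null, 4]
```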
bfillSeries)
+ Propagates the next valid value backward to fill gaps. Trailing
+ nulls that have no following value remain null.
+
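Backward fill is the mirror image of forward fill: scan from the end and carry the next valid value toward the front. The sketch below uses a hypothetical `bfillSketch` helper on a plain array and assumes `null` is the missing marker.

```typescript
// Sketch only: backward fill implemented as a reversed forward scan.
function bfillSketch<T>(
  values: readonly (T | null)[],
  limit: number | null = null,
): (T | null)[] {
  const out: (T | null)[] = [];
  let next: T | null = null; // nearest valid value to the right
  let run = 0; // consecutive fills since that value
  for (const v of [...values].reverse()) {
    if (v !== null) {
      next = v;
      run = 0;
      out.push(v);
    } else if (next !== null && (limit === null || run < limit)) {
      run += 1;
      out.push(next);
    } else {
      out.push(null); // trailing gap, or limit reached
    }
  }
  return out.reverse();
}

console.log(bfillSketch([1, null, null, 4, null])); // [1, 4, 4, 4, null]
console.log(bfillSketch([1, null, null, 4, null], 1)); // [1, null, 4, 4, null]
```

With a limit, the gap positions closest to the next valid value are the ones that get filled.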
+ dataFrameFfill and dataFrameBfill fill down each
+ column by default (axis=0). Pass axis: 1 to fill
+ across each row instead.
+
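To make the two axis values concrete, here is a sketch on a row-major grid of numbers. `fillFrameSketch`, `fillAlongSketch`, and `CellSketch` are hypothetical names; the real functions take a DataFrame, and only the fill direction is illustrated here.

```typescript
type CellSketch = number | null;

// Forward fill along one line of cells (no limit, for brevity).
const fillAlongSketch = (xs: readonly CellSketch[]): CellSketch[] => {
  let last: CellSketch = null;
  return xs.map((v) => {
    if (v !== null) last = v;
    return last;
  });
};

// rows is row-major: rows[r][c].
function fillFrameSketch(
  rows: readonly (readonly CellSketch[])[],
  axis: 0 | 1,
): CellSketch[][] {
  if (axis === 1) return rows.map((row) => fillAlongSketch(row)); // across each row
  // axis 0: fill down each column, done by transposing, filling, transposing back.
  const transpose = (m: readonly (readonly CellSketch[])[]): CellSketch[][] =>
    (m[0] ?? []).map((_, c) => m.map((row) => row[c] ?? null));
  return transpose(transpose(rows).map((col) => fillAlongSketch(col)));
}

const gridRows: CellSketch[][] = [
  [1, null],
  [null, null],
  [3, 4],
];
console.log(fillFrameSketch(gridRows, 0)); // [[1, null], [1, null], [3, 4]]
console.log(fillFrameSketch(gridRows, 1)); // [[1, 1], [null, null], [3, 4]]
```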
// Module-level missing-value detection
+isna(value: Scalar): boolean
+isna(value: Series): Series<boolean>
+isna(value: DataFrame): DataFrame
+
+notna(value: Scalar): boolean
+notna(value: Series): Series<boolean>
+notna(value: DataFrame): DataFrame
+
+// Aliases
+isnull(...) // same as isna
+notnull(...) // same as notna
+
+// Series forward / backward fill
+ffillSeries(series, options?: { limit?: number | null }): Series
+bfillSeries(series, options?: { limit?: number | null }): Series
+
+// DataFrame forward / backward fill
+dataFrameFfill(df, options?: {
+ limit?: number | null, // max consecutive fills (default: no limit)
+ axis?: 0 | 1 | "index" | "columns", // default 0 (column-wise)
+}): DataFrame
+
+dataFrameBfill(df, options?: {
+ limit?: number | null,
+ axis?: 0 | 1 | "index" | "columns",
+}): DataFrame
+ Compute the fractional change between each element and a prior element.
+ Mirrors pandas.Series.pct_change() /
+ pandas.DataFrame.pct_change().
+ Edit any code block below and press ▶ Run
+ (or Ctrl+Enter) to execute it live in your browser.
+
pctChangeSeries(series) returns the fractional (not percentage) change
+ from each previous element. The first element is always null.
The periods option controls the lag. Use periods: 2 to
+ compare each value to the one two steps earlier, for example to measure
+ two-month changes in monthly data.
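The arithmetic behind `periods` can be sketched on a plain array. `pctChangeSketch` is a hypothetical helper that assumes the pandas convention `x[i] / x[i - periods] - 1` and skips the fill step entirely; results are floating-point, so they are only approximately the round numbers noted in the comments.

```typescript
// Sketch only: fractional change against the value `periods` steps earlier.
function pctChangeSketch(
  xs: readonly (number | null)[],
  periods = 1,
): (number | null)[] {
  return xs.map((cur, i) => {
    const prev = xs[i - periods] ?? null; // out-of-range index yields null
    if (cur === null || prev === null) return null;
    return cur / prev - 1;
  });
}

const lagOne = pctChangeSketch([100, 110, 99, 121]);
const lagTwo = pctChangeSketch([100, 110, 99, 121], 2);
console.log(lagOne); // roughly [null, 0.1, -0.1, 0.2222]
console.log(lagTwo); // roughly [null, null, -0.01, 0.1]
```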
By default, pctChangeSeries forward-fills (fillMethod: "pad")
+ NaN/null values before computing the ratio — so gaps don't break the chain.
+ Set fillMethod: null to propagate NaN instead.
The limit option caps how many consecutive NaN values get forward-filled.
+ Useful when you want to tolerate short gaps but not bridge large ones.
pctChangeDataFrame(df) applies pctChangeSeries to every
+ column independently. Ideal for comparing multiple assets or metrics simultaneously.
A negative periods value looks forward instead of back: each element is
+ compared with the value |periods| steps ahead.
+ Useful for computing returns on a "hold for N periods" strategy.
All functions return a new Series/DataFrame of the same shape — inputs are never mutated.
+// Series
+pctChangeSeries(series, {
+ periods?: number, // default 1 (positive = look back, negative = look forward)
+ fillMethod?: "pad" | "bfill" | null, // default "pad"
+ limit?: number | null, // max consecutive fills; default unlimited
+}): Series
+
+// DataFrame
+pctChangeDataFrame(df, {
+ periods?: number,
+ fillMethod?: "pad" | "bfill" | null,
+ limit?: number | null,
+ axis?: 0 | 1 | "index" | "columns", // default 0 (column-wise)
+}): DataFrame
+
+ Return the index label of the minimum or maximum value in a
+ Series or each column of a DataFrame.
+ Mirrors pandas.Series.idxmin(), pandas.Series.idxmax(),
+ pandas.DataFrame.idxmin(), and pandas.DataFrame.idxmax().
+
Returns the index label at the position of the minimum value.
+ NaN / null values are skipped by default.
+Returns the index label at the position of the maximum value.
+By default NaN / null values are skipped. Set skipna: false
+ to propagate NaN (returns null if any value is NaN).
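The `skipna` behaviour described above can be sketched on parallel arrays of values and labels. `idxminSketch` and `LabelSketch` are hypothetical; returning `null` for an empty or all-missing input is an assumption of this sketch, not something the text above states.

```typescript
type LabelSketch = string | number;

// Sketch only: label of the smallest value, first occurrence wins on ties.
function idxminSketch(
  values: readonly (number | null)[],
  index: readonly LabelSketch[],
  skipna = true,
): LabelSketch | null {
  let best: number | null = null;
  let bestLabel: LabelSketch | null = null;
  for (let i = 0; i < values.length; i += 1) {
    const v = values[i] ?? null;
    if (v === null || Number.isNaN(v)) {
      if (!skipna) return null; // any missing value poisons the result
      continue;
    }
    if (best === null || v < best) {
      best = v;
      bestLabel = index[i] ?? null;
    }
  }
  return bestLabel;
}

console.log(idxminSketch([3, null, 1, 1], ["a", "b", "c", "d"])); // "c"
console.log(idxminSketch([3, null, 1, 1], ["a", "b", "c", "d"], false)); // null
```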
Returns a Series indexed by column names. Each value is the row label
+ where that column achieves its minimum.
+Returns a Series indexed by column names, where each entry is the row
+ label of that column's maximum value.
+Behavior for empty series, series where every value is NaN, and series
+ where all values are equal.
+// Series
+idxminSeries(series, { skipna?: boolean }): Label // default skipna=true
+idxmaxSeries(series, { skipna?: boolean }): Label
+
+// DataFrame (axis=0 — min/max per column)
+idxminDataFrame(df, { skipna?: boolean }): Series // indexed by column names
+idxmaxDataFrame(df, { skipna?: boolean }): Series
+
+ Cast Series and DataFrame values to a different dtype.
+ Mirrors pandas.Series.astype and pandas.DataFrame.astype.
+
+ Cast floating-point values to integers via truncation (same as
+ pandas.Series.astype("int64")).
+
Convert every value to its string representation. Null/undefined values
+ become null (not the string "null").
+ Values that overflow the target integer dtype's range are clamped to
+ [min, max] — e.g. uint8 is clamped to
+ [0, 255].
+
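Truncation followed by clamping can be sketched as below. `castIntSketch` and the `INT_RANGES_SKETCH` table are hypothetical and cover only a few dtypes; NaN and null handling are left out.

```typescript
// Sketch only: inclusive [min, max] ranges for a few integer dtypes.
const INT_RANGES_SKETCH = {
  int8: [-128, 127],
  uint8: [0, 255],
  int16: [-32768, 32767],
  uint16: [0, 65535],
} as const;

function castIntSketch(
  value: number,
  dtype: keyof typeof INT_RANGES_SKETCH,
): number {
  const [min, max] = INT_RANGES_SKETCH[dtype];
  // Drop the fractional part first, then clamp into the dtype's range.
  return Math.min(max, Math.max(min, Math.trunc(value)));
}

console.log(castIntSketch(3.9, "uint8")); // 3
console.log(castIntSketch(-2.7, "int8")); // -2
console.log(castIntSketch(300, "uint8")); // 255
console.log(castIntSketch(-5, "uint8")); // 0
```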
Pass a single dtype name to cast every column to the same type.
+Pass a Record<string, DtypeName> to cast individual
+ columns. Columns not listed are carried over unchanged.
Zero, empty string, and NaN become false;
+ everything else (including non-zero numbers and non-empty strings)
+ becomes true.
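The rule for numbers and strings coincides with JavaScript truthiness, which the sketch below relies on. `toBoolSketch` is a hypothetical helper, and how the library treats `null` is not covered here.

```typescript
// Sketch only: 0, NaN, and "" are falsy in JavaScript; everything else is truthy.
function toBoolSketch(value: number | string): boolean {
  return Boolean(value);
}

const boolFlags = [0, 1, -2.5, NaN, "", "no", "false"].map((v) => toBoolSketch(v));
console.log(boolFlags); // [false, true, true, false, false, true, true]
```

One consequence worth noticing: the non-empty string `"false"` maps to `true`.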
// Series cast
+astypeSeries(
+ series: Series,
+ dtype: DtypeName | Dtype,
+ options?: AstypeOptions,
+): Series
+
+// DataFrame cast (all columns or per-column mapping)
+astype(
+ df: DataFrame,
+ dtype: DtypeName | Dtype | Record<string, DtypeName | Dtype>,
+ options?: DataFrameAstypeOptions,
+): DataFrame
+
+// Low-level scalar cast
+castScalar(value: Scalar, dtype: Dtype): Scalar
+
+// Options
+interface AstypeOptions {
+ errors?: "raise" | "ignore"; // default "raise"
+}
+
+// Supported dtype names
+type DtypeName =
+ | "int8" | "int16" | "int32" | "int64"
+ | "uint8" | "uint16" | "uint32" | "uint64"
+ | "float32" | "float64"
+ | "bool" | "string" | "object"
+ | "datetime" | "timedelta" | "category"
+
+ replaceSeries / replaceDataFrame substitute values
+ matching a pattern with a new value.
+ Supports scalar, array, and mapping (Record / Map) replacement specs.
+ Mirrors Series.replace() and DataFrame.replace() from pandas.
+
+ Replace every occurrence of a single value with another value.
+ Works on numbers, strings, booleans, and null.
+
+ Replace a list of values with a single target, or perform pair-wise
+ replacement using two equal-length arrays.
+
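The two array forms can be sketched on a plain array. `replaceSketch` and `ReplScalarSketch` are hypothetical; matching uses `Object.is` so that NaN matches NaN, which is an assumption made to resemble the `matchNaN: true` default rather than the library's actual comparison.

```typescript
type ReplScalarSketch = number | string | boolean | null;

// Sketch only: many-to-one when `value` is a scalar, pair-wise when it is an array.
function replaceSketch(
  data: readonly ReplScalarSketch[],
  toReplace: readonly ReplScalarSketch[],
  value: ReplScalarSketch | ReplScalarSketch[],
): ReplScalarSketch[] {
  return data.map((v) => {
    const at = toReplace.findIndex((t) => Object.is(t, v));
    if (at === -1) return v; // not listed: keep as is
    return Array.isArray(value) ? (value[at] ?? null) : value;
  });
}

console.log(replaceSketch([1, 2, 3, 2], [2, 3], 0)); // [1, 0, 0, 0]
console.log(replaceSketch([1, 2, 3, 2], [2, 3], [20, 30])); // [1, 20, 30, 20]
```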
+ Pass a lookup table as either a plain object (Record<string, Scalar>)
+ or a JavaScript Map for full type flexibility.
+
+ replaceDataFrame applies the same spec to all columns by
+ default. Use the columns option to restrict which columns
+ are affected.
+
// Replace values in a Series
+replaceSeries(
+ series: Series,
+ spec: ReplaceSpec,
+ options?: ReplaceOptions,
+): Series
+
+// Replace values in a DataFrame
+replaceDataFrame(
+ df: DataFrame,
+ spec: ReplaceSpec,
+ options?: DataFrameReplaceOptions,
+): DataFrame
+
+// Replacement spec variants
+type ReplaceSpec =
+ | { toReplace: Scalar; value: Scalar } // scalar → scalar
+ | { toReplace: Scalar[]; value: Scalar } // array → scalar
+ | { toReplace: Scalar[]; value: Scalar[] } // array → array (pair-wise)
+ | { toReplace: Record<string, Scalar> } // Record mapping
+ | { toReplace: Map<Scalar, Scalar> } // Map mapping
+
+// Options
+interface ReplaceOptions {
+ matchNaN?: boolean; // treat NaN===NaN for matching (default: true)
+}
+
+interface DataFrameReplaceOptions extends ReplaceOptions {
+ columns?: string[]; // only replace in these columns (default: all)
+}
+ where / mask
+ Conditional value selection: keep or replace elements based on a boolean
+ condition. These are the TypeScript equivalents of
+ pandas.Series.where / pandas.DataFrame.where and
+ pandas.Series.mask / pandas.DataFrame.mask.
+
// where: keep where cond=true, replace with `other` where cond=false
+whereSeries(s, cond, { other: null })
+
+// mask: replace where cond=true with `other`, keep where cond=false
+maskSeries(s, cond, { other: null })
+
+ s.where(cond, other=np.nan)
+ s.mask(cond, other=np.nan)
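The two operations are complements: masking with a condition is the same as applying where with the negated condition. The sketch below shows this on plain arrays; `whereSketch` and `maskSketch` are hypothetical helpers that accept only boolean arrays.

```typescript
// Sketch only: keep values where cond is true, otherwise substitute `other`.
function whereSketch<T>(
  data: readonly T[],
  cond: readonly boolean[],
  other: T | null = null,
): (T | null)[] {
  return data.map((v, i) => (cond[i] ? v : other));
}

// mask is where with the condition inverted.
function maskSketch<T>(
  data: readonly T[],
  cond: readonly boolean[],
  other: T | null = null,
): (T | null)[] {
  return whereSketch(data, cond.map((c) => !c), other);
}

console.log(whereSketch([10, 20, 30], [true, false, true])); // [10, null, 30]
console.log(maskSketch([1, 2, 3, 4, 5], [false, false, false, true, true], -1)); // [1, 2, 3, -1, -1]
```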
+ const s = new Series({ data: [10, 20, 30, 40, 50], name: "prices" });
+whereSeries(s, [true, false, true, false, true]);
+// → [10, null, 30, null, 50]
+
+
+ const s = new Series({ data: [1, 2, 3, 4, 5] });
+// Replace values > 3 with -1
+maskSeries(s, (v) => v > 3, { other: -1 });
+// → [1, 2, 3, -1, -1]
+
+
+ const df = DataFrame.fromColumns({
+ a: [1, 2, 3],
+ b: [4, 5, 6],
+});
+const cond = [[true, false], [false, true], [true, true]];
+whereDataFrame(df, cond);
+// a: [1, null, 3]
+// b: [null, 5, 6]
+
+
+ const df = DataFrame.fromColumns({
+ a: [1, 2, 3],
+ b: [10, 20, 30],
+});
+// Keep rows 0 and 2 only, replace row 1 across all columns
+const rowCond = new Series({ data: [true, false, true], index: [0, 1, 2] });
+whereDataFrame(df, rowCond, { axis: 0, other: 0 });
+// a: [1, 0, 3]
+// b: [10, 0, 30]
+
+
+ const df = DataFrame.fromColumns({ a: [1, 2, 3], b: [4, 5, 6] });
+const condDf = DataFrame.fromColumns({
+ a: [false, true, false],
+ b: [true, false, true],
+});
+maskDataFrame(df, condDf, { other: 99 });
+// a: [1, 99, 3]
+// b: [99, 5, 99]
+
+
+
+ diffSeries / diffDataFrame compute the element-wise discrete
+ difference (value[i] - value[i-periods]).
+ shiftSeries / shiftDataFrame shift values forward or backward
+ by a given number of periods, filling with a configurable value.
+ Mirrors Series.diff(), Series.shift(),
+ DataFrame.diff(), and DataFrame.shift() from pandas.
+
+ Compute s[i] - s[i - periods] for each position.
+ The first periods entries are null.
+ Non-numeric values produce null.
+
💡 Tip: diffSeries is commonly used to compute returns, velocity, or changes over time.
+ Shift values forward (positive periods) or backward (negative periods).
+ Vacated positions are filled with fillValue (default null).
+
💡 Tip: combine shiftSeries with arithmetic to compute returns, lags, or leads.
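As an illustration of combining a shift with arithmetic, the discrete difference can be written as each value minus its shifted counterpart. `shiftSketch` and `diffSketch` are hypothetical helpers on plain numeric arrays, not the library functions.

```typescript
type NumOrNullSketch = number | null;

// Sketch only: position i takes the value from position i - periods.
function shiftSketch(
  xs: readonly NumOrNullSketch[],
  periods = 1,
  fillValue: NumOrNullSketch = null,
): NumOrNullSketch[] {
  return xs.map((_, i) => {
    const src = i - periods;
    return src >= 0 && src < xs.length ? (xs[src] ?? null) : fillValue;
  });
}

// diff is the element-wise difference between the data and its shifted copy.
function diffSketch(xs: readonly NumOrNullSketch[], periods = 1): NumOrNullSketch[] {
  const shifted = shiftSketch(xs, periods);
  return xs.map((v, i) => {
    const p = shifted[i] ?? null;
    return v === null || p === null ? null : v - p;
  });
}

console.log(shiftSketch([1, 2, 3, 4], 1)); // [null, 1, 2, 3]
console.log(shiftSketch([1, 2, 3, 4], -1, 0)); // [2, 3, 4, 0]
console.log(diffSketch([1, 4, 9, 16])); // [null, 3, 5, 7]
console.log(diffSketch([1, 4, 9, 16], 2)); // [null, null, 8, 12]
```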
+ axis=0 (default): diff each column independently (rows over time).
+ axis=1: diff across columns within each row.
+
+ Shift all columns by the same number of periods.
+ Useful for creating lagged features in machine learning.
+💡 Tip: creating multiple lagged columns is a common feature-engineering technique for time series forecasting.
+// Discrete difference
+diffSeries(series: Series<Scalar>, options?: DiffOptions): Series<Scalar>
+diffDataFrame(df: DataFrame, options?: DataFrameDiffOptions): DataFrame
+
+interface DiffOptions {
+ periods?: number; // default 1; negative = look forward
+}
+interface DataFrameDiffOptions extends DiffOptions {
+ axis?: 0 | 1 | "index" | "columns"; // default 0
+}
+
+// Value shifting
+shiftSeries(series: Series<Scalar>, options?: ShiftOptions): Series<Scalar>
+shiftDataFrame(df: DataFrame, options?: DataFrameShiftOptions): DataFrame
+
+interface ShiftOptions {
+ periods?: number; // default 1; negative = shift backward
+ fillValue?: Scalar; // default null
+}
+interface DataFrameShiftOptions extends ShiftOptions {
+ axis?: 0 | 1 | "index" | "columns"; // default 0
+}
+ duplicated / drop_duplicates
+ Detect and remove duplicate values or rows.
+ duplicatedSeries / duplicatedDataFrame return a boolean
+ Series marking which items are duplicates.
+ dropDuplicatesSeries / dropDuplicatesDataFrame return
+ a new object with duplicates removed.
+
// keep="first" (default): mark later duplicates as true
+duplicatedSeries(s)
+
+// keep="last": mark earlier duplicates as true
+duplicatedSeries(s, { keep: "last" })
+
+// keep=false: mark ALL occurrences of any duplicate
+duplicatedSeries(s, { keep: false })
+
+ s.duplicated(keep='first')
+ df.duplicated(subset=['a', 'b'], keep='first')
+ s.drop_duplicates() / df.drop_duplicates()
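The three `keep` modes can be compared side by side with a small sketch on a plain array. `duplicatedSketch` is a hypothetical helper that compares values with `Map`/`Set` equality, which may differ from how the library compares them.

```typescript
// Sketch only: flag duplicates according to the keep policy.
function duplicatedSketch<T>(
  data: readonly T[],
  keep: "first" | "last" | false = "first",
): boolean[] {
  const counts = new Map<T, number>();
  for (const v of data) counts.set(v, (counts.get(v) ?? 0) + 1);

  // For keep="last", scan from the end so the last occurrence is seen first.
  const scan = keep === "last" ? [...data].reverse() : [...data];
  const seen = new Set<T>();
  const flags = scan.map((v) => {
    if (keep === false) return (counts.get(v) ?? 0) > 1; // every repeated value
    const dup = seen.has(v);
    seen.add(v);
    return dup;
  });
  return keep === "last" ? flags.reverse() : flags;
}

console.log(duplicatedSketch([1, 2, 1, 3, 2])); // [false, false, true, false, true]
console.log(duplicatedSketch([1, 2, 1, 3, 2], "last")); // [true, true, false, false, false]
console.log(duplicatedSketch([1, 2, 1, 3, 2], false)); // [true, true, true, false, true]
```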
+ const s = new Series({ data: [1, 2, 1, 3, 2] });
+duplicatedSeries(s).values;
+// → [false, false, true, false, true]
+
+
+ const s = new Series({ data: ["a", "b", "a", "c", "b"] });
+duplicatedSeries(s, { keep: false }).values;
+// → [true, true, true, false, true]
+
+
+ const s = new Series({ data: [10, 20, 10, 30, 20], name: "prices" });
+dropDuplicatesSeries(s).values;
+// → [10, 20, 30]
+
+
+ const df = DataFrame.fromRecords([
+ { name: "Alice", dept: "Eng" },
+ { name: "Bob", dept: "Eng" },
+ { name: "Alice", dept: "HR" },
+ { name: "Bob", dept: "Eng" }, // ← duplicate of row 1 on "name"+"dept"
+]);
+// Only consider "name" column for duplicates:
+duplicatedDataFrame(df, { subset: ["name"] }).values;
+// → [false, false, true, true] (Alice and Bob each appear twice)
+
+
+ const df = DataFrame.fromRecords([
+ { a: 1, b: 2 },
+ { a: 1, b: 2 },
+ { a: 3, b: 4 },
+ { a: 3, b: 4 },
+]);
+const deduped = dropDuplicatesDataFrame(df);
+// shape: [2, 2]
+// a: [1, 3] b: [2, 4]
+
+
+ Detect and remove duplicate values or rows. Supports keep="first", keep="last", and keep=false (mark all occurrences). DataFrame supports a subset of columns.
Random sampling from Series and DataFrame. Supports fixed count, fractional sampling, with/without replacement, weighted sampling, and seeded deterministic results via randomState.
sample
+ Randomly sample items from a Series or rows/columns from a DataFrame.
+ Supports fixed count (n), fractional sampling (frac),
+ sampling with replacement (replace), weighted sampling, and
+ deterministic seeding via randomState.
+
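Seeded sampling without replacement can be sketched with any deterministic generator. The example below uses mulberry32, a small well-known PRNG chosen here purely for illustration; the generator behind `randomState` in the library is not shown in this patch, so the values drawn will differ. `sampleSketch` and `mulberry32Sketch` are hypothetical names.

```typescript
// mulberry32: a small PRNG, used only to make the sketch reproducible.
function mulberry32Sketch(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

// Sketch only: draw n distinct items by removing each pick from the pool.
function sampleSketch<T>(data: readonly T[], n: number, seed: number): T[] {
  if (n > data.length) {
    throw new RangeError("n exceeds the pool size without replacement");
  }
  const rand = mulberry32Sketch(seed);
  const pool = [...data];
  const picked: T[] = [];
  for (let k = 0; k < n; k += 1) {
    const j = Math.floor(rand() * pool.length);
    picked.push(...pool.splice(j, 1));
  }
  return picked;
}

const firstDraw = sampleSketch([10, 20, 30, 40, 50], 3, 7);
const secondDraw = sampleSketch([10, 20, 30, 40, 50], 3, 7);
console.log(firstDraw.join() === secondDraw.join()); // true
```

The same seed always yields the same draw, which is the property `randomState` is meant to provide.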
// Sample 3 items (without replacement by default)
+sampleSeries(s, { n: 3 })
+
+// Sample 50% of rows
+sampleDataFrame(df, { frac: 0.5 })
+
+// Reproducible sample with seed
+sampleSeries(s, { n: 2, randomState: 42 })
+
+// Sample with replacement (bootstrap)
+sampleSeries(s, { n: 10, replace: true })
+
+// Sample columns instead of rows
+sampleDataFrame(df, { n: 2, axis: 1 })
+
+ s.sample(n=3, random_state=42)
+ df.sample(frac=0.5, replace=False, axis=0)
+ const s = new Series({ data: [10, 20, 30, 40, 50], name: "scores" });
+sampleSeries(s, { n: 3, randomState: 7 }).values;
+// deterministic result with seed 7
+
+
+ const s = new Series({ data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] });
+sampleSeries(s, { frac: 0.3, randomState: 42 }).values;
+// 30% of 10 items = 3 items
+
+
+ const s = new Series({ data: ["a", "b", "c"] });
+// Sample more items than pool size — only possible with replace=true
+sampleSeries(s, { n: 7, replace: true, randomState: 0 }).values;
+
+
+ const s = new Series({ data: ["rare", "common", "very_common"] });
+// "very_common" has 10× the weight of "rare"
+sampleSeries(s, { n: 1, weights: [1, 5, 10], randomState: 3 }).values;
+// most likely: ["very_common"]
+
+
+ const df = DataFrame.fromRecords([
+ { city: "NYC", pop: 8_336_817 },
+ { city: "LA", pop: 3_979_576 },
+ { city: "Chicago", pop: 2_693_976 },
+ { city: "Houston", pop: 2_320_268 },
+ { city: "Phoenix", pop: 1_680_992 },
+]);
+const sample = sampleDataFrame(df, { n: 3, randomState: 1 });
+sample.col("city").values;
+
+
+