Added benchmark pairs for: dataframe_shift_diff, dataframe_pow_mod, clip_series_bounds, reindex, dataframe_compare, series_add_sub_mul_div, numeric_ops_math, dataframe_add_sub_mul_div.

Run: https://github.com/githubnext/tsessebe/actions/runs/24535650224
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- bench_series_any_all: anySeries / allSeries boolean reductions
- bench_dataframe_any_all: anyDataFrame / allDataFrame boolean reductions
- bench_dataframe_nunique: nuniqueDataFrame per-column unique counts
- bench_series_crosstab: seriesCrosstab two-series cross-tabulation
- bench_bdate_range: bdate_range business-day DatetimeIndex generation
- bench_series_radd_rsub: seriesRadd / seriesRsub / seriesRmul / seriesRdiv reverse arithmetic
- bench_dataframe_radd_rsub: dataFrameRadd / dataFrameRsub / dataFrameRmul / dataFrameRdiv reverse arithmetic
- bench_series_exp_log: seriesExp / seriesLog2 / seriesLog10 / seriesSign extended math

Run: https://github.com/githubnext/tsessebe/actions/runs/24536797293
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
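For readers unfamiliar with what bench_bdate_range exercises, here is a minimal sketch of business-day generation using only the Python standard library. It assumes a plain Monday-to-Friday calendar with no holidays, and the helper name `business_days` is hypothetical; it is not the benchmark's code or the tsb implementation.

```python
from datetime import date, timedelta


def business_days(start: date, end: date) -> list:
    """Return every Monday-to-Friday date in the closed range [start, end].

    Hypothetical helper for illustration; holidays are not considered.
    """
    days = []
    current = start
    while current <= end:
        # weekday() is 0 for Monday through 6 for Sunday.
        if current.weekday() < 5:
            days.append(current)
        current += timedelta(days=1)
    return days


# 2024-01-05 is a Friday, so the weekend of the 6th and 7th is skipped.
span = business_days(date(2024, 1, 5), date(2024, 1, 9))
```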
Added pairs: infer_dtype, value_counts_binned, categorical_index, tz_localize_convert, align_series, align_dataframe, memory_usage, named_agg. Covers dtype inference, binned value counts, CategoricalIndex ops, timezone operations, Series/DataFrame alignment, memory estimation, and named aggregation (lost in iter 133's missing branch).

Run: https://github.com/githubnext/tsessebe/actions/runs/24537885791
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Iteration 137 — ✅ Accepted | Run
Added benchmarks for:
- series_ceil_floor_trunc_sqrt: seriesCeil/seriesFloor/seriesTrunc/seriesSqrt vs numpy
- dataframe_ceil_floor_trunc: dataFrameCeil/Floor/Trunc/Sqrt vs numpy on DataFrame
- dataframe_exp_log: dataFrameExp/Log/Log2/Log10 vs numpy on DataFrame
- pivot_table_full: pivotTableFull (with margins) vs pd.pivot_table
- read_excel: readExcel/xlsxSheetNames with 10k-row XLSX vs pd.read_excel/openpyxl
- pipe_chain_ops: pipeChain/pipeTo/dataFramePipeChain/dataFramePipeTo vs .pipe()
- nan_extended_agg: nancount/nanmedian/nanprod vs Series.count/median/prod
- series_pipe_apply: pipeSeries/dataFramePipe vs Series.pipe/DataFrame.pipe

Run: https://github.com/githubnext/tsessebe/actions/runs/24538933188
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Autoloop Iteration 138 — ✅ Accepted
New pairs added:
Added benchmark pairs:
- cut_interval_index: cutIntervalIndex/qcutIntervalIndex vs pd.cut/qcut
- dataframe_sign: dataFrameSign vs np.sign(df)
- argsort_scalars: argsortScalars/searchsortedMany vs np.argsort/searchsorted
- interval_index_ops: IntervalIndex.contains/get_loc vs pd.IntervalIndex ops
- period_index_range: PeriodIndex.periodRange/fromPeriods vs pd.period_range
- datetime_index_from: DatetimeIndex.fromDates/fromTimestamps vs pd.DatetimeIndex
- timedelta_index: TimedeltaIndex.fromTimedeltas/fromRange/fromStrings vs pd.TimedeltaIndex
- resolve_freq: resolveFreq vs pd.tseries.frequencies.to_offset

Run: https://github.com/githubnext/tsessebe/actions/runs/24539911725
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Iteration 139 — ✅ Accepted | metric: 420 (+8 from 412) | commit: 18753f6
New benchmark pairs added (8):
Previous best: 412 | New best: 420 | Delta: +8
Added 9 new benchmark pairs:
- groupby_multi_key: DataFrameGroupBy with multi-column keys ["dept","region"] vs pandas multi-key groupby
- timestamp_static: Timestamp.fromComponents/fromisoformat/fromtimestamp vs pd.Timestamp static ctors
- tz_datetime_index_ops: TZDatetimeIndex.toLocalStrings/sort/unique/filter/contains vs tz-aware DatetimeIndex ops
- rolling_center_min_periods: Rolling with center=true and minPeriods options vs pandas rolling center/min_periods
- cast_scalar: castScalar type coercion vs Python int()/float()/str() conversions
- concat_options: concat with join="inner" and ignoreIndex=true vs pd.concat join/ignore_index
- ewm_com_halflife: EWM with com and halflife params vs pandas ewm(com/halflife)
- nat_sort_key: natSortKey tokenizer vs Python regex-based natural sort key
- dataframe_iter: DataFrame.items()/iterrows() vs pandas df.items()/iterrows()

Note: State file claimed best was 428 (from iters 140/141 that were not pushed to branch); actual branch had 420 pairs. This iteration rebuilds to 429 (new actual best).

Run: https://github.com/githubnext/tsessebe/actions/runs/24545567127
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
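The nat_sort_key pair is described as comparing against a regex-based natural sort key on the Python side. A minimal sketch of what such a key can look like follows; the name `natural_key` and the case-folding choice are assumptions made for illustration, not the benchmark's actual code.

```python
import re

# Capturing group so that re.split keeps the digit runs in the result.
_DIGIT_RUN = re.compile(r"(\d+)")


def natural_key(text: str) -> list:
    """Build a sort key in which digit runs compare numerically.

    Hypothetical helper for illustration. Splitting on a capturing group
    always yields non-digit tokens at even positions and digit runs at odd
    positions, so keys from different strings stay type-compatible.
    """
    parts = _DIGIT_RUN.split(text)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


names = ["file10.txt", "file2.txt", "File1.txt"]
sorted_names = sorted(names, key=natural_key)
```

A plain lexicographic sort would place "file10.txt" before "file2.txt"; the key above orders them by the numeric value of the digit run instead.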
🤖 Iteration 142 — ✅ Accepted
New benchmark pairs:
Add standalone functional-form benchmarks and new operation benchmarks:
- bench_quantile_fn: quantileSeries/quantileDataFrame standalone functions
- bench_pct_change_fn: pctChangeSeries/pctChangeDataFrame standalone functions
- bench_merge_suffixes: merge with custom suffixes option
- bench_expanding_min_periods: Expanding with minPeriods option
- bench_dt_isocalendar: DatetimeAccessor.isocalendar_week
- bench_period_asfreq: Period.asfreq/PeriodIndex.asfreq frequency conversion
- bench_sample_fn: sampleSeries/sampleDataFrame standalone functions
- bench_nunique_fn: nuniqueSeries/nuniqueDataFrame standalone functions

Run: https://github.com/githubnext/tsessebe/actions/runs/24547746540
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Iteration 143 — ✅ Accepted
Added pairs: period_arithmetic (Period.add/diff/compareTo/contains), period_index_methods (PeriodIndex.shift/sort/unique/toDatetimeStart/toDatetimeEnd), dt_total_seconds (DatetimeAccessor.total_seconds), timedelta_index_ops (TimedeltaIndex.sort/unique/shift/filter/min/max), interval_overlaps (Interval.overlaps/IntervalIndex.overlaps), describe_opts (describe with percentiles/include options), merge_index_join (merge with left_index/right_index), to_json_orient (toJson with records/split/columns/values orient options).

Run: https://github.com/githubnext/tsessebe/actions/runs/24549838166
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Added benchmarks for standalone functional forms not yet covered:
- mode_dataframe_fn: modeDataFrame standalone vs pandas df.mode()
- where_mask_series_fn: whereSeries/maskSeries standalone vs pandas
- where_mask_df_fn: whereDataFrame/maskDataFrame standalone vs pandas
- idxmin_max_df: idxminDataFrame/idxmaxDataFrame vs pandas df.idxmin/idxmax
- interpolate_fn: interpolateSeries/dataFrameInterpolate standalone vs pandas
- explode_fn: explodeSeries/explodeDataFrame standalone vs pandas
- fillna_fn: fillnaSeries/fillnaDataFrame standalone vs pandas
- dropna_fn: dropnaSeries/dropnaDataFrame standalone vs pandas
- diff_applymap_fn: diffSeries/applymap standalone vs pandas

Run: https://github.com/githubnext/tsessebe/actions/runs/24551622461
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
✅ Autoloop Iteration 145 — Accepted
Metric: 445 → 454 (+9)
What changed
Added 9 new benchmark pairs (TS + Python) covering standalone functional forms that were exported from
Run: §24551622461
Added 8 new benchmark pairs covering:
- timestamp_arith: Timestamp.add/sub/eq/lt/gt/le/ge/ne operations
- timestamp_str_format: strftime/isoformat/day_name/month_name
- timestamp_round_normalize: ceil/floor/round/normalize
- value_counts_opts: valueCounts with normalize/ascending/dropna options
- series_sortvalues_opts: Series.sortValues with ascending=false/naPosition='first'
- dataframe_sortvalues_mixed: DataFrame.sortValues with mixed ascending array
- series_groupby_size: SeriesGroupBy.size() and getGroup()
- series_log_natural: seriesLog (natural logarithm)

Run: https://github.com/githubnext/tsessebe/actions/runs/24555921452
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Iteration 147 — ✅ Accepted
Add benchmarks for standalone comparison, floordiv/mod/pow, drop-duplicates, nsmallest, and duplicated functions not yet covered as standalone imports:
- series_standalone_compare: seriesEq/Ne/Lt/Gt/Le/Ge
- dataframe_compare_lege: dataFrameLe/dataFrameGe
- series_floordiv_standalone: seriesFloorDiv/seriesMod/seriesPow
- drop_duplicates_fn: dropDuplicatesSeries/dropDuplicatesDataFrame
- nsmallest_series_fn: nsmallestSeries
- duplicated_fn: duplicatedSeries/duplicatedDataFrame

Run: https://github.com/githubnext/tsessebe/actions/runs/24558253472
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Added benchmarks:
- df_any_all_axis1: anyDataFrame/allDataFrame row-wise (axis=1)
- df_nunique_axis1: nuniqueDataFrame row-wise (axis=1)
- cat_codes_accessor: CategoricalAccessor.codes/nCategories/ordered properties
- ewm_adjust: EWM with adjust=false (IIR) vs adjust=true
- interpolate_bfill_limit: interpolateSeries bfill method with limit option

Run: https://github.com/githubnext/tsessebe/actions/runs/24562479978
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
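To make the ewm_adjust distinction concrete, the sketch below computes an exponentially weighted mean both ways in plain Python, following the formulas pandas documents for `adjust=False` (a recursive, IIR-style update) and `adjust=True` (a normalized weighted average). The function names are hypothetical, missing values are not handled, and this is not the tsb or pandas implementation.

```python
def ewm_mean_recursive(values, alpha):
    """adjust=False form: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t]."""
    out = []
    for x in values:
        if not out:
            out.append(float(x))  # the first output is the first observation
        else:
            out.append((1 - alpha) * out[-1] + alpha * x)
    return out


def ewm_mean_adjusted(values, alpha):
    """adjust=True form: weights (1 - alpha)**i over all points seen so far."""
    out = []
    weighted_sum = 0.0  # running sum of (1 - alpha)**i * x[t-i]
    weight_total = 0.0  # running sum of (1 - alpha)**i
    for x in values:
        weighted_sum = (1 - alpha) * weighted_sum + x
        weight_total = (1 - alpha) * weight_total + 1.0
        out.append(weighted_sum / weight_total)
    return out


recursive = ewm_mean_recursive([1.0, 2.0, 3.0], alpha=0.5)
adjusted = ewm_mean_adjusted([1.0, 2.0, 3.0], alpha=0.5)
```

Both forms agree on the first element and then diverge, because the adjusted form renormalizes by the finite sum of weights available at each step.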
🤖 Iteration 152 — ✅ Accepted
Added 5 new benchmark pairs:
- datetime_index_ops: DatetimeIndex sort/unique/toStrings/slice/contains/concat
- datetime_index_snap: DatetimeIndex.snap(freq) to month-start and week boundaries
- period_index_query: PeriodIndex.getLoc/contains querying operations
- series_groupby_agg_all: SeriesGroupBy all aggregations (sum/mean/std/min/max/count/first/last)
- dataframe_rolling_median: DataFrameRolling.median and DataFrameExpanding.median

Run: https://github.com/githubnext/tsessebe/actions/runs/24564770860
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Iteration 153 — ✅ Accepted (Run)
Added 5 new pairs:
- datetime_index_normalize_filter_shift (DatetimeIndex.normalize/filter/shift)
- index_map (Index.map transform function)
- multi_index_fromtuples (MultiIndex.fromTuples construction)
- timedelta_advanced_ops (Timedelta.parse/toISOString/divBy/negate/mul/compareTo)
- dataframe_rolling_var_std_sum_count (DataFrameRolling.var/std/sum/count)

Run: https://github.com/githubnext/tsessebe/actions/runs/24565880287
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Iteration 154 — ✅ Accepted
…DatetimeIndex extra + TimedeltaIndex toStrings

Run: https://github.com/githubnext/tsessebe/actions/runs/24567781388
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Iteration 155 — ✅ Accepted
Added benchmarks for DateOffset rollforward/rollback/onOffset, more DateOffset types (MonthBegin/YearEnd/Week/Minute/Milli), date_range with various frequency options, combineFirstDataFrame standalone function, and SeriesGroupBy.agg with custom aggregate functions.

Run: https://github.com/githubnext/tsessebe/actions/runs/24570329650
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
✅ Iteration 156 — Accepted
Metric: 493 → 498 (+5)
Changes
Added 5 new benchmark pairs:
Commit: 352e1b7
Added benchmarks for:
- nan_agg_extended: nancount/nanprod/nanmedian (extended nan aggregates)
- rank_methods: rankSeries with min/max/first/dense tie-breaking methods
- dropna_advanced: dropnaDataFrame with thresh/subset/axis=1 options
- get_dummies_opts: getDummies/dataFrameGetDummies with prefix/dropFirst/dummyNa
- factorize_sort: factorize/seriesFactorize with sort=true/useNaSentinel options

Run: https://github.com/githubnext/tsessebe/actions/runs/24572885192
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
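The four tie-breaking methods named in rank_methods differ only in how a run of equal values is numbered. The sketch below shows that difference in plain Python; `rank_values` is a hypothetical name, it returns integer ranks (pandas returns floats), and it ignores missing values, so treat it as an illustration rather than the benchmarked code.

```python
def rank_values(values, method="min"):
    """Rank values ascending, resolving ties by min, max, first, or dense."""
    if method not in ("min", "max", "first", "dense"):
        raise ValueError(f"unsupported method: {method}")
    # sorted() is stable, so tied values keep their original relative order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    dense_rank = 0
    pos = 0
    while pos < len(order):
        # Extend `end` to the last sorted position holding the same value.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense_rank += 1
        for offset, idx in enumerate(order[pos:end + 1]):
            if method == "min":
                ranks[idx] = pos + 1           # lowest position in the tie group
            elif method == "max":
                ranks[idx] = end + 1           # highest position in the tie group
            elif method == "first":
                ranks[idx] = pos + offset + 1  # order of appearance breaks the tie
            else:
                ranks[idx] = dense_rank        # no gaps between distinct values
        pos = end + 1
    return ranks


data = [30, 10, 30, 20]
by_method = {m: rank_values(data, m) for m in ("min", "max", "first", "dense")}
```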
✅ Iteration 158 Accepted — Metric: 508 (+5 vs 503)
New benchmark pairs added:
Metric: 503 → 508 (+5) | Run: §24573945763
✅ Iteration 159 — 5 new benchmark pairs added (513 total)
Metric: 508 → 513 (+5)
New benchmark pairs
🤖 This PR is maintained by Autoloop. Each accepted iteration adds a commit to this branch.
Summary
Goal: Systematically benchmark every tsb function against its pandas equivalent, one function per iteration.
Metric: benchmarked_functions (higher is better)
Current best: 388 benchmark pairs
Links
perf-comparison.md
Latest Iteration (135)
Added 8 benchmark pairs: dataframe_shift_diff, dataframe_pow_mod, clip_series_bounds, reindex, dataframe_compare, series_add_sub_mul_div, numeric_ops_math, dataframe_add_sub_mul_div.
Total benchmark pairs: 388 (+8 from prev best 380)