Skip to content

optimize getCommonChunk: ~8x speedup #309

Open
prateek-kalwar-95 wants to merge 2 commits intoanimint:masterfrom
prateek-kalwar-95:optimize-getCommonChunk
Open

optimize getCommonChunk: ~8x speedup #309
prateek-kalwar-95 wants to merge 2 commits intoanimint:masterfrom
prateek-kalwar-95:optimize-getCommonChunk

Conversation

@prateek-kalwar-95
Copy link
Copy Markdown

fix for #258.

This PR optimizes getCommonChunk() in pure R by restructuring the inner loop logic. A separate C++ migration is also in progress.

Changes in R/z_animintHelpers.R

  • Replace nested data.table(col.name=...)[, { built[, { get(col.name)... }, by=group] }] with vapply() + .SDcols
  • Split into two phases: fast commonality check (all columns, boolean only) then value extraction (only the ~2-5 common columns)
  • Pre-join group size info via update-join to avoid per-group lookups
  • Filter to only multi-chunk groups in check phase (single-chunk groups are trivially common)

Benchmark Results

Before (master)

--- Polygon-like data (varying number of groups) ---
  groups=   50  rows=   1000  elapsed=0.140s
  groups=  100  rows=   2000  elapsed=0.230s
  groups=  500  rows=  10000  elapsed=1.090s
  groups= 1000  rows=  20000  elapsed=2.100s
  groups= 2000  rows=  40000  elapsed=4.350s

After (this PR)

--- Polygon-like data (varying number of groups) ---
  groups=   50  rows=   1000  elapsed=0.070s
  groups=  100  rows=   2000  elapsed=0.060s
  groups=  500  rows=  10000  elapsed=0.160s
  groups= 1000  rows=  20000  elapsed=0.260s
  groups= 2000  rows=  40000  elapsed=0.530s

Speedup Summary

Groups Before After Speedup
50 0.140s 0.070s 2.0x
100 0.230s 0.060s 3.8x
500 1.090s 0.160s 6.8x
1000 2.100s 0.260s 8.1x
2000 4.350s 0.530s 8.2x

Benchmark Code

library(animint2)
library(data.table)

make_polygon_data <- function(n_groups, n_chunks = 5) {
  data.table(
    x = rep(rep(1:4, each = 1), n_groups * n_chunks),
    y = rep(rep(c(0, 1, 1, 0), each = 1), n_groups * n_chunks),
    group = rep(rep(seq_len(n_groups), each = 4), n_chunks),
    fill = rep(paste0("color", seq_len(n_chunks)),
               each = n_groups * 4),
    WEEKEND = rep(paste0("chunk", seq_len(n_chunks)),
                  each = n_groups * 4)
  )
}

for (ng in c(50, 100, 500, 1000, 2000)) {
  dt <- make_polygon_data(ng, n_chunks = 5)
  t <- system.time({
    result <- animint2:::getCommonChunk(
      dt, chunk.vars = "WEEKEND",
      aes.list = list(group = "group"))
  })
  cat(sprintf("  groups=%5d  rows=%7d  elapsed=%.3fs\n",
              ng, nrow(dt), t[["elapsed"]]))
}

Tests

All 27 existing test-compiler-save-separate-chunks.R tests pass.

@prateek-kalwar-95
Copy link
Copy Markdown
Author

Hi @tdhock @suhaani-agarwal , could you please review this PR when you have time? All 27 existing tests are passing. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant