
Bridge+Memory leak fix+More functions #19

Merged
maderix merged 1 commit into maderix:main from vipuldivyanshu92:features/brigde+memoryleakfix+newfuns
Mar 3, 2026

Conversation

@vipuldivyanshu92
Contributor

Added Bridge APIs
Added temp workaround for 119 ANE compile limit
Offloaded the Classifier & Softmax to the ANE

Owner

@maderix maderix left a comment

Review & Benchmark Report

Thoroughly reviewed and benchmarked this PR. Great contribution — thanks for putting this together!

Benchmark Results (Stories110M, 30 steps)

| Metric | Baseline (train_large) | This PR (train_large_ane) | Improvement |
|---|---|---|---|
| ms/step (train) | 108.6 | 91.1 | -16% (17.5ms saved) |
| Compiles per restart | 72 | 99 | +27 more kernels |
| Compile time | 3.7s | 4.1s | +400ms |
| Loss trajectory | 10.39→9.78 | 10.39→9.78 | Identical ✅ |
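
The ms/step improvement follows directly from the two measured step times. A minimal sketch of that arithmetic in Python, using only the figures from the table (the helper name `step_improvement` is illustrative and not part of the PR):

```python
def step_improvement(baseline_ms: float, new_ms: float) -> tuple[float, float]:
    """Return (ms saved per step, relative change in percent)."""
    saved = baseline_ms - new_ms
    pct = (new_ms - baseline_ms) / baseline_ms * 100.0
    return saved, pct


# Step times quoted in the benchmark table above.
saved, pct = step_improvement(108.6, 91.1)
print(f"{saved:.1f} ms saved per step, {pct:.0f}% change")
```

A negative percentage means the step got faster, which matches the sign convention used in the table.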

Where the gains come from

  • Classifier forward: 1.3ms ANE vs ~9.3ms CPU cblas — ~8ms saved
  • Softmax: 2.7ms ANE vs ~75ms CPU — ~7ms saved
  • RMSNorm backward: 0.12ms/eval on ANE — marginal gain but good to have

Tests

  • test_rmsnorm_bwd: PASS ✅ (max error 0.009, well within 0.05 threshold)
  • test_classifier: All 4 subtests PASS ✅ (rmsnorm, classifier fwd, softmax, classifier bwd)
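
For readers unfamiliar with this kind of check, the pass criterion can be sketched as a maximum-absolute-error comparison against a CPU reference. This is an assumption about how such a test is typically structured, not the PR's actual test code; the helper `max_abs_error` and the sample values are made up for illustration, and only the 0.05 threshold comes from the review.

```python
def max_abs_error(actual, expected):
    """Largest element-wise absolute difference between two sequences."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    return max(abs(a - e) for a, e in zip(actual, expected))


THRESHOLD = 0.05  # threshold quoted in the review for test_rmsnorm_bwd

# Hypothetical sample values, NOT data from the PR.
cpu_ref = [0.125, -0.500, 0.750]
ane_out = [0.127, -0.495, 0.741]

err = max_abs_error(ane_out, cpu_ref)
print("PASS" if err <= THRESHOLD else "FAIL", f"(max error {err:.3f})")
```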

Correctness

The loss trajectory matches the baseline at the reported precision (10.39→9.78 in both runs), so the ANE-offloaded ops produce numerically equivalent results for training purposes.

Notes for follow-up

  • Compile budget is tight (99 compiles against a limit of ~100), so reducing the kernel count may be worthwhile
  • gen_classifier_bwd() is unused (CPU cblas is faster for that shape) — could remove or document as experimental
  • The bridge library is a nice abstraction for Python/ctypes consumers
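
One way to keep the compile budget from the first note visible is to count compiles and flag when the next batch would exceed the limit. The sketch below is hypothetical: the `CompileBudget` class, its method names, and the limit of 100 are assumptions for illustration, and it is not the workaround implemented in this PR.

```python
class CompileBudget:
    """Tracks how many kernel compiles have been used against a fixed limit."""

    def __init__(self, limit: int = 100):  # ~100 is the approximate limit cited in the review
        self.limit = limit
        self.used = 0

    def record(self, n: int = 1) -> None:
        """Register n completed compiles."""
        self.used += n

    @property
    def remaining(self) -> int:
        """Compiles left before the limit is reached."""
        return max(self.limit - self.used, 0)

    def needs_restart(self, upcoming: int) -> bool:
        """True if compiling `upcoming` more kernels would exceed the limit."""
        return self.used + upcoming > self.limit


budget = CompileBudget(limit=100)
budget.record(99)  # compiles per restart reported for this PR
print(budget.remaining)
print(budget.needs_restart(upcoming=99))
```

With 99 of roughly 100 compiles used, a single further batch of the same size would not fit, which is why the budget is described as tight.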

Thanks for the contribution! 🎉

@maderix maderix merged commit c330774 into maderix:main Mar 3, 2026