Replace neural network encoder layer with fused LinearMax cuda kernel #112

daphne-cornelisse · 2025-11-03T00:13:45Z

The neural network in PufferDrive is currently one of the main training-speed bottlenecks.

The linear + max operations are particularly expensive. This PR speeds up training by replacing the PyTorch linear + max operations in the road and partner encoders with a CUDA kernel that fuses them into a single operation.
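To make the fused operation concrete, here is a minimal pure-Python sketch of its semantics. It assumes each encoder applies one linear layer to each of P input points (road points or partners), optionally followed by ReLU as in the later "linear -> relu -> max" commit, and then takes the max over points. The batch dimension is omitted, and the function names are hypothetical; this is not the PR's CUDA kernel, only an illustration of why fusing avoids the intermediate tensor.

```python
import math
from typing import List, Sequence


def linear_max_unfused(
    x: Sequence[Sequence[float]],
    w: Sequence[Sequence[float]],
    b: Sequence[float],
    relu: bool = False,
) -> List[float]:
    """Linear layer on every point, then max over points (two separate ops).

    x holds P points with D features each; w is H x D; b has H entries.
    The full P x H intermediate is materialized before the max.
    Assumes P >= 1.
    """
    hidden = []
    for point in x:
        row = [sum(w_hd * x_d for w_hd, x_d in zip(w_h, point)) + b_h
               for w_h, b_h in zip(w, b)]
        if relu:
            row = [max(v, 0.0) for v in row]
        hidden.append(row)
    # Column-wise max over the P points gives one value per output feature.
    return [max(col) for col in zip(*hidden)]


def linear_max_fused(
    x: Sequence[Sequence[float]],
    w: Sequence[Sequence[float]],
    b: Sequence[float],
    relu: bool = False,
) -> List[float]:
    """Same result in one pass: keep a running max per output feature.

    No P x H intermediate is stored, which is the point of fusing.
    Assumes P >= 1.
    """
    out = [-math.inf] * len(b)
    for point in x:
        for h, (w_h, b_h) in enumerate(zip(w, b)):
            v = sum(w_hd * x_d for w_hd, x_d in zip(w_h, point)) + b_h
            if relu:
                v = max(v, 0.0)
            if v > out[h]:
                out[h] = v
    return out
```

For example, with points `[[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]`, weights `[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]` and zero biases, both functions return `[3.0, 2.0, -1.0]`, or `[3.0, 2.0, 0.0]` with `relu=True`. On a GPU, one plausible mapping (an assumption, not a description of the merged kernel) is one thread per (batch element, output feature) pair, each looping over the points and keeping its running max in a register, so the P x H activations never touch global memory.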

The result is a ~3.5x speedup in SPS (steps per second): on an RTX 4080, throughput goes from 200K (main) to 700K (this PR).

Although the new network needs more environment steps to converge, it is still an improvement over the network in main because it reaches the same performance in less wall-clock time.
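The trade-off above is simple arithmetic: wall-clock time is total environment steps divided by SPS, so the new network wins as long as it needs fewer than 3.5x as many steps as main. The sketch below uses the SPS figures reported above; the step counts in the usage example are hypothetical, not measurements from this PR.

```python
SPS_MAIN = 200_000  # reported throughput for main on an RTX 4080
SPS_NEW = 700_000   # reported throughput for this PR on an RTX 4080

# Throughput ratio; the new network may need up to this many times
# more steps and still train faster in wall-clock time.
speedup = SPS_NEW / SPS_MAIN


def wall_clock_seconds(env_steps: float, sps: float) -> float:
    """Training time is total environment steps divided by steps per second."""
    return env_steps / sps


def new_net_is_faster(steps_main: float, steps_new: float) -> bool:
    """True if the new network reaches the target score in less wall-clock time."""
    return wall_clock_seconds(steps_new, SPS_NEW) < wall_clock_seconds(steps_main, SPS_MAIN)
```

With hypothetical step counts, if main needs 1e9 steps (5000 s) and the new network needs 2e9 (about 2857 s), the new network is faster; at 4e9 steps (about 5714 s) it would not be.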

I also reduced the number of road points from 200 to 128 and verified empirically that this is enough to achieve a near-zero off-road rate.
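Assuming the road encoder applies the same linear layer to every road point independently (as the linear + max structure implies), its cost scales linearly with the number of points, so going from 200 to 128 points cuts road-encoder work to 64% regardless of batch size or layer widths. The dimensions below are hypothetical placeholders, not PufferDrive's actual sizes.

```python
def encoder_macs(batch: int, points: int, in_dim: int, hidden: int) -> int:
    """Multiply-accumulates for one linear layer applied to every point."""
    return batch * points * in_dim * hidden


BATCH = 1024       # hypothetical batch size
ROAD_FEATURES = 7  # hypothetical features per road point
HIDDEN = 64        # hypothetical encoder width

# Fraction of road-encoder compute remaining after the 200 -> 128 change.
ratio = (encoder_macs(BATCH, 128, ROAD_FEATURES, HIDDEN)
         / encoder_macs(BATCH, 200, ROAD_FEATURES, HIDDEN))
```

Because the other factors cancel, `ratio` equals 128 / 200 = 0.64 for any choice of batch size, feature count, or width.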

Joseph Suarez and others added 15 commits November 1, 2025 23:20

start on obs space cfg

7f7c0d6

rmv magic numbers

968be11

Merge branch 'main' of https://github.com/Emerge-Lab/PufferDrive into obscfg

9417dc8

Pass ini -> drive env.

371264f

Working on fuse linear

65f49b0

Integrate 700k SPS kernel.

883d8db

Dirty

881134d

Profile fix

8da6602

conflict

e04d77f

linear max kernel

960ee09

Add linearmax kernels

676439b

Merge branch 'main' of https://github.com/Emerge-Lab/PufferDrive into obscfg

5f6e78d

Merge branch 'obscfg' of https://github.com/Emerge-Lab/PufferDrive into obscfg

549be6c

LinearMax with layernorm: 600/700K SPS, score of 0.85 in < 5 min.

694f71d

Merge branch 'main' of https://github.com/Emerge-Lab/PufferDrive into obscfg

35c1cb1

daphne-cornelisse changed the title ~~Obscfg~~ Faster training Nov 8, 2025

daphne-cornelisse added 3 commits November 8, 2025 12:41

Use 1000 maps

fe7a82d

Best run gets score of .95 in 25 minutes.

96fec96

Advantage weighting exps

32c1e70

This comment was marked as outdated.

daphne-cornelisse added 10 commits November 11, 2025 10:30

Sweep setup.

25cd151

Increase steps.

f100a2e

Remove exploration logging.

0b05836

Merge branch 'main' of https://github.com/Emerge-Lab/PufferDrive into obscfg

a5283ef

Delete Triton kernel.

23460bc

Best score is 0.98 with 600K FPS. 800 K FPS with 128 road points.

16e752b

Better hparams

30277f3

Merge branch 'main' of https://github.com/Emerge-Lab/PufferDrive into obscfg

b026db8

Replace default learning rate and ent_coef.

88431f7

Minor

23ce0bf

daphne-cornelisse added 8 commits November 16, 2025 17:38

Sweep setup.

e65f868

Increase steps.

a1e8a60

Remove exploration logging.

38587d3

Delete Triton kernel.

bdfdfd9

Best score is 0.98 with 600K FPS. 800 K FPS with 128 road points.

3e5f6cc

Better hparams

709c517

Merge branch 'obscfg' of https://github.com/Emerge-Lab/PufferDrive into obscfg

f25ffab

Merge branch 'main' of https://github.com/Emerge-Lab/PufferDrive into obscfg

e678532

daphne-cornelisse changed the title ~~Faster training~~ Replace neural network encoder with fused LinearMax cuda kernel Nov 17, 2025

Emerge-Lab deleted a comment from greptile-apps bot Nov 17, 2025

daphne-cornelisse marked this pull request as ready for review November 17, 2025 14:52

This comment was marked as off-topic.

daphne-cornelisse added 2 commits November 17, 2025 10:12

Clean up.

9589493

Minor

b4fc041

daphne-cornelisse changed the title ~~Replace neural network encoder with fused LinearMax cuda kernel~~ Replace neural network encoder layer with fused LinearMax cuda kernel Nov 17, 2025

daphne-cornelisse requested a review from eugenevinitsky November 17, 2025 17:16

daphne-cornelisse added 10 commits November 17, 2025 14:04

Update profiling script.

28e83c0

Adapt drive c network.

91d058a

Temp settings

ebe8805

minor

d2b7a08

Merge branch 'main' of https://github.com/Emerge-Lab/PufferDrive into obscfg

96c4b5a

Set max agents to 32

1c4fc69

Add linear -> relu -> max kernel.

e25a465

Score is .98 after 70 minutes.

1e343f6

Minor

1495e61

Merge branch 'main' of https://github.com/Emerge-Lab/PufferDrive into obscfg

cd6a84a

daphne-cornelisse changed the base branch from main to gsp_dev November 25, 2025 14:26

daphne-cornelisse added the research label Nov 25, 2025

Merge branch 'gsp_dev' into obscfg

a157b11

daphne-cornelisse merged commit f1bf6aa into gsp_dev Nov 25, 2025
10 checks passed

2 participants