```bash
# create the Conda environment
$ conda env create -f block_rl_env.yml
$ conda activate block_rl

# Stable-Baselines3 extras: useful for the training progress bar
$ pip install "stable-baselines3[extra]"
```
Low-level geometry & physics backend. Responsible for:
- keeping the list of blocks (`self.block_list`)
- collision checks & static stability (`is_stable_rbe`)
- dense reward heat-map generation
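The backend's responsibilities can be sketched as below. This is an illustrative mock, not the project's actual API: class and method names are invented, and the naive "supported from below" heuristic stands in for the real `is_stable_rbe` rigid-body-equilibrium check.

```python
import numpy as np

class BlockWorldSketch:
    """Illustrative backend: occupancy grid, block list, collision & stability."""

    def __init__(self, width=64, height=64):
        self.block_list = []                             # placed blocks (row, col, h, w)
        self.grid = np.zeros((height, width), dtype=bool)  # row 0 = ground level

    def collides(self, row, col, h, w):
        """A candidate block is illegal if its footprint overlaps occupied cells."""
        return bool(self.grid[row:row + h, col:col + w].any())

    def supported(self, row, col, w):
        """Naive stand-in for is_stable_rbe: rest on the ground or on a block below."""
        if row == 0:
            return True
        return bool(self.grid[row - 1, col:col + w].any())

    def place(self, row, col, h, w):
        """Place a block if it is collision-free and statically supported."""
        if self.collides(row, col, h, w) or not self.supported(row, col, w):
            return False
        self.grid[row:row + h, col:col + w] = True
        self.block_list.append((row, col, h, w))
        return True
```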
Gymnasium wrapper that the RL agent actually interacts with. It:
- exposes a `Discrete(300)` action space with automatic action-masking
- concatenates state image + reward image into a single flat observation
- offers live Matplotlib rendering (`--render`)
Train an agent with Stable-Baselines3. Key CLI flags (run `-h` for all):

| Flag | Default | Description |
|---|---|---|
| `--task` | `bridge` | Task to learn: `bridge`, `tower`, `double_bridge` |
| `--algo` | `maskppo` | RL algorithm: `maskppo` (masked PPO) or plain `ppo` |
| `--timesteps` | `200_000` | Total training steps (across all envs) |
| `--save-freq` | `10_000` | Checkpoint frequency (steps) for saving models & eval |
| `--logdir` | `runs` | Output directory for checkpoints and TensorBoard logs |
| `--device` | `cpu` | Compute device: `cpu`, `cuda`, or `auto` |
| `--render` | `False` | Render the environment (only works when `--n-envs 1`) |
| `--debug` | `False` | Enable DEBUG-level logging |
| `--progress-bar` | `False` | Show SB3 progress bar during training |
| `--config` | `None` | Path to YAML with extra hyper-parameters (overrides CLI) |
| `-m, --resume-model` | `None` | Path to a `.zip` model to continue training from |
| `-n, --n-envs` | `1` | Number of parallel environments (≥2 uses `SubprocVecEnv`) |
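A `--config` YAML might look like the fragment below. The key names are hypothetical (standard SB3 PPO hyper-parameters); consult `configs/maskppo.yaml` for the actual schema used by this project.

```yaml
# hypothetical hyper-parameter overrides (see configs/maskppo.yaml for the real keys)
learning_rate: 3.0e-4
n_steps: 2048
batch_size: 64
gamma: 0.99
```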
**Train from scratch**

```bash
python train.py --task bridge --algo maskppo --timesteps 100000 --progress-bar --config configs/maskppo.yaml
```

**Resume training**

```bash
python train.py --task bridge --algo maskppo --timesteps 100000 --progress-bar --config configs/maskppo.yaml -m runs/bridge_maskppo_0506204539/best_model/best_model.zip
```

**Monitor training**

To monitor training, run the following command from the main directory:

```bash
tensorboard --logdir runs
```

**Roll out a trained policy** for qualitative inspection:

```bash
python run_policy.py --model runs/bridge_maskppo_0506212053/best_model/best_model.zip --task bridge --algo maskppo --render --debug
```

Here is an example of a rollout:
- **Observation:** 8192-D vector (2 × 64 × 64 images flattened).
- **Action space:** 300 discrete indices.
- **Reward:** sum of overlaps between the newly placed block and Gaussian blobs centred on targets.
- **Masking:** `sb3_contrib.ActionMasker` removes illegal moves before the softmax → faster learning & fewer crashes.
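The Gaussian-blob reward can be sketched as follows (function names and the `sigma` parameter are illustrative; the project's actual heat-map generation may differ):

```python
import numpy as np

def reward_heatmap(targets, size=64, sigma=3.0):
    """Dense reward map: sum of Gaussian blobs centred on target cells."""
    ys, xs = np.mgrid[0:size, 0:size]
    heat = np.zeros((size, size), dtype=np.float32)
    for ty, tx in targets:
        heat += np.exp(-((ys - ty) ** 2 + (xs - tx) ** 2) / (2 * sigma ** 2))
    return heat

def placement_reward(heat, row, col, h, w):
    """Reward for a placed block = overlap of its footprint with the heat-map."""
    return float(heat[row:row + h, col:col + w].sum())
```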
Other SB3 algorithms (SAC, A2C…) will work, but the policy network must be adapted to flat image inputs.
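One way to adapt the policy network is a features extractor that un-flattens the 8192-D observation back into two 64 × 64 channels before a small CNN. The sketch below is a plain PyTorch module with invented layer sizes; to plug it into SB3 you would wrap it in a `BaseFeaturesExtractor` subclass.

```python
import torch
import torch.nn as nn

class FlatImageEncoder(nn.Module):
    """Hypothetical encoder: restore image layout from the flat observation."""

    def __init__(self, features_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=5, stride=2, padding=2),   # 2x64x64 -> 16x32x32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2),  # -> 32x16x16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, features_dim),
        )

    def forward(self, obs):               # obs: (batch, 8192)
        img = obs.view(-1, 2, 64, 64)     # state image + reward image as channels
        return self.cnn(img)
```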