# RL_Block-Assembly


## How to set up the environment

```sh
# create and activate the Conda environment
$ conda env create -f block_rl_env.yml
$ conda activate block_rl

# Stable-Baselines3 extras: useful for the training progress bar
$ pip install "stable-baselines3[extra]"
```

## How does the env work?

### assembly_env.py

Low-level geometry & physics backend. Responsible for:

- keeping the list of blocks (`self.block_list`)
- collision checks & static stability (`is_stable_rbe`)
- dense reward heat-map generation
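The dense reward heat-map can be pictured as Gaussian blobs centred on the target cells, with a placed block scored by the heat under its footprint. The sketch below is an illustration only: the function name, grid size, and `sigma` are assumptions, not the values used in `assembly_env.py`.

```python
import numpy as np

def reward_heatmap(targets, size=64, sigma=3.0):
    """Dense reward image: one Gaussian blob per (row, col) target.

    `size` and `sigma` are illustrative defaults, not the repository's
    actual parameters.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    heat = np.zeros((size, size), dtype=np.float32)
    for ty, tx in targets:
        heat += np.exp(-((ys - ty) ** 2 + (xs - tx) ** 2) / (2 * sigma ** 2))
    return heat

# A placed block's reward is then the heat summed over its footprint
# (a hypothetical 4x4 block near the first target):
heat = reward_heatmap([(32, 20), (32, 44)])
reward = float(heat[30:34, 18:22].sum())
```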

### assembly_gym.py

Gymnasium wrapper that the RL agent actually interacts with. It:

- exposes a `Discrete(300)` action space with automatic action masking
- concatenates the state image and reward image into a single flat observation
- offers live Matplotlib rendering (`--render`)
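The flat observation described above can be sketched as follows. The function name is illustrative; only the layout (two 64 × 64 images flattened into one 8192-D vector) comes from this README.

```python
import numpy as np

def flatten_observation(state_img, reward_img):
    """Concatenate the 64x64 state image and 64x64 reward image into a
    single flat 8192-D float vector (2 * 64 * 64), state image first."""
    assert state_img.shape == (64, 64) and reward_img.shape == (64, 64)
    return np.concatenate([state_img.ravel(), reward_img.ravel()]).astype(np.float32)
```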

### train.py

Train an agent with Stable-Baselines3. Key CLI flags (run with `-h` for the full list):

| Flag | Default | Description |
| --- | --- | --- |
| `--task` | `bridge` | Task to learn: `bridge`, `tower`, `double_bridge` |
| `--algo` | `maskppo` | RL algorithm: `maskppo` (masked PPO) or plain `ppo` |
| `--timesteps` | `200_000` | Total training steps (across all envs) |
| `--save-freq` | `10_000` | Checkpoint frequency (steps) for saving models & eval |
| `--logdir` | `runs` | Output directory for checkpoints and TensorBoard logs |
| `--device` | `cpu` | Compute device: `cpu`, `cuda`, or `auto` |
| `--render` | `False` | Render the environment (only works when `--n-envs 1`) |
| `--debug` | `False` | Enable DEBUG-level logging |
| `--progress-bar` | `False` | Show the SB3 progress bar during training |
| `--config` | `None` | Path to a YAML file with extra hyper-parameters (overrides CLI) |
| `-m`, `--resume-model` | `None` | Path to a `.zip` model to continue training from |
| `-n`, `--n-envs` | `1` | Number of parallel environments (≥2 uses `SubprocVecEnv`) |
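The "overrides CLI" behaviour of `--config` could be implemented along these lines. This is a minimal sketch, not the actual `train.py` parser: it assumes the YAML file holds flat keys matching the argparse destinations.

```python
import argparse

def parse_args(argv=None):
    """Parse a subset of the flags, letting a YAML config override the CLI."""
    p = argparse.ArgumentParser()
    p.add_argument("--task", default="bridge")
    p.add_argument("--algo", default="maskppo")
    p.add_argument("--timesteps", type=int, default=200_000)
    p.add_argument("--config", default=None)
    args = p.parse_args(argv)
    if args.config:
        import yaml  # PyYAML; only needed when a config file is given
        with open(args.config) as f:
            overrides = yaml.safe_load(f) or {}
        for key, value in overrides.items():
            setattr(args, key, value)  # YAML wins over the CLI, per the table above
    return args
```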

#### Train from scratch

```sh
python train.py --task bridge --algo maskppo --timesteps 100000 --progress-bar --config configs/maskppo.yaml
```

#### Resume training

```sh
python train.py --task bridge --algo maskppo --timesteps 100000 --progress-bar --config configs/maskppo.yaml -m runs/bridge_maskppo_0506204539/best_model/best_model.zip
```

#### Monitor training

To monitor training, run the following in the repository's main directory:

```sh
tensorboard --logdir runs
```

### run_policy.py

Roll out a trained policy for qualitative inspection:

```sh
python run_policy.py --model runs/bridge_maskppo_0506212053/best_model/best_model.zip --task bridge --algo maskppo --render --debug
```
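A rollout loop of this kind can be written generically over the Gymnasium step API. This sketch is not taken from `run_policy.py`: the `predict` callable and the `"action_mask"` info key are assumptions. With a `sb3_contrib` MaskablePPO model, `predict` could be `lambda obs, mask: model.predict(obs, action_masks=mask, deterministic=True)[0]`.

```python
def rollout(env, predict, max_steps=300):
    """Roll out one episode and return (total_reward, steps taken).

    `predict` maps (observation, action_mask) -> action; `max_steps` is a
    safety cap for non-terminating episodes.
    """
    obs, info = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        mask = info.get("action_mask")  # assumed key for the legal-move mask
        action = predict(obs, mask)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
        steps += 1
    return total_reward, steps
```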

An example rollout animation is included in the repository.
## RL choices

### Maskable PPO (maskppo)

- **Observation**: 8192-D vector (2 × 64 × 64 images, flattened).
- **Action space**: 300 discrete indices.
- **Reward**: sum of overlaps between the newly placed block and Gaussian blobs centred on the targets.
- **Masking**: `sb3_contrib`'s `ActionMasker` removes illegal moves before the softmax, which speeds up learning and avoids crashes.
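Conceptually, masking sets the logits of illegal actions to negative infinity before the softmax, so they receive zero probability and can never be sampled. A plain NumPy illustration (this is the idea, not `sb3_contrib`'s implementation):

```python
import numpy as np

def masked_softmax(logits, mask):
    """Softmax over logits where mask == False entries get zero probability."""
    masked = np.where(mask, logits, -np.inf)  # illegal actions -> -inf logit
    z = masked - masked.max()                 # shift for numerical stability
    exp = np.exp(z)                           # exp(-inf) == 0
    return exp / exp.sum()

logits = np.array([2.0, 0.5, 1.0, -1.0])
mask = np.array([True, False, True, True])    # action 1 is illegal
probs = masked_softmax(logits, mask)
```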

Other SB3 algorithms (SAC, A2C, …) will work, but the policy network must be adapted to the flat image inputs.

## About

Block-stacking strategy using Reinforcement Learning.
