This repository now has a starter layout for training a PvP bot AI model.
- src/bot_training/ - reusable Python package code
  - data/ - data loading and preprocessing helpers
  - features/ - feature engineering helpers
  - training/ - training entry points and logic
  - evaluation/ - validation and metrics code
  - inference/ - prediction helpers
- data/raw/ - original match datasets
- data/interim/ - intermediate files during cleaning
- data/processed/ - cleaned and transformed training data
- data/splits/ - train/validation/test splits
- models/checkpoints/ - saved checkpoints
- models/exports/ - exported model artifacts
- reports/metrics/ - evaluation outputs and summaries
- reports/figures/ - charts and plots
- scripts/ - one-off runnable utilities
- tests/ - smoke tests for the scaffold
Run the smoke test:
python -m unittest
Run pytest suites (including feature engineering tests):
python -m pytest
Prepare raw data inventory:
python scripts/prepare_data.py
This project now includes a chunked pandas pipeline that groups rows into matches, filters them by quality, and writes a clean sequential CSV.
Note: many PvP logs alternate playerName every tick. Because of that, player-change splitting is off by default so rows are not split into one-frame matches. You can enable strict player boundary splitting with --split-on-player-change.
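The quality filtering step can be pictured as a per-match predicate. This is a minimal sketch, not the code in scripts/prepare_data.py: the MatchStats class and its field names are hypothetical, the threshold values are the ones used in the example command in this README (the script's real defaults may differ), and treating each bound as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MatchStats:
    """Per-match summary. Hypothetical structure, for illustration only."""
    frames: int
    damage_taken: float
    attack_accuracy: float  # fraction of attacks that hit, 0..1
    sprint_uptime: float    # fraction of frames spent sprinting, 0..1


def passes_quality(
    stats: MatchStats,
    min_frames: int = 400,
    max_damage_taken: float = 60.0,
    min_attack_accuracy: float = 0.20,
    min_sprint_uptime: float = 0.15,
) -> bool:
    """Return True when a match satisfies every threshold (bounds assumed inclusive)."""
    return (
        stats.frames >= min_frames
        and stats.damage_taken <= max_damage_taken
        and stats.attack_accuracy >= min_attack_accuracy
        and stats.sprint_uptime >= min_sprint_uptime
    )


if __name__ == "__main__":
    long_clean_match = MatchStats(frames=500, damage_taken=30.0,
                                  attack_accuracy=0.35, sprint_uptime=0.40)
    one_frame_match = MatchStats(frames=1, damage_taken=0.0,
                                 attack_accuracy=1.0, sprint_uptime=1.0)
    print(passes_quality(long_clean_match))
    print(passes_quality(one_frame_match))
```

A one-frame match fails the frame-count threshold regardless of its other statistics, which is why splitting on every playerName change would discard almost all data in logs that alternate players each tick.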
Per-file output example (one clean CSV per input file):
python3 scripts/prepare_data.py \
--input-dir data/raw \
--output-mode per-file \
--output-dir data/processed/phase1_clean_matches_per_file \
--progress \
--min-frames 400 \
--max-damage-taken 60 \
--min-attack-accuracy 0.20 \
--min-sprint-uptime 0.15
Use the sweep utility to evaluate a grid of threshold combinations and rank them by keep-rate/quality tradeoff.
python3 scripts/sweep_thresholds.py \
--input-dir data/raw \
--sample-fraction 0.1 \
--sample-seed 42 \
--min-frames-grid 400,700,1000 \
--max-damage-grid 40,50,60 \
--min-attack-accuracy-grid 0.20,0.30,0.40 \
--min-sprint-uptime-grid 0.15,0.30,0.60 \
--score-weights 0.25,0.25,0.25,0.25 \
--top-k 10
The command writes a ranked report to reports/metrics/phase1_threshold_sweep.csv.
Use --sample-fraction 0.1 to run on roughly 1/10 of files for faster iteration.
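To give a sense of how large the sweep is, the sketch below expands the four comma-separated grids from the command above into every threshold combination and picks a seeded file sample. The function names (parse_grid, threshold_grid, sample_files) are hypothetical and the sampling rule is an assumption; how scripts/sweep_thresholds.py actually samples files and scores combinations is not shown here.

```python
import random
from itertools import product


def parse_grid(text: str) -> list[float]:
    """Turn a comma-separated CLI value such as "400,700,1000" into numbers."""
    return [float(part) for part in text.split(",")]


def threshold_grid(min_frames, max_damage, min_accuracy, min_sprint) -> list[dict]:
    """Build the Cartesian product of the four threshold grids."""
    keys = ("min_frames", "max_damage_taken",
            "min_attack_accuracy", "min_sprint_uptime")
    combos = product(min_frames, max_damage, min_accuracy, min_sprint)
    return [dict(zip(keys, combo)) for combo in combos]


def sample_files(files: list[str], fraction: float, seed: int) -> list[str]:
    """Pick a reproducible subset of files (assumed sampling rule)."""
    count = max(1, round(len(files) * fraction))
    return random.Random(seed).sample(files, count)


grid = threshold_grid(
    parse_grid("400,700,1000"),
    parse_grid("40,50,60"),
    parse_grid("0.20,0.30,0.40"),
    parse_grid("0.15,0.30,0.60"),
)

if __name__ == "__main__":
    print(len(grid))  # 3 * 3 * 3 * 3 = 81 combinations
```

With three values per grid, each sweep evaluates 81 combinations, so sampling a fraction of the input files keeps iteration time manageable.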
Convert cleaned phase 1 rows into normalized frame tensors and sequence windows. Phase 2 now writes one NPZ per cleaned match CSV under data/processed/phase2_feature_tensors_per_file/.
Continuous features use fixed Minecraft-aware scaling (no fitted scaler artifact):
- health, targetHealth / 20
- yaw, targetYaw / 180
- pitch, targetPitch / 90
- spatial terms / 50 (upper-clipped to 1.0)
- velocity terms / 4 (upper-clipped to 1.0)
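The fixed scaling above amounts to dividing each feature by a constant, with an upper clip on the spatial and velocity terms. A minimal sketch follows; the function names and the "distance" and "velocity_x" keys are hypothetical stand-ins for whichever spatial and velocity columns the pipeline really uses.

```python
def scale_clipped(value: float, divisor: float) -> float:
    """Divide by a fixed constant and clip the result from above at 1.0."""
    return min(value / divisor, 1.0)


def normalize_frame(frame: dict) -> dict:
    """Apply the fixed divisors listed above to one frame of raw values."""
    return {
        "health": frame["health"] / 20.0,
        "yaw": frame["yaw"] / 180.0,
        "pitch": frame["pitch"] / 90.0,
        # Hypothetical column names for a spatial and a velocity term.
        "distance": scale_clipped(frame["distance"], 50.0),
        "velocity_x": scale_clipped(frame["velocity_x"], 4.0),
    }


if __name__ == "__main__":
    raw = {"health": 10.0, "yaw": -90.0, "pitch": 45.0,
           "distance": 75.0, "velocity_x": 1.0}
    print(normalize_frame(raw))
```

Because the divisors are constants, the same transform can be applied at inference time without loading a fitted scaler.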
python scripts/build_features.py \
--input-file data/processed/phase1_clean_matches_per_file/example_ai_clean.csv \
--output-file data/processed/phase2_feature_tensors.npz \
--vocabulary-file models/exports/phase2_item_vocabulary.json
Run the full batch pipeline over every per-file clean CSV:
python3 scripts/build_features.py \
--input-dir data/processed/phase1_clean_matches_per_file \
--input-pattern "*_clean.csv" \
--output-dir data/processed/phase2_feature_tensors_per_file \
--manifest-file data/processed/phase2_feature_manifest.json \
--vocabulary-file models/exports/phase2_item_vocabulary.json \
--window-size 20
Optional: add --max-files 100 for a quick subset dry run.
Saved NPZ fields:
- inputs: normalized frame-level input matrix
- targets: frame-level action + slot + deltaYaw/deltaPitch targets
- input_windows: overlapping windows with shape [num_windows, 20, feature_count]
- sequence_targets: target rows aligned to the end of each input window
- window_match_ids: match IDs aligned to each input window
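The relationship between the frame-level arrays and the windowed arrays can be sketched in plain Python. This assumes a stride of one frame, windows that never cross a match boundary, and a target taken from the last frame inside each window; the build_windows function is hypothetical and the real scripts/build_features.py may differ on any of these points.

```python
def build_windows(inputs, targets, match_id, window_size=20):
    """Slice one match into overlapping windows with end-aligned targets.

    inputs and targets are per-frame rows for a single match.
    Returns (input_windows, sequence_targets, window_match_ids).
    """
    input_windows, sequence_targets, window_match_ids = [], [], []
    # A match shorter than window_size yields no windows at all.
    for start in range(len(inputs) - window_size + 1):
        end = start + window_size
        input_windows.append(inputs[start:end])
        sequence_targets.append(targets[end - 1])  # target of the window's last frame
        window_match_ids.append(match_id)
    return input_windows, sequence_targets, window_match_ids


if __name__ == "__main__":
    frames = [[float(i)] for i in range(25)]  # 25 frames, 1 feature each
    labels = list(range(25))
    windows, seq_targets, ids = build_windows(frames, labels, "match-1")
    print(len(windows), seq_targets)
```

Under these assumptions a match of n frames produces n - 19 windows, and the three returned lists always have the same length.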
Train on a real Phase 2 artifact with the MLX sequence model:
python3 scripts/train_model.py \
--dataset data/processed/phase2_feature_tensors_per_file \
--checkpoint models/checkpoints/phase4_best_weights.npz \
--epochs 50 \
--batch-size 256 \
--learning-rate 0.0001
The trainer splits windows by match_id, uses an 80/20 train/validation split, and saves the best checkpoint only when validation loss improves.
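Splitting by match_id keeps every window of a match on the same side of the split, so overlapping windows cannot leak between training and validation. The sketch below illustrates that idea together with the save-only-on-improvement rule. The names split_match_ids and BestCheckpointTracker are hypothetical, and the seeded shuffle is an assumption about how matches are assigned.

```python
import random


def split_match_ids(match_ids, validation_fraction=0.2, seed=0):
    """Assign whole matches to train or validation (assumed seeded shuffle)."""
    unique_ids = sorted(set(match_ids))
    random.Random(seed).shuffle(unique_ids)
    n_val = max(1, round(len(unique_ids) * validation_fraction))
    return set(unique_ids[n_val:]), set(unique_ids[:n_val])


class BestCheckpointTracker:
    """Remember the lowest validation loss seen so far."""

    def __init__(self):
        self.best_loss = float("inf")

    def should_save(self, validation_loss: float) -> bool:
        """Return True only when the loss strictly improves on the best so far."""
        if validation_loss < self.best_loss:
            self.best_loss = validation_loss
            return True
        return False


if __name__ == "__main__":
    window_match_ids = [f"m{i}" for i in range(10)] * 3  # 3 windows per match
    train_ids, val_ids = split_match_ids(window_match_ids)
    print(len(train_ids), len(val_ids))

    tracker = BestCheckpointTracker()
    print([tracker.should_save(loss) for loss in (1.0, 0.8, 0.9, 0.7)])
```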
To continue training from an existing checkpoint, pass --resume. If the checkpoint file exists, the model loads those weights before running new epochs.
python3 scripts/train_model.py \
--dataset data/processed/phase2_feature_tensors_per_file \
--checkpoint models/checkpoints/phase4_best_weights.npz \
--resume \
--epochs 20 \
--batch-size 256 \
--learning-rate 0.0001
Use this helper script when you want to erase generated Phase 2 and Phase 4 artifacts, rebuild Phase 2 tensors, and retrain the model end-to-end.
What it deletes before rebuilding:
- data/processed/phase2_feature_tensors_per_file/
- data/processed/phase2_feature_manifest.json
- data/processed/phase2_feature_tensors.npz (single-file artifact if present)
- models/exports/phase2_item_vocabulary.json
- models/checkpoints/phase4_best_weights.npz
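A dry-run style cleanup over the paths listed above could look roughly like this. It is an illustrative sketch, not the contents of scripts/rebuild_phase2_and_train_phase4.py; the clean_artifacts function and its return value are hypothetical.

```python
import shutil
from pathlib import Path

# Paths taken from the list above, relative to the repository root.
GENERATED_ARTIFACTS = [
    "data/processed/phase2_feature_tensors_per_file",
    "data/processed/phase2_feature_manifest.json",
    "data/processed/phase2_feature_tensors.npz",
    "models/exports/phase2_item_vocabulary.json",
    "models/checkpoints/phase4_best_weights.npz",
]


def clean_artifacts(root: Path, dry_run: bool = True) -> list[Path]:
    """Return the artifacts that exist; delete them unless dry_run is set."""
    found = []
    for relative in GENERATED_ARTIFACTS:
        path = root / relative
        if not path.exists():
            continue  # nothing to remove, e.g. the optional single-file NPZ
        found.append(path)
        if dry_run:
            continue
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    return found


if __name__ == "__main__":
    for artifact in clean_artifacts(Path("."), dry_run=True):
        print(f"would delete {artifact}")
```

Defaulting dry_run to True in the sketch means a careless call only lists files; the real script takes the opposite default and deletes unless --dry-run is passed.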
Run a safe preview first:
python3 scripts/rebuild_phase2_and_train_phase4.py --dry-run
Run the full rebuild + retrain:
python3 scripts/rebuild_phase2_and_train_phase4.py
Common overrides:
python3 scripts/rebuild_phase2_and_train_phase4.py \
--input-dir data/processed/phase1_clean_matches_per_file \
--input-pattern "*_clean.csv" \
--epochs 50 \
--batch-size 256 \
--learning-rate 0.0001
Optional quick subset run while debugging:
python3 scripts/rebuild_phase2_and_train_phase4.py --max-files 100 --epochs 5
Run scenario-based checks against a trained checkpoint (dual input: continuous windows + mock inventory windows):
python3 scripts/assert_phase4_scenarios.py \
--checkpoint models/checkpoints/phase4_best_weights.npz \
--item-vocab models/exports/phase2_item_vocabulary.json \
--allow-failures
Useful options:
- --allow-failures: print all scenario results and exit with code 0 even if checks fail.
- --high-prob, --drop-prob, --rise-prob, --very-large-positive-pitch: tune assertion thresholds.
- --drink-slot, --splash-slot, --food-slot, --golden-apple-slot: override expected hotbar slot indices.
Quick run that never fails CI locally:
python3 scripts/assert_phase4_scenarios.py --allow-failures
Output format:
- Each scenario prints one line like [PASS] Step X - ... or [FAIL] Step X - ...
- The script ends with the summary line: Completed <N> scenario checks with <M> failure(s).
- Without --allow-failures, any failed scenario raises an assertion and returns a non-zero exit code.
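When the script runs with --allow-failures, the exit code no longer signals failures, so a caller may want to read the printed lines instead. The following sketch parses the output format described above. The parse_scenario_output helper is hypothetical and the sample step descriptions are made up for illustration.

```python
import re

RESULT_LINE = re.compile(r"^\[(PASS|FAIL)\] (.+)$")
SUMMARY_LINE = re.compile(
    r"^Completed (\d+) scenario checks with (\d+) failure\(s\)\.$"
)


def parse_scenario_output(text: str) -> dict:
    """Collect failed step lines and the totals from the summary line."""
    failed_steps = []
    total = failures = None
    for line in text.splitlines():
        line = line.strip()
        result = RESULT_LINE.match(line)
        if result and result.group(1) == "FAIL":
            failed_steps.append(result.group(2))
        summary = SUMMARY_LINE.match(line)
        if summary:
            total, failures = int(summary.group(1)), int(summary.group(2))
    return {"total": total, "failures": failures, "failed_steps": failed_steps}


if __name__ == "__main__":
    # Made-up sample output that follows the documented line format.
    sample = (
        "[PASS] Step 1 - example check\n"
        "[FAIL] Step 2 - another example check\n"
        "Completed 2 scenario checks with 1 failure(s).\n"
    )
    print(parse_scenario_output(sample))
```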
Run the FastAPI server that serves Phase 4 model predictions:
uv run python3 scripts/run_inference_api.py
- Default server URL: http://127.0.0.1:8000
- Prediction endpoint: POST http://127.0.0.1:8000/predict
Before starting the API, make sure these files exist:
- models/checkpoints/phase4_best_weights.npz
- models/exports/phase2_item_vocabulary.json
Quick local health check:
curl http://127.0.0.1:8000/docs
Quick prediction request example:
curl -X POST "http://127.0.0.1:8000/predict" \
-H "Content-Type: application/json" \
-d '{
"bot_id": "bot-1",
"bot": {
"x": 0.0,
"y": 64.0,
"z": 0.0,
"yaw": 0.0,
"pitch": 0.0,
"vel_x": 0.0,
"vel_y": 0.0,
"vel_z": 0.0,
"health": 20.0,
"food": 20.0,
"is_on_ground": true
},
"target": {
"x": 2.0,
"y": 64.0,
"z": 2.0,
"yaw": 180.0,
"pitch": 0.0,
"vel_x": 0.0,
"vel_y": 0.0,
"vel_z": 0.0,
"health": 20.0,
"food": 20.0,
"is_on_ground": true
},
"inventory": {
"main_hand": "DIAMOND_SWORD",
"off_hand": "AIR",
"hotbar": [
"DIAMOND_SWORD",
"SPLASH_POTION",
"POTION",
"COOKED_BEEF",
"GOLDEN_APPLE",
"AIR",
"AIR",
"AIR",
"AIR"
]
}
}'
The API keeps a rolling 20-frame buffer per bot_id and returns one action prediction payload per request.
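A rolling per-bot buffer of this kind is commonly built from a bounded deque keyed by bot_id. The sketch below shows the idea only. The FrameBuffers class is hypothetical, and how the real server handles a bot that has sent fewer than 20 frames (padding, repeating, or waiting) is not specified here.

```python
from collections import defaultdict, deque

WINDOW_SIZE = 20  # matches the 20-frame buffer described above


class FrameBuffers:
    """Keep the most recent frames for each bot_id (hypothetical helper)."""

    def __init__(self, window_size: int = WINDOW_SIZE):
        # deque(maxlen=...) silently drops the oldest frame once it is full.
        self._buffers = defaultdict(lambda: deque(maxlen=window_size))

    def push(self, bot_id: str, frame: dict) -> list:
        """Append one frame and return that bot's current window, oldest first."""
        buffer = self._buffers[bot_id]
        buffer.append(frame)
        return list(buffer)


if __name__ == "__main__":
    buffers = FrameBuffers()
    for tick in range(25):
        window = buffers.push("bot-1", {"tick": tick})
    print(len(window), window[0], window[-1])
    print(len(buffers.push("bot-2", {"tick": 0})))
```

Keying the buffers by bot_id means requests for different bots never share history, so each bot_id should stay stable for the lifetime of a fight.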