PRC Data Challenge

Contribution of Malte Cordts, Sabrina Kerz, and Dennis Schorn to the PRC Data Challenge 2024 as team_organized_volcano. This code falls under GNU GPLv3, see the license tab for the full license.

Current rankings

Available here

Rank	Team Name	RMSE	File Version
1	team_likable_jelly	1561.63	v25
2	team_tiny_rainbow	2217.75	v1
3	team_gentle_elephant	2252.89	v11
4	team_brave_pillow	2270.06	v101
5	team_delightful_avocado	2355.61	v16
6	team_youthful_xerox	2386.53	v8
7	team_affectionate_bridge	2456.7	v6
8	team_honest_turtle	2479.56	v46
9	team_exuberant_scooter	2587.5	v7
10	team_mindful_donkey	2683.05	v9
11	team_diligent_igloo	2692.37	v16
12	team_amazing_forest	2695.6	v18
13	team_modest_scooter	2696.67	v6
14	team_gentle_wreath	2702.16	v20
15	team_faithful_engine	2705.24	v6
16	team_elegant_lemon	2746.3	v26
17	team_patient_net	2752.72	v10
18	team_mellow_barn	2859.32	v14
19	team_exuberant_hippo	2932.55	v8
20	team_zealous_watermelon	3024.17	v1
21	team_loyal_hippo	3029.57	v10
22	team_zesty_ostrich	3092.39	v11
23	team_mindful_puzzle	3250.64	v3
24	team_jolly_koala	3254.78	v48
25	team_amiable_garden	3264.46	v21
26	team_bold_emu	3286.56	v3
27	team_motivated_baker	3409.82	v1
28	team_nice_wolf	3595.25	v3
29	team_energetic_quiver	3683.31	v15
30	team_organized_volcano	3755.47	v9
31	team_faithful_napkin	3810.9	v0
32	team_outspoken_engine	3960.86	v10
33	team_respectful_kangaroo	4052.19	v5
34	team_nice_hippo	4263.82	v7
35	team_joyful_zeppelin	4837.51	v4
36	team_refreshing_unicorn	6533.31	v1
37	team_versatile_yacht	6713.85	v18
38	team_zippy_river	6839.35	v11
39	team_nice_jacket	9018.77	v6
40	team_zippy_horse	11377.51	v3
41	team_knowledgeable_jungle	20100.99	v18
42	team_funny_yogurt	51510.18	v2
43	team_dependable_gorilla	89851.11	v4

Our models

base_model

The main model we used for most of the project, HGBR. This was initially trained on features from the flight list. For this we also engineered the features that covered the timing of the flight ([Engineered Features](#Engineered Features)) After iteratively determining the most impactful features for the model (via sklearn's permutation_importance), we used it for all our submissions.

traj_model

The secondary model, another HGBR but with additional features that contained information from the trajectories. For this we first determined how many datapoints were missing per flight. This KPI is the percentage of available data, and we used it to distinguish high quality data ( KPI > 0.8) from low quality data ( KPI <= 0.8). Then we removed all repeating constant values at the start and end of each trajectory, since these appear to be artifacts in the data without meaningful information. Next we split each trajectory into 3 phases, the ascending phase during which the climb rate is positive, the cruising phase during which the climb rate is more or less constant, and the descending phase during which the aircraft has a negative climb rate. We then calculated additional values to feed into the model as more features:

the sum of the vertical rate changes during ascending and descending phase each
average altitude of the cruising phase
duration of the cruising phase
average groundspeed during the cruising phase
the kpi

Notebooks

Initial Data Review

Flight List Based Model

Script for training and predicting the model

Documentation

Flight List Based Model

Information

Data Explanation

Introduction Slides

Goal

We aim to hand in a solution before the final deadline!

Decisions & Plans

We want to start with a simple model, using only the flight list
Then iterate & improve it by adding handcrafted features
Next include data from the actual trajectories, without temporal features
Then move to more complex models if necessary, eventually ending up with a transformer
Optimise for RMSE, since this is used in the final scoring of our submission

Model Features Overview

FightList

This table lists all the features in the flightlist and indicates whether each feature is used in the models.

Raw Features

Feature	1. HGBR Model
flight_id (unique ID)	❌
callsign (obfuscated callsign)	❌
adep (Aerodrome of DEParture)	❌
ades (Aerodrome of DEStination)	❌
name_adep (ADEP airport name)	❌
name_ades (ADES airport name)	❌
country_code_adep (ADEP country code)	✅
country_code_ades (ADES country code)	✅
date (date of flight)	❌
actual_offblock_time (AOBT)	❌
arrival_time (ARVT)	❌
aircraft_type (aircraft type code)	✅
wtc (Wake Turbulence Category)	✅
airline (Aircraft Operator code)	✅
flight_duration (flight duration in mins)	✅
taxiout_time (taxi-out time in mins)	✅
flown_distance (route length in nmi)	✅

Engineered Features

Feature	2. HGBR Model
weekday	✅
year sin	✅
arrival day sin	✅
start_hour	✅

Trajectories

Engineered Features

Feature	3. HGBR Model
Average climb rate, 1st flight phase	✅
Average climb rate, 3rd flight phase	✅
Average altitude, 2nd flight phase	✅

Versions

Since the trajectory data was updated during the project phase, we downloaded and processed the data multiple times. The first few versions (0-6) were testing different aspects of the model on the early data. From 7 onwards we worked with the final data (submission_set + final_submission_set). All trajectories were re-downloaded and processed after version 8. New features from trajectories were extracted after version 12. The specific versions were

kpi > 0.8 traj_model, rest base_model on rest of data
kpi > 0.8 traj model, rest base_model on all data
all base_model
all base_model, sorted by index (to test if the order of the data matters in the submission)
kpi > 0.8 traj model, rest base_model on all data, sorted by index
traj model only on tow > 250t and kpi > 0.8, rest base
traj model with new trajectories traj model only on tow > 250t and kpi > 0.8, rest base
traj model with custom weights that equal the kpi of each flight
traj model with new trajectories traj model only on tow > 250t and kpi > 0.0, rest base

Our submissions

File Version	RMSE
v6	9959.47
v7	9950.77
v8	10106.12
v9	3755.47
v10	3755.47
v11	10106.12
v12	4341.19
v13	4023.11
v14	4525.74
v15	4446.98

Our models continued to display better RMSE for our train and test data, but the performance did not improve as expected on the actual submission set.

We suspect part of the problem to be the actual distribution of the data. We found no flights over 250t in the data that we initially used for training our traj_model, while flights with over 250t appeared in the final_submission_set. Using the traj_model only on flights below a base_model prediction of 250t was tested (v12 and later) but the improvement form this was still smaller than just using the base_model.

For kpi=0 it looks like the traj_model does not overfit on our data:

We were unable to fully determine why our models performed worse than expected on the final data set.

Getting the code to run

Make sure mc client is set up as described on data challenge page, then run train_and_submit.py

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 185 Commits
.github/workflows		.github/workflows
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
analyse_trajectories.py		analyse_trajectories.py
download_trajectories.py		download_trajectories.py
first_hgbr_model.py		first_hgbr_model.py
icao_code-mtow.csv		icao_code-mtow.csv
inspector.py		inspector.py
map_trajectories.py		map_trajectories.py
overfit_test.png		overfit_test.png
response_1723016820162.json		response_1723016820162.json
rnn_model.py		rnn_model.py
submit_solution.py		submit_solution.py
test_submission.py		test_submission.py
train_and_submit.py		train_and_submit.py
trajectory_features.rar		trajectory_features.rar
trajectory_kpi_old.png		trajectory_kpi_old.png
trajectory_kpi_oldest.png		trajectory_kpi_oldest.png
truncate_trajectories.py		truncate_trajectories.py
truncated_trajectories.pkl		truncated_trajectories.pkl
type_mtow_mlw.csv		type_mtow_mlw.csv
types_weights.ipynb		types_weights.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PRC Data Challenge

Current rankings

Our models

base_model

traj_model

Notebooks

Documentation

Information

Goal

Decisions & Plans

Model Features Overview

FightList

Raw Features

Engineered Features

Trajectories

Engineered Features

Versions

Our submissions

We were unable to fully determine why our models performed worse than expected on the final data set.

Getting the code to run

License

About

Uh oh!

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

License

TheRealAkumetsu/prc_data_challenge

Folders and files

Latest commit

History

Repository files navigation

PRC Data Challenge

Current rankings

Our models

base_model

traj_model

Notebooks

Documentation

Information

Goal

Decisions & Plans

Model Features Overview

FightList

Raw Features

Engineered Features

Trajectories

Engineered Features

Versions

Our submissions

We were unable to fully determine why our models performed worse than expected on the final data set.

Getting the code to run

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Packages