Reproduce experiments #235

Merged: motiwari merged 12 commits into dataset_experiments from reproduce_experiments on Sep 18, 2022

vxbrandon commented on Sep 18, 2022

Reproduces the experiment results on an EC2 instance (t2.2xlarge). To reproduce them, run `bash repro_script.sh`.

==============================
Table 1 (Classification): MNIST
                                        Model                                   Time (s)                                Number of Insertions                    Accuracy
1st row                                 RF                                      2149.006 ± 13.179                       1.44E+08 ± 4.85E+05                     0.777 ± 0.005
1st row                                 RF + MABSplit                           60.034 ± 0.413                          3.37E+06 ± 1.62E+04                     0.763 ± 0.008
2nd row                                 ExtraTrees                              2476.905 ± 4.571                        1.68E+08 ± 0.00E+00                     0.762 ± 0.003
2nd row                                 ExtraTrees + MABSplit                   74.171 ± 0.244                          4.32E+06 ± 7.69E+03                     0.755 ± 0.002
3rd row                                 RP                                      1980.633 ± 9.167                        1.32E+08 ± 6.95E+05                     0.771 ± 0.003
3rd row                                 RP + MABSplit                           56.512 ± 0.358                          3.17E+06 ± 1.40E+04                     0.768 ± 0.003
==============================
Table 1 (Classification): APS
                                        Model                                   Time (s)                                Number of Insertions                    Accuracy
1st row                                 RF                                      29.559 ± 0.117                          3.77E+06 ± 9.66E+03                     0.985 ± 0.0
1st row                                 RF + MABSplit                           0.791 ± 0.005                           6.94E+04 ± 2.19E+02                     0.985 ± 0.0
2nd row                                 ExtraTrees                              27.013 ± 0.079                          3.78E+06 ± 0.00E+00                     0.985 ± 0.0
2nd row                                 ExtraTrees + MABSplit                   0.695 ± 0.003                           7.00E+04 ± 0.00E+00                     0.985 ± 0.0
3rd row                                 RP                                      25.574 ± 0.116                          3.22E+06 ± 1.18E+04                     0.985 ± 0.0
3rd row                                 RP + MABSplit                           0.702 ± 0.006                           5.96E+04 ± 2.19E+02                     0.985 ± 0.0
==============================
Table 1 (Classification): FLIGHT
                                        Model                                   Time (s)                                Number of Insertions                    Accuracy
1st row                                 RF                                      87.382 ± 0.448                          1.16E+07 ± 3.01E+04                     0.815 ± 0.0
1st row                                 RF + MABSplit                           1.534 ± 0.009                           1.29E+05 ± 4.38E+02                     0.815 ± 0.0
2nd row                                 ExtraTrees                              87.281 ± 0.082                          1.17E+07 ± 0.00E+00                     0.815 ± 0.0
2nd row                                 ExtraTrees + MABSplit                   1.391 ± 0.004                           1.30E+05 ± 0.00E+00                     0.815 ± 0.0
3rd row                                 RP                                      79.618 ± 0.245                          1.06E+07 ± 3.01E+04                     0.815 ± 0.0
3rd row                                 RP + MABSplit                           1.429 ± 0.018                           1.18E+05 ± 3.35E+02                     0.815 ± 0.0
==============================
Table 1 (Classification): COVTYPE
                                        Model                                   Time (s)                                Number of Insertions                    Accuracy
1st row                                 RF                                      167.836 ± 0.388                         1.86E+07 ± 0.00E+00                     0.559 ± 0.028
1st row                                 RF + MABSplit                           1.564 ± 0.012                           3.98E+04 ± 1.79E+02                     0.505 ± 0.004
2nd row                                 ExtraTrees                              167.118 ± 0.26                          1.86E+07 ± 0.00E+00                     0.539 ± 0.022
2nd row                                 ExtraTrees + MABSplit                   1.881 ± 0.409                           1.06E+05 ± 3.88E+04                     0.5 ± 0.005
3rd row                                 RP                                      145.539 ± 0.866                         1.62E+07 ± 8.31E+04                     0.51 ± 0.008
3rd row                                 RP + MABSplit                           1.418 ± 0.018                           3.50E+04 ± 0.00E+00                     0.507 ± 0.005
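As a rough reading of Table 1, the ratio of mean training times gives the wall-clock speedup from adding MABSplit. The sketch below computes it for the MNIST rows; it uses only the reported means and ignores the ± spread, and the names `mnist_times` and `speedups` are illustrative only.

```python
# Mean training times in seconds from Table 1 (MNIST): (baseline, + MABSplit).
mnist_times = {
    "RF": (2149.006, 60.034),
    "ExtraTrees": (2476.905, 74.171),
    "RP": (1980.633, 56.512),
}


def speedups(times):
    """Ratio of baseline mean time to MABSplit mean time for each model."""
    return {model: base / mab for model, (base, mab) in times.items()}


for model, ratio in speedups(mnist_times).items():
    print(f"{model}: {ratio:.1f}x")  # RF: 35.8x, ExtraTrees: 33.4x, RP: 35.0x
```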

==============================
Table 2 (Regression): SKLEARN_REGRESSION
                                        Model                                   Time (s)                                Number of Insertions                    MSE
1st row                                 RF                                      6.698 ± 0.145                           4.00E+07 ± 0.00E+00                     5524.814 ± 28.441
1st row                                 RF + MABSplit                           2.714 ± 0.287                           2.50E+05 ± 0.00E+00                     5524.814 ± 28.441
2nd row                                 ExtraTrees                              6.896 ± 0.024                           4.00E+07 ± 0.00E+00                     5087.722 ± 18.239
2nd row                                 ExtraTrees + MABSplit                   1.82 ± 0.031                            2.51E+05 ± 7.16E+02                     5097.103 ± 30.911
3rd row                                 RP                                      5.485 ± 0.039                           3.36E+07 ± 0.00E+00                     5399.368 ± 60.309
3rd row                                 RP + MABSplit                           1.977 ± 0.058                           2.14E+05 ± 3.51E+03                     5399.368 ± 60.309
==============================
Table 2 (Regression): AIR
                                        Model                                   Time (s)                                Number of Insertions                    MSE
1st row                                 RF                                      42.441 ± 0.756                          2.59E+08 ± 4.47E+05                     2130.908 ± 1.637
1st row                                 RF + MABSplit                           20.49 ± 0.798                           6.85E+05 ± 1.22E+03                     2021.828 ± 26.928
2nd row                                 ExtraTrees                              40.015 ± 0.147                          2.67E+08 ± 0.00E+00                     1929.588 ± 22.164
2nd row                                 ExtraTrees + MABSplit                   15.945 ± 0.076                          7.18E+05 ± 6.11E+03                     1917.062 ± 14.48
3rd row                                 RP                                      33.075 ± 0.148                          2.17E+08 ± 3.71E+05                     2083.698 ± 22.226
3rd row                                 RP + MABSplit                           14.496 ± 0.05                           5.83E+05 ± 3.85E+03                     2069.932 ± 27.657
==============================
Table 2 (Regression): GPU
                                        Model                                   Time (s)                                Number of Insertions                    MSE
1st row                                 RF                                      39.572 ± 0.145                          1.78E+08 ± 2.30E+04                     69733.002 ± 57.401
1st row                                 RF + MABSplit                           25.599 ± 0.288                          4.30E+06 ± 4.03E+04                     69493.921 ± 73.133
2nd row                                 ExtraTrees                              40.529 ± 0.21                           1.80E+08 ± 0.00E+00                     69734.948 ± 54.876
2nd row                                 ExtraTrees + MABSplit                   25.225 ± 0.274                          4.63E+06 ± 4.07E+04                     69585.029 ± 80.281
3rd row                                 RP                                      31.502 ± 0.208                          1.50E+08 ± 5.55E+04                     66364.998 ± 894.568
3rd row                                 RP + MABSplit                           27.492 ± 1.571                          5.23E+06 ± 3.92E+05                     66310.138 ± 896.237

==============================
Table 3 (Classification): MNIST
                                        Model                                   Number of Trees                         Accuracy
1st row                                 RF                                      0.2 ± 0.179                             0.143 ± 0.026
1st row                                 RF + MABSplit                           15.8 ± 0.179                            0.83 ± 0.002
2nd row                                 ExtraTrees                              0.2 ± 0.179                             0.144 ± 0.027
2nd row                                 ExtraTrees + MABSplit                   12.0 ± 0.0                              0.814 ± 0.001
3rd row                                 RP                                      1.0 ± 0.0                               0.253 ± 0.003
3rd row                                 RP + MABSplit                           16.8 ± 0.179                            0.832 ± 0.002
==============================
Table 3 (Classification): APS
                                        Model                                   Number of Trees                         Accuracy
1st row                                 RF                                      1.0 ± 0.0                               0.985 ± 0.0
1st row                                 RF + MABSplit                           5.8 ± 0.179                             0.989 ± 0.0
2nd row                                 ExtraTrees                              1.0 ± 0.0                               0.985 ± 0.0
2nd row                                 ExtraTrees + MABSplit                   5.6 ± 0.219                             0.989 ± 0.0
3rd row                                 RP                                      1.0 ± 0.0                               0.985 ± 0.0
3rd row                                 RP + MABSplit                           6.8 ± 0.179                             0.989 ± 0.0
==============================
Table 3 (Classification): FLIGHT
                                        Model                                   Number of Trees                         Accuracy
1st row                                 RF                                      0.2 ± 0.179                             0.815 ± 0.0
1st row                                 RF + MABSplit                           14.6 ± 0.219                            0.815 ± 0.0
2nd row                                 ExtraTrees                              0.0 ± 0.0                               0.815 ± 0.0
2nd row                                 ExtraTrees + MABSplit                   9.6 ± 0.219                             0.815 ± 0.0
3rd row                                 RP                                      0.0 ± 0.0                               0.815 ± 0.0
3rd row                                 RP + MABSplit                           16.2 ± 0.593                            0.815 ± 0.0
==============================
Table 3 (Classification): COVTYPE
                                        Model                                   Number of Trees                         Accuracy
1st row                                 RF                                      0.4 ± 0.219                             0.514 ± 0.019
1st row                                 RF + MABSplit                           99.8 ± 0.179                            0.675 ± 0.002
2nd row                                 ExtraTrees                              0.4 ± 0.219                             0.501 ± 0.007
2nd row                                 ExtraTrees + MABSplit                   29.6 ± 1.824                            0.676 ± 0.002
3rd row                                 RP                                      0.6 ± 0.219                             0.534 ± 0.03
3rd row                                 RP + MABSplit                           100.0 ± 0.0                             0.675 ± 0.002
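The PR does not state what the ± term means. The fractional tree counts in Table 3 (for example 0.2 ± 0.179 and 0.4 ± 0.219) are consistent with a mean over five runs plus or minus the standard error of the mean, computed with the population standard deviation (ddof=0). The sketch below checks that reading; it is an inference from the numbers, not code taken from the repository.

```python
import numpy as np


def mean_and_sem(values):
    """Mean and standard error of the mean, using the population std (ddof=0)."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std() / np.sqrt(values.size)


# Five runs in which exactly one run fits a single tree: matches "0.2 ± 0.179".
mean, sem = mean_and_sem([0, 0, 0, 0, 1])
print(f"{mean:.1f} ± {sem:.3f}")  # 0.2 ± 0.179
```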

==============================
Table 4 (Regression): SKLEARN_REGRESSION
                                        Model                                   Number of Trees                         Test MSE
1st row                                 RF                                      1.0 ± 0.0                               2479.698 ± 52.644
1st row                                 RF + MABSplit                           18.0 ± 0.0                              729.302 ± 13.139
2nd row                                 RP                                      1.0 ± 0.0                               2140.669 ± 260.937
2nd row                                 RP + MABSplit                           9.8 ± 0.716                             1005.343 ± 89.86
3rd row                                 ExtraTrees                              0.6 ± 0.219                             5677.874 ± 1611.928
3rd row                                 ExtraTrees + MABSplit                   18.0 ± 0.0                              689.331 ± 5.093
==============================
Table 4 (Regression): AIR
                                        Model                                   Number of Trees                         Test MSE
1st row                                 RF                                      0.0 ± 0.0                               3208.93 ± 0.0
1st row                                 RF + MABSplit                           14.0 ± 0.0                              886.386 ± 4.21
2nd row                                 RP                                      0.0 ± 0.0                               3208.93 ± 0.0
2nd row                                 RP + MABSplit                           12.4 ± 0.358                            863.118 ± 5.501
3rd row                                 ExtraTrees                              0.0 ± 0.0                               3208.93 ± 0.0
3rd row                                 ExtraTrees + MABSplit                   10.4 ± 0.219                            834.439 ± 4.363

==============================
Table 5: Feature selection stability (Budget: Q * 100000)
                                        Importance Model                        Dataset                                 Stability
1st row                                 HRFC+MID                                Random Classification                   0.536 ± 0.039
1st row                                 HRFC+MID + MABSplit                     Random Classification                   0.863 ± 0.016
2nd row                                 HRFR+MID                                Random Regression                       0.134 ± 0.021
2nd row                                 HRFR+MID + MABSplit                     Random Regression                       0.674 ± 0.043
3rd row                                 HRFC+Perm                               Random Classification                   0.579 ± 0.023
3rd row                                 HRFC+Perm + MABSplit                    Random Classification                   0.69 ± 0.023
4th row                                 HRFR+Perm                               Random Regression                       0.116 ± 0.017
4th row                                 HRFR+Perm + MABSplit                    Random Regression                       0.437 ± 0.044

==============================
Table 6: Our models vs. scikit-learn
                                        Model                                   Task and Dataset                        Performance Metric                      Test Performance
1st row                                 RFC(Sklearn)                            Classification: 20 Newsgroups           Accuracy                                0.869 ± 0.01
1st row                                 RFC(Ours)                               Classification: 20 Newsgroups           Accuracy                                0.866 ± 0.015
2nd row                                 ERFC(Sklearn)                           Classification: 20 Newsgroups           Accuracy                                0.758 ± 0.038
2nd row                                 ERFC(Ours)                              Classification: 20 Newsgroups           Accuracy                                0.761 ± 0.04
3rd row                                 RFR(Sklearn)                            Regression: California Housing          MSE                                     0.322 ± 0.009
3rd row                                 RFR(Ours)                               Regression: California Housing          MSE                                     0.324 ± 0.008
4th row                                 ERFR(Sklearn)                           Regression: California Housing          MSE                                     0.612 ± 0.023
4th row                                 ERFR(Ours)                              Regression: California Housing          MSE                                     0.615 ± 0.023

Figure_1
Figure_2

What was fixed:

  1. When selecting features, features with identical importance scores are now chosen at random (random tie-breaking).
  2. Changed the hyperparameters of the feature selection stability experiment (Table 5) and the budget experiments (Tables 2 and 4) to strengthen the experimental results. The original code in the dataset_experiments branch did not produce satisfactory results for Table 5.
  3. Uncommented lines that were preventing the one-line reproduction script from running.
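Fix 1 amounts to random tie-breaking when ranking features by importance. A minimal NumPy sketch, assuming the importances arrive as a 1-D array; `select_top_k` is a hypothetical name, and the repository's actual implementation may differ.

```python
import numpy as np


def select_top_k(importances, k, rng=None):
    """Return indices of the k most important features, breaking ties at random.

    Hypothetical helper for illustration; not the function used in this PR.
    """
    rng = np.random.default_rng(rng)
    scores = np.asarray(importances, dtype=float)
    # A random permutation serves as the secondary sort key for tied scores.
    tie_breaker = rng.permutation(scores.size)
    # np.lexsort sorts by the LAST key first, so -scores (descending importance)
    # is the primary key and tie_breaker only decides among equal scores.
    order = np.lexsort((tie_breaker, -scores))
    return order[:k]


# Features 0, 2 and 3 tie at 0.5; which two are chosen depends on the seed.
print(select_top_k([0.5, 0.2, 0.5, 0.5, 0.1], k=2, rng=0))
```

Using a random permutation as the secondary key leaves the ranking by score intact while making the choice among equal scores depend on the seed, so repeated runs do not always resolve ties the same way.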

motiwari and others added 11 commits July 26, 2022 23:13
- Find the best hyperparameter setting for the Table 5 experiment.
…from 2 to 5

2. Implement a function that converts the target vector of a classification problem to a vector of contiguous integers

3. Update requirements.txt

4. Fix a bug where log files were not written to the right place, depending on the directory the code is run from
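Commit item 2 maps classification targets to contiguous integers. One way to do this for 1-D targets is `np.unique(..., return_inverse=True)`; `to_contiguous_labels` is a hypothetical name, not the function added in this PR. scikit-learn's `LabelEncoder` performs the same mapping.

```python
import numpy as np


def to_contiguous_labels(y):
    """Map 1-D class labels to integers 0..n_classes-1 (hypothetical helper).

    Returns (encoded, classes): classes is the sorted array of distinct labels
    and classes[encoded] reproduces y.
    """
    classes, encoded = np.unique(np.asarray(y), return_inverse=True)
    return encoded, classes


encoded, classes = to_contiguous_labels([3, 7, 3, 10])
print(encoded.tolist(), classes.tolist())  # [0, 1, 0, 2] [3, 7, 10]
```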
motiwari left a comment

Passes sniff tests. #yolo

motiwari merged commit be8e2c7 into dataset_experiments on Sep 18, 2022
motiwari deleted the reproduce_experiments branch on September 18, 2022 23:04