This R-package computes the Conditional Permutation Importance (CPI; Strobl, 2008)
using an alternative implementation that is both faster and more
stable (Debeer & Strobl, 2020). The (C)PI can
be computed for random forests fit using (a) the original impurity
reduction method (randomForest-package) and (b) the Conditional
Inference framework (party-package). In addition, a plotting method for
the resulting VarImp-object is included.
The package can be installed using the devtools-package:
install.packages("devtools")
devtools::install_github("ddebeer/permimp")
The workhorse is the permimp-function.
?permimp
For documentation about the plotting function:
?plot.VarImp
library(party)
library(randomForest)
library(permimp)
### set seed
set.seed(542863)
### get example data
airq <- subset(airquality, !(is.na(Ozone) | is.na(Solar.R)))
### fit a random forest
### ... using the party package
cfAirq5 <- cforest(Ozone ~ ., data = airq,
                   controls = cforest_unbiased(mtry = 3, ntree = 1000,
                                               minbucket = 5,
                                               minsplit = 10))
### compute the conditional permutation importance
permimp_cf <- permimp(cfAirq5, conditional = TRUE)
plot(permimp_cf, type = "box", interval = "quantile")
### fit a random forest ...
### ... using the randomForest package
rfAirq5 <- randomForest(Ozone ~ ., data = airq,
                        mtry = 3, ntree = 1000, importance = TRUE,
                        keep.forest = TRUE, keep.inbag = TRUE)
### compute the conditional permutation importance
permimp_rf <- permimp(rfAirq5, conditional = TRUE)
plot(permimp_rf, horizontal = TRUE)
For forests with large trees, parallel processing may speed up the computations.
Parallel processing is possible via the cl argument. Under the hood, the
pblapply function from the pbapply-package is used.
Tip: when using parallel processing, set progressBar = FALSE. The additional communication
between the nodes for updating the progress bar slows down the computations.
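As a rough sketch of how the cl argument and the progressBar tip might be combined: the helper run_permimp_parallel below is hypothetical (it is not part of permimp), and the choice of two workers and a smaller forest of 100 trees is arbitrary, made only to keep the example quick. It assumes that a cluster object created with parallel::makeCluster can be passed as cl, which is what pblapply accepts. Loading the packages on the workers may not be strictly necessary, but it should be harmless.

```r
library(parallel)

# Hypothetical helper (not part of permimp): fits a small conditional
# inference forest and computes the CPI on a local cluster of workers.
run_permimp_parallel <- function(n_workers = 2L, ntree = 100L) {
  # Return NULL instead of failing when the modelling packages are missing
  if (!requireNamespace("party", quietly = TRUE) ||
      !requireNamespace("permimp", quietly = TRUE)) {
    return(NULL)
  }
  library(party)
  library(permimp)

  # Same example data as above
  airq <- subset(airquality, !(is.na(Ozone) | is.na(Solar.R)))
  forest <- cforest(Ozone ~ ., data = airq,
                    controls = cforest_unbiased(mtry = 3, ntree = ntree))

  # Start the workers and make sure they are stopped again on exit
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)

  # Assumption: attaching the packages on each worker is a safe default
  clusterEvalQ(cl, {
    library(party)
    library(permimp)
  })

  # progressBar = FALSE avoids the extra communication between the nodes
  permimp(forest, conditional = TRUE, cl = cl, progressBar = FALSE)
}

set.seed(542863)  # seeds the forest fit in the main R session
permimp_par <- run_permimp_parallel()
if (!is.null(permimp_par)) plot(permimp_par, horizontal = TRUE)
```

Whether this is actually faster than the sequential call depends on the size of the forest: for small forests such as this one, the overhead of starting the workers can outweigh the gain.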