Even when trying to control for every factor, it is difficult to get the same results from calling xrf::xrf() directly and from calling it via rules::rule_fit().
Example:
library(tidymodels)
library(rules)
#>
#> Attaching package: 'rules'
#> The following object is masked from 'package:dials':
#>
#> max_rules
library(xrf)
set.seed(1)
ex_data <- modeldata::hpc_data |> slice_sample(n = 100, by = class)
set.seed(2)
xrf_fit <- xrf(class ~ ., data = ex_data, family = "multinomial")
#> Warning in xrf(class ~ ., data = ex_data, family = "multinomial"): Detected 4
#> classes to set num_class xgb_control parameter
set.seed(2)
rules_fit <-
  rule_fit() |>
  set_engine("xrf", seed = 0) |>
  set_mode("classification") |>
  fit(class ~ ., data = ex_data)
xrf_fit$xgb |> xgboost::xgb.dump() |> head()
#> [1] "booster[0]"
#> [2] "0:[compounds<197] yes=1,no=2,missing=2"
#> [3] "1:[iterations<50] yes=3,no=4,missing=4"
#> [4] "3:[protocolM<2.00001001] yes=7,no=8,missing=8"
#> [5] "7:leaf=-0.00722891558"
#> [6] "8:leaf=0.352313161"
rules_fit$fit$xgb |> xgboost::xgb.dump() |> head()
#> [1] "booster[0]"
#> [2] "0:[compounds<197] yes=1,no=2,missing=2"
#> [3] "1:[iterations<50] yes=3,no=4,missing=4"
#> [4] "3:[protocolM<1] yes=7,no=8,missing=8"
#> [5] "7:[protocolH<1] yes=13,no=14,missing=14"
#> [6] "13:[protocolO<1] yes=23,no=24,missing=24"
Created on 2026-01-13 with reprex v2.1.1
It turns out that one possible cause is the objective function argument. For multinomial data, xrf uses "multi:softmax" while tidymodels uses "multi:softprob".
xrf::xrf() sets objective = "multi:softmax" in its internal xrf::get_xgboost_objective() function. There is no way to pass a different value.
rules passes the fit off to parsnip::xgb_train() but explicitly sets objective = NULL, so the user has no way to reset the objective function. For multinomial data, tidymodels then automatically sets objective = "multi:softprob".
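To check how much of the discrepancy the objective alone can explain, the two settings can be compared in isolation, outside of both xrf and rules. The following is a minimal sketch, not what either package runs internally: it assumes the xgboost package is installed, uses the built-in iris data instead of hpc_data, and picks max_depth, eta, and nrounds arbitrarily. The helper fit_with_objective() is a hypothetical name introduced here.

```r
library(xgboost)

# Numeric predictors and zero-based integer class labels, as xgboost expects
x <- as.matrix(iris[, 1:4])
y <- as.integer(iris$Species) - 1L

# Hypothetical helper: fit a small booster that differs only in its objective
fit_with_objective <- function(objective) {
  set.seed(2)
  xgb.train(
    params = list(
      objective = objective,
      num_class = 3,
      max_depth = 3,  # arbitrary illustration values, not xrf or rules defaults
      eta = 0.1
    ),
    data = xgb.DMatrix(data = x, label = y),
    nrounds = 5
  )
}

softmax_fit  <- fit_with_objective("multi:softmax")
softprob_fit <- fit_with_objective("multi:softprob")

# Do the two objectives produce the same trees when everything else is equal?
same_trees <- identical(xgb.dump(softmax_fit), xgb.dump(softprob_fit))
same_trees

# The predictions differ in shape: one class index per row for softmax,
# one probability per row per class for softprob
n_softmax  <- length(predict(softmax_fit, xgb.DMatrix(x)))
n_softprob <- length(predict(softprob_fit, xgb.DMatrix(x)))
c(n_softmax, n_softprob)
```

If same_trees is TRUE, the differing dumps in the reprex above must also come from something other than the objective, such as how the predictors are encoded or which tuning parameters each path passes to xgboost.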
The use of parsnip::xgb_train() (instead of just passing everything to xrf::xrf()) came about in #60.
One possible short-term solution is to allow the objective function to be passed through to parsnip::xgb_train(). We'll need to ensure that this doesn't harm the early stopping feature implemented in #60.
We also might want to let xrf users modify the objective function, or change its default (since likelihood-based gradient boosting is the norm for these models).
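As a rough illustration of the second idea, an overridable default could look like the sketch below. The function pick_objective() and its objective argument are hypothetical and are not part of xrf's actual API; only the multinomial default is taken from the description above, so other families are deliberately left out.

```r
# Hypothetical helper, not xrf's real get_xgboost_objective()
pick_objective <- function(family, objective = NULL) {
  # A user-supplied objective always wins over the family-based default
  if (!is.null(objective)) {
    return(objective)
  }
  switch(
    family,
    multinomial = "multi:softmax",  # current xrf default, per the text above
    stop("no default objective sketched for family '", family, "'")
  )
}

default_obj  <- pick_objective("multinomial")
override_obj <- pick_objective("multinomial", objective = "multi:softprob")
```

Keeping NULL as the default for the new argument means existing calls would behave as before, and only callers that opt in would get a different objective.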