different tree results from xrf::xrf()

Even when trying to control for every factor, it is difficult to get the same trees from a direct call to `xrf::xrf()` and from a call made via `rules::rule_fit()`.

Example: 

``` r
library(tidymodels)
library(rules)
#> 
#> Attaching package: 'rules'
#> The following object is masked from 'package:dials':
#> 
#>     max_rules
library(xrf)

set.seed(1)
ex_data <- modeldata::hpc_data |> slice_sample(n = 100, by = class)

set.seed(2)
xrf_fit <- xrf(class ~ ., data = ex_data, family = "multinomial")
#> Warning in xrf(class ~ ., data = ex_data, family = "multinomial"): Detected 4
#> classes to set num_class xgb_control parameter

set.seed(2)
rules_fit <- 
  rule_fit() |>
  set_engine("xrf", seed = 0) |>
  set_mode("classification") |> 
  fit(class ~ ., data = ex_data)

xrf_fit$xgb |> xgboost::xgb.dump() |> head()
#> [1] "booster[0]"                                   
#> [2] "0:[compounds<197] yes=1,no=2,missing=2"       
#> [3] "1:[iterations<50] yes=3,no=4,missing=4"       
#> [4] "3:[protocolM<2.00001001] yes=7,no=8,missing=8"
#> [5] "7:leaf=-0.00722891558"                        
#> [6] "8:leaf=0.352313161"
rules_fit$fit$xgb |> xgboost::xgb.dump() |> head()
#> [1] "booster[0]"                              
#> [2] "0:[compounds<197] yes=1,no=2,missing=2"  
#> [3] "1:[iterations<50] yes=3,no=4,missing=4"  
#> [4] "3:[protocolM<1] yes=7,no=8,missing=8"    
#> [5] "7:[protocolH<1] yes=13,no=14,missing=14" 
#> [6] "13:[protocolO<1] yes=23,no=24,missing=24"
```

<sup>Created on 2026-01-13 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

It turns out that one possible cause is the `objective` argument. For multinomial data, xrf uses `"multi:softmax"` while tidymodels uses `"multi:softprob"`.

`xrf::xrf()` sets `objective = "multi:softmax"` in its internal `xrf::get_xgboost_objective()` function, and there is no way to pass a different value.

rules passes the fit off to `parsnip::xgb_train()` _but explicitly sets_ `objective = NULL`, so the user has no way to reset the objective function. For multinomial data, tidymodels then automatically sets `objective = "multi:softprob"`.
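To check how much the objective alone changes the trees, outside of both xrf and rules, something like the following could be run. It is only a sketch: `iris` stands in for the `hpc_data` example, the tuning values are arbitrary, and `fit_with_objective()` is a hypothetical helper, not part of any package.

``` r
library(xgboost)

# Toy three-class problem (iris is a stand-in for the hpc_data example)
x <- as.matrix(iris[, 1:4])
y <- as.integer(iris$Species) - 1L  # xgboost expects 0-based class labels

# Hypothetical helper: fit the same model, varying only the objective
fit_with_objective <- function(objective) {
  dtrain <- xgb.DMatrix(data = x, label = y)
  xgb.train(
    params = list(
      objective = objective,
      num_class = 3,
      max_depth = 3,
      eta = 0.3,
      nthread = 1
    ),
    data = dtrain,
    nrounds = 5
  )
}

fit_softmax  <- fit_with_objective("multi:softmax")
fit_softprob <- fit_with_objective("multi:softprob")

# Do the two objectives produce the same tree dumps?
same_trees <- identical(xgb.dump(fit_softmax), xgb.dump(fit_softprob))
same_trees

# The prediction shapes differ: class labels vs. per-class probabilities
pred_class <- predict(fit_softmax, x)
pred_prob  <- predict(fit_softprob, x)
length(pred_class)  # one value per row
length(pred_prob)   # one value per row and class
```

If `same_trees` is `TRUE` here, the remaining difference between the two fits would have to come from something other than the objective, such as the predictor encoding or the other xgboost parameters.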

The use of `parsnip::xgb_train()` (instead of just passing everything to `xrf::xrf()`) came about in #60.

One possible short-term solution is to allow the objective function to be passed through to `parsnip::xgb_train()`. We'll need to ensure that this doesn't break the early stopping feature implemented in #60.
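For comparison, parsnip's own xgboost engine for `boost_tree()` accepts `objective` as an engine argument, which is roughly the behavior `rule_fit()` could mirror. This is a sketch of that existing `boost_tree()` usage, not of a rules API; `iris` and `trees = 5` are arbitrary choices for illustration.

``` r
library(parsnip)

set.seed(1)
bt_spec <-
  boost_tree(trees = 5) |>
  # Engine arguments are forwarded to the xgboost fitting code
  set_engine("xgboost", objective = "multi:softmax") |>
  set_mode("classification")

bt_fit <- fit(bt_spec, Species ~ ., data = iris)

# Pull out the underlying booster and look at the start of the first tree
booster <- extract_fit_engine(bt_fit)
head(xgboost::xgb.dump(booster))
```

Only fitting is shown; whether prediction through parsnip works as expected with a non-default objective would need to be checked separately.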

We might also want xrf to let users modify the objective function, or to change its default (since likelihood gradient boosting is the norm for these models).

