Description
Problem
Gradient-boosted decision trees (GBDTs) are commonly used in industry in the scoring phase of recommender systems. Supporting serving of these models within the Merlin ecosystem will make them easier to use in such systems.
The Triton Inference Server has a backend called FIL (Forest Inference Library) for GPU-accelerated serving of these models.
> Random forests (RF) and gradient-boosted decision trees (GBDTs) have become workhorse models of applied machine learning. XGBoost and LightGBM, popular packages implementing GBDT models, consistently rank among the most commonly used tools by data scientists on the Kaggle platform. We see similar interest in forest-based models in industry, where they are applied to problems ranging from inventory forecasting, to ad ranking, to medical diagnostics.
>
> — RAPIDS Forest Inference Library: Prediction at 100 million rows per second
Goals
- Enable the use of tree-based models (e.g. GBDTs, random forests) in a Merlin Systems ensemble.
- Support the training of XGBoost models from a Merlin Dataset, as sketched below.
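
As a rough illustration of the training goal, the snippet below trains an XGBoost model directly from a Merlin Dataset. It assumes the `XGBoost` wrapper in `merlin.models.xgb` and a dataset whose schema tags the target column; the file path is hypothetical and exact signatures may differ.

```python
# Sketch only: assumes the merlin.models.xgb.XGBoost wrapper and a
# Merlin Dataset whose schema tags a binary target column.
from merlin.io import Dataset
from merlin.models.xgb import XGBoost

# "train.parquet" is a hypothetical path to preprocessed training data.
train = Dataset("train.parquet")

# The wrapper reads feature and target columns from the dataset schema
# and forwards remaining keyword arguments to XGBoost.
model = XGBoost(train.schema, objective="binary:logistic")
model.fit(train)
```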
Constraints
- Use the Triton FIL backend in Merlin Systems (see the sketch below for the model-repository layout FIL expects).
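
For context on the constraint above, the FIL backend loads serialized forest models from a standard Triton model repository. A minimal sketch of producing that layout for an XGBoost model follows; directory and file names are illustrative, and the `config.pbtxt` (declaring `backend: "fil"` and the model type) is assumed to be written separately.

```python
# Sketch of the Triton model-repository layout the FIL backend reads.
# Paths below are illustrative:
#   model_repository/
#     xgb_model/
#       config.pbtxt   # declares backend: "fil" and the model type
#       1/
#         xgboost.json # model serialized with Booster.save_model
import os
import xgboost as xgb

booster = xgb.Booster()
booster.load_model("xgb_model.json")  # hypothetical pre-trained model file

version_dir = os.path.join("model_repository", "xgb_model", "1")
os.makedirs(version_dir, exist_ok=True)
booster.save_model(os.path.join(version_dir, "xgboost.json"))
```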
Starting Point
Merlin-models (Data Scientist)
NVTabular (Data Scientist)
- [NA] Operators for batch prediction with these models
  - Note: batch prediction is not in scope for this development.
Merlin-systems (Product Engineer)
- FIL integration in the ensemble API (see the sketch after this list).
  - Supports the following models: XGBoost, LightGBM, Scikit-Learn (Random Forest), cuML (Random Forest)
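
A sketch of how this integration might look from the ensemble API is below. The operator and import names (`PredictForest`, `TransformWorkflow`, `Ensemble`) reflect the integration described above but should be treated as illustrative; `workflow` is a fitted NVTabular workflow and `model` a trained forest model, both assumed to exist.

```python
# Sketch: serve a trained forest model behind Triton via Merlin Systems.
from merlin.systems.dag.ensemble import Ensemble
from merlin.systems.dag.ops.fil import PredictForest
from merlin.systems.dag.ops.workflow import TransformWorkflow

# `workflow` (fitted NVTabular workflow) and `model` (trained
# XGBoost/LightGBM/random-forest model) are assumed to exist.
features = workflow.input_schema.column_names
pipeline = (
    features
    >> TransformWorkflow(workflow)                    # preprocess request features
    >> PredictForest(model, workflow.output_schema)   # FIL-backed scoring
)

# Export a Triton model repository containing the preprocessing model,
# the FIL model, and an ensemble chaining them together.
ensemble = Ensemble(pipeline, workflow.input_schema)
ensemble.export("model_repository")
```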
Examples and Docs (Everyone)
- Example of training tree models (end-to-end example)
- Example of serving tree models (end-to-end example): models#575
Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-828