
[RMP] Support Tree Ranking Models (like XGBoost) in Merlin Models and Systems #105

@viswa-nvidia

Description

Problem

Gradient-boosted decision trees (GBDTs) are commonly used in industry as part of the scoring phase of recommender systems. Supporting serving of these models and integrating them with the Merlin ecosystem will make it easier to use GBDTs in Merlin-based recommender pipelines.

The Triton Inference Server provides the FIL (Forest Inference Library) backend for GPU-accelerated serving of these models.

"Random forests (RF) and gradient-boosted decision trees (GBDTs) have become workhorse models of applied machine learning. XGBoost and LightGBM, popular packages implementing GBDT models, consistently rank among the most commonly used tools by data scientists on the Kaggle platform. We see similar interest in forest-based models in industry, where they are applied to problems ranging from inventory forecasting, to ad ranking, to medical diagnostics."

(Source: RAPIDS Forest Inference Library: Prediction at 100 million rows per second)
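
To make the FIL serving path concrete, here is a minimal sketch of laying out a Triton model repository for a saved XGBoost model by hand. It is not Merlin-specific: the repository path, the model name `xgb_ranker`, and the feature count (20) are placeholders, while `input__0`/`output__0` and the `model_type`/`output_class` parameters are the FIL backend's standard conventions.

```python
import pathlib

# Hypothetical Triton model repository holding one FIL-served model.
# "xgb_ranker", the paths, and the feature count (20) are placeholders.
model_dir = pathlib.Path("model_repository/xgb_ranker")
version_dir = model_dir / "1"
version_dir.mkdir(parents=True, exist_ok=True)

# A trained booster saved via booster.save_model(...) goes into the version
# directory; the filename must match the model_type below
# ("xgboost_json" expects xgboost.json).
# shutil.copy("xgboost.json", version_dir / "xgboost.json")

config = """
backend: "fil"
max_batch_size: 32768
input [
  { name: "input__0", data_type: TYPE_FP32, dims: [ 20 ] }
]
output [
  { name: "output__0", data_type: TYPE_FP32, dims: [ 1 ] }
]
instance_group [{ count: 1, kind: KIND_GPU }]
parameters [
  { key: "model_type", value: { string_value: "xgboost_json" } },
  { key: "output_class", value: { string_value: "false" } }
]
"""
(model_dir / "config.pbtxt").write_text(config)
```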

Goals

  • Enable the use of tree-based models (e.g., GBDTs, random forests) in a Merlin Systems ensemble (see the sketch under Merlin-systems below).
  • Support the training of XGBoost models from a Merlin Dataset (a rough sketch follows this list).
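
As a rough sketch of the training goal: `merlin.io.Dataset` and its `to_ddf()` method exist today, but the file paths, the "click" label column, and the hyperparameters below are illustrative assumptions, not a settled Merlin Models API.

```python
import xgboost as xgb
from merlin.io import Dataset

# Load a Merlin Dataset, e.g. parquet files produced by an NVTabular workflow.
# The path and the "click" label column are illustrative assumptions.
train = Dataset("train/*.parquet")

# to_ddf() returns a dask (cudf/pandas) dataframe; compute() materializes it.
df = train.to_ddf().compute()

label = df["click"]
features = df.drop(columns=["click"])
dtrain = xgb.DMatrix(features, label=label)

# Train a GPU-accelerated binary objective.
# (tree_method="gpu_hist" is the pre-2.0 XGBoost spelling.)
booster = xgb.train(
    {"objective": "binary:logistic", "tree_method": "gpu_hist"},
    dtrain,
    num_boost_round=100,
)

# Save in the JSON format the FIL backend can load (model_type "xgboost_json").
booster.save_model("xgboost.json")
```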

Constraints

Starting Point

Merlin-models (Data Scientist)

NVTabular (Data Scientist)

  • [NA] Operators for batch prediction with these models
  • Note: Batch prediction is not in scope for this development

Merlin-systems (Product Engineer)
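
One possible shape for the serving side, continuing from the training sketch above. This is an assumption about what the operator API this RMP proposes might look like: `PredictForest`, the import paths, and the ensemble wiring below are not a confirmed Merlin Systems interface.

```python
# Hypothetical: wrap a trained XGBoost booster in a forest-inference operator
# and export a Triton ensemble. Operator and import names are assumptions.
from merlin.systems.dag.ensemble import Ensemble
from merlin.systems.dag.ops.fil import PredictForest

# Features only: drop the label column from the dataset schema.
feature_cols = [c for c in train.schema.column_names if c != "click"]
input_schema = train.schema.select_by_name(feature_cols)

# Route the request features through the FIL-backed prediction operator.
ensemble_ops = feature_cols >> PredictForest(booster, input_schema)
ensemble = Ensemble(ensemble_ops, input_schema)

# export() would write a Triton model repository (FIL model + ensemble config).
ensemble.export("model_repository")
```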

Examples and Docs (Everyone)

Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-828
