base_score default #799

At least in the R package, base_score appears to default to 0.5. For a classification problem this is a reasonable choice, but it strikes me as a bizarre one for a regression problem, where the labels can be on any scale. Would it make more sense to set base_score = mean(label) by default? That choice might even improve on the current default in both the regression and classification settings.

If the learning rate (shrinkage) were set to 1, the choice of base_score would be largely irrelevant, since the resulting model would just compensate for any change in the mean. However, essentially all reasonable implementations of boosted trees use a learning rate much smaller than 1. In principle, then, the choice of base_score should affect the final model.
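To make the learning-rate argument concrete, here is a toy sketch in Python. It is not xgboost's actual algorithm: it assumes squared-error loss, no regularization, and that every "tree" is a single leaf predicting the mean residual. The function name `boosted_constant` and the sample labels are made up for illustration.

```python
def boosted_constant(labels, base_score, eta, n_rounds):
    """Toy boosting under squared loss where each 'tree' is one leaf.

    Assumption: no regularization, so the optimal leaf value is simply
    the mean residual. Every row shares the same prediction, so the
    mean residual equals mean(labels) - current prediction.
    """
    mean_label = sum(labels) / len(labels)
    pred = base_score
    for _ in range(n_rounds):
        leaf = mean_label - pred   # mean residual at this round
        pred += eta * leaf         # shrink the update by the learning rate
    return pred


labels = [100.0, 110.0, 90.0, 105.0, 95.0]  # hypothetical regression labels, mean 100

for eta in (1.0, 0.1):
    for base in (0.5, sum(labels) / len(labels)):
        pred = boosted_constant(labels, base, eta, n_rounds=10)
        print(f"eta={eta}, base_score={base}: prediction after 10 rounds = {pred:.3f}")
```

In this simplified setting the gap between the prediction and the label mean shrinks by a factor of (1 - eta) per round. With eta = 1 a single round removes the offset entirely, while with eta = 0.1 a start at 0.5 is still far from the mean after 10 rounds. Real trees with splits and regularization behave less cleanly, but the early rounds are similarly spent correcting the offset.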

I understand that it is extremely straightforward to set base_score manually, but if there is a way to pick a better default, it may be worth implementing.
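For reference, the manual workaround might look like the following in Python. This is a sketch, not the package's own code: the helper `params_with_mean_base_score` is hypothetical, and the objective name and `eta` value are assumptions chosen for illustration. It only builds the parameter dictionary, and `base_score` is the one parameter taken from the discussion above.

```python
def params_with_mean_base_score(labels, **overrides):
    """Build a parameter dict whose base_score is the label mean.

    Hypothetical helper: the objective and eta below are illustrative
    defaults, not values taken from the xgboost source.
    """
    params = {
        "objective": "reg:squarederror",  # assumed regression objective name
        "eta": 0.1,                       # assumed learning rate
    }
    params["base_score"] = sum(labels) / len(labels)
    params.update(overrides)  # caller-supplied settings take precedence
    return params


y = [100.0, 110.0, 90.0, 105.0, 95.0]  # hypothetical labels
params = params_with_mean_base_score(y, max_depth=3)
print(params)
```

The resulting dictionary would then be passed to the training call along with the training data. For a 0/1 classification label the same mean is the base rate of the positive class.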

Any thoughts?

