[Feature]: Berkeley Function-Calling Leaderboard benchmarks

### Is your feature request related to a problem?

_No response_

### Describe the solution you'd like

The Berkeley Function-Calling Leaderboard (BFCL) eval covers multiple tool-calling scenarios and measures a model's ability to handle and invoke tools correctly.
It would be a good addition: https://github.com/ShishirPatil/gorilla
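
To make the request concrete, here is a minimal sketch of the kind of check a function-calling eval performs: compare the tool call a model emits against an expected call. This is not BFCL's actual checker or data format; `ToolCall`, `call_matches`, and `accuracy` are hypothetical names used only for illustration, and exact-match scoring is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class ToolCall:
    """Hypothetical representation of one tool invocation."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def call_matches(predicted: ToolCall, expected: ToolCall) -> bool:
    # Assumption: a call is correct only if both the function name
    # and the full argument dict match exactly.
    return (
        predicted.name == expected.name
        and predicted.arguments == expected.arguments
    )


def accuracy(pairs: Iterable[tuple[ToolCall, ToolCall]]) -> float:
    # Fraction of (predicted, expected) pairs that match; 0.0 if empty.
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(call_matches(p, e) for p, e in pairs) / len(pairs)


if __name__ == "__main__":
    expected = ToolCall("get_weather", {"city": "Berkeley", "unit": "celsius"})
    good = ToolCall("get_weather", {"city": "Berkeley", "unit": "celsius"})
    bad = ToolCall("get_weather", {"city": "Oakland", "unit": "celsius"})
    print(accuracy([(good, expected), (bad, expected)]))
```

A real integration would presumably load the BFCL datasets and reuse the scoring from the gorilla repository rather than exact dict equality.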

### What are you requesting?

New benchmark/evaluation

### Describe alternatives you've considered

_No response_

### Use case

Testing models on the BFCL live and non-live datasets:
https://gorilla.cs.berkeley.edu/leaderboard.html

### Additional context

_No response_

[Feature]: Berkeley Function-Calling Leaderboard benchmarks #299

Is your feature request related to a problem?

Describe the solution you'd like

What are you requesting?

Describe alternatives you've considered

Use case

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feature]: Berkeley Function-Calling Leaderboard benchmarks #299

Description

Is your feature request related to a problem?

Describe the solution you'd like

What are you requesting?

Describe alternatives you've considered

Use case

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions