Benchmark 2.0 #17

### 🔖 Enhancement description

Details to include:

- ✅ Tokens used for each question (and average per model)
- ✅ Separate input and output prices for each model
- ✅ Duration for each question (and average per model)
- ✅ Cost for each question (and average per model)
- ✅ TPS (tokens per second) for each model, listed the same way as the model's price
- ✅ Total cost (sum of all)
- ✅ Total tool calls made for each question (and average per model)
- A structure that stores old benchmark runs too, not just the latest
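
To make the list above concrete, here is a minimal sketch of how the per-question metrics, the per-model averages, and the run history could fit together. All names (`QuestionResult`, `ModelRun`, `BenchmarkHistory`) are hypothetical and not taken from the existing benchmark code. It assumes prices are quoted in USD per 1M tokens and that TPS means output tokens divided by total duration; adjust both if the project defines them differently.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class QuestionResult:
    """Raw measurements for one benchmark question (hypothetical shape)."""
    question_id: str
    input_tokens: int
    output_tokens: int
    duration_s: float
    tool_calls: int


@dataclass
class ModelRun:
    """All results for one model in one benchmark run."""
    model: str
    input_price_per_mtok: float   # assumed unit: USD per 1M input tokens
    output_price_per_mtok: float  # assumed unit: USD per 1M output tokens
    results: list[QuestionResult] = field(default_factory=list)

    def cost(self, r: QuestionResult) -> float:
        """Cost of a single question, pricing input and output separately."""
        return (r.input_tokens * self.input_price_per_mtok
                + r.output_tokens * self.output_price_per_mtok) / 1_000_000

    def summary(self) -> dict[str, float]:
        """Per-model averages and totals; requires at least one result."""
        costs = [self.cost(r) for r in self.results]
        total_time = sum(r.duration_s for r in self.results)
        total_output = sum(r.output_tokens for r in self.results)
        return {
            "avg_tokens": mean([r.input_tokens + r.output_tokens for r in self.results]),
            "avg_duration_s": mean([r.duration_s for r in self.results]),
            "avg_cost": mean(costs),
            "total_cost": sum(costs),
            "avg_tool_calls": mean([r.tool_calls for r in self.results]),
            # Assumed definition of TPS: output tokens per second of wall time.
            "tps": total_output / total_time if total_time > 0 else 0.0,
        }


@dataclass
class BenchmarkHistory:
    """Keeps every benchmark run, oldest first, instead of only the latest."""
    runs: list[tuple[str, list[ModelRun]]] = field(default_factory=list)

    def add(self, recorded_at: str, model_runs: list[ModelRun]) -> None:
        self.runs.append((recorded_at, model_runs))

    def latest(self) -> tuple[str, list[ModelRun]]:
        return self.runs[-1]
```

Deriving averages from the stored per-question results, rather than storing them, means older runs in the history can be re-summarized if a metric's definition changes later.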

### 🎤 Pitch

A more insightful benchmark.

### 👀 Have you spent some time to check if this issue has been raised before?

- [x] I checked and didn't find similar issue

### 🏢 Have you read the Code of Conduct?

- [x] I have read the [Code of Conduct](https://github.com/appwrite/.github/blob/main/CODE_OF_CONDUCT.md)
