
feat: collect total token usage #32

Merged

lifeizhou-ap merged 8 commits into main from lifei/collect_total_token_usage on Sep 19, 2024
Conversation

@lifeizhou-ap
Collaborator

@lifeizhou-ap lifeizhou-ap commented Sep 6, 2024

Why
Users would like to know the token usage and cost involved when using LLMs in goose

What

  • Created _TokenUsageCollector to collect usage data (including the model name) from the HTTP calls to the LLM APIs
  • Exposed get_token_usage on the exchange so goose can get the total token usage (raised a discussion in the comments here)

@lifeizhou-ap lifeizhou-ap changed the title from "Lifei/collect total token usage" to "feat: collect total token usage" Sep 6, 2024
@lukealvoeiro
Collaborator

@lifeizhou-ap I don't think we need to collect it as the total token count is already stored in the CheckpointData

@zakiali
Collaborator

zakiali commented Sep 7, 2024

+1 to @lukealvoeiro's comment. We already have this, but turning it into total tokens sent (across all requests to the LLM) would be useful for cost tracking. Some providers, like OpenAI, also return how much was spent on each request. Obviously this varies per LLM; we could keep an internal lookup for it, but would then have to keep it up to date. Maybe we just provide the total tokens to the user, plus a $ amount when available.

@lifeizhou-ap
Collaborator Author

lifeizhou-ap commented Sep 7, 2024

Thanks for the suggestions and early feedback!

First, I would like to clarify the requirements around tracking the cost/tokens.

  1. My understanding is: we would like to track the cost/tokens used after the user starts the goose console. The LLM provider API returns the number of tokens used in each API call (when available), and we want to accumulate those counts across all the API calls made after the goose console starts. (At the moment this is tracked in memory; later on we could persist it to show usage per session.)

  2. In the current implementation, the exchange's complete function delegates to the provider's complete function to interact with the LLM provider, and the provider returns the total number of tokens used. That return value is the data source for calculating the cost. APIs such as OpenAI's do not currently return a cost, so we can use the model name and the total token count from the API call to calculate it.

Please correct me if any of the above points is incorrect. I want to make sure I am on the same page with you on the requirements before the implementation discussion below.

I looked into Checkpoint before I started this PR. Here is my understanding of CheckpointData (which contains a list of checkpoints):

  • Each exchange instance has a CheckpointData. It tracks the token count in sync with the messages currently held by the exchange instance.

  • However, it does not represent all the tokens used since the exchange was initialised. From reading the code, CheckpointData's main purpose is to monitor token usage in sync with the messages, so that the next API call does not send too many tokens. If the tokens exceed the limit, messages are truncated with FIFO logic, and total_token_count is reduced to account for the truncated messages.

Here are my thoughts:

Based on my understanding of the requirements, we want to track the token usage from the API calls delegated by all exchange instances. Exchanges are initiated in several places in the code. Instead of gathering and managing all the exchanges to sum up the token counts from each instance, it would be better to plug in/inject a token_usage_collector (which we could use to track more data in the future if necessary) into the exchange and let the collector collect the usage. The approach is similar to logging.

In addition, the usage data that the token_usage_collector collects is more accurate: the CheckpointData total token count only covers the messages currently held by the instance.

With this approach, token_usage_collector and CheckpointData each keep a single responsibility, without coupling.
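The injection idea above could look roughly like this; all names are illustrative sketches, not the actual exchange API:

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


class TokenUsageCollector:
    """Accumulates usage across every API call it is told about."""

    def __init__(self) -> None:
        self.total_tokens = 0

    def collect(self, model: str, usage: Usage) -> None:
        self.total_tokens += usage.input_tokens + usage.output_tokens


@dataclass
class Exchange:
    """The collector is injected, so Exchange stays decoupled from how
    usage is aggregated (same spirit as injecting a logger)."""

    collector: TokenUsageCollector

    def complete(self, model: str) -> None:
        # ... the provider would be called here; it reports usage in its response ...
        usage = Usage(input_tokens=100, output_tokens=20)  # stand-in value
        self.collector.collect(model, usage)


collector = TokenUsageCollector()
Exchange(collector).complete("gpt-4o-mini")
Exchange(collector).complete("gpt-4o-mini")
# collector.total_tokens is now 240
```

The point of the sketch is that multiple Exchange instances share one collector, so no code needs to enumerate exchanges to sum their counts.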

@codefromthecrypt
Contributor

If this is about cost, and we are talking about billing platforms vs. local models like ollama, wouldn't we need to split input_tokens from output_tokens (total_tokens mixes them)?

My thinking is if we turn this into a feature request first (cost accounting? or preventing overflow), we can figure out how to solve it, and also decouple from however these counts are processed.

For example, I work on opentelemetry and it collects input and output tokens, but this is not processed inline in the code, rather sent as trace spans or metrics. In this way you don't need to change the application logic depending on what you are looking at. OTOH, this approach (instrumentation) doesn't allow you to control future requests based on that data.

TL;DR: can we turn this into a feature request first? Then it would be easier to decide how to proceed in code.

* main:
  feat: Rework error handling (#48)
  chore(release): release version 0.9.0 (#45)
  chore: add just command for releases and update pyproject for changelog (#43)
  feat: convert ollama provider to an openai configuration (#34)
  fix: Bedrock Provider request (#29)
  test: Update truncate and summarize tests to check for sytem prompt t… (#42)
  chore: update test_tools to read a file instead of get a password (#38)
  fix: Use placeholder message to check tokens (#41)
  feat: rewind to user message (#30)
  chore: Update LICENSE (#40)
  fix: shouldn't hardcode truncate to gpt4o mini (#35)
@lifeizhou-ap
Collaborator Author

lifeizhou-ap commented Sep 18, 2024

@baxen and I had a discussion about the implementation of this feature

Summary:

  • the exchange constructor should not have a dependency on _TokenUsageCollector
  • We also looked into using the Moderator to collect the token usage. The idea was to use the usage on the checkpoint via the moderator, but that usage does not represent the total usage; more details are in my comments above.

Conclusion:

  • we decided to make _TokenUsageCollector a global singleton to collect the token usage.
  • save the usage in the log file
  • For now, we use _TokenUsageCollector to collect the usage as a simple solution. In the long term, we could parse the OpenTelemetry instrumentation data if we would like to track more data from the HTTP call response payload.

Hi @baxen Please correct me if anything above is incorrect. Thanks!

@lifeizhou-ap
Collaborator Author

lifeizhou-ap commented Sep 19, 2024

Hey @baxen

I have a question about logging the usage. We could log it from either the exchange or the goose project:

  1. If we trigger logging from exchange

    • we will log the usage for every API call, at a fine-grained level.
    • we have to specify the location of the log file in exchange, or pass the log directory from goose to exchange.
  2. If we trigger logging from goose

    • We can log the total usage when the session is saved. This makes it easier to calculate the total cost, and I guess the total cost might be more useful to the user than the fine-grained data.
    • We can save the log file in the goose config directory as a single point.

I am leaning towards triggering logging from goose. (Draft PR in goose) WDYT?
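To illustrate option 2, a hedged sketch of goose-side logging at session save; the file name, location, and function are assumptions for illustration, not goose's actual API:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_session_usage(usage_by_model: dict, config_dir: Path) -> Path:
    """Append one JSON line of total usage to a log in the goose config dir
    (hypothetical file name) when the session is saved."""
    log_file = config_dir / "token_usage.log"
    entry = {
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "usage": usage_by_model,
    }
    with log_file.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return log_file
```

Keeping the write in goose means the config directory acts as the single point for usage data, as described above.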

Collaborator

@baxen baxen left a comment


Looks good! I do think we should switch away from the queue before merging for simplicity

    self.usage_data_queue = queue.Queue()

    def collect(self, model: str, usage: Usage) -> None:
        self.usage_data_queue.put((model, usage.input_tokens, usage.output_tokens))
Collaborator


I don't think we need a queue here (I don't see this making use of put/task_done etc. in the sense of processing a queue across threads). If we're appending to a list (or adding to a running sum in a dictionary), the GIL will make sure all updates are processed, and I don't believe we have any concerns around ordering.



@dataclass
class TokenUsage:
Collaborator


nit: I'd avoid a new object here; instead, store the existing Usage objects in a dict with the model as the key?

Collaborator Author


yes, can do

@baxen
Collaborator

baxen commented Sep 19, 2024

Hey @baxen

I have a question about logging the usage. We could log it from either the exchange or the goose project:

  1. If we trigger logging from exchange

    • we will log the usage for every API call, at a fine-grained level.
    • we have to specify the location of the log file in exchange, or pass the log directory from goose to exchange.
  2. If we trigger logging from goose

    • We can log the total usage when the session is saved. This makes it easier to calculate the total cost, and I guess the total cost might be more useful to the user than the fine-grained data.
    • We can save the log file in the goose config directory as a single point.

I am leaning towards triggering logging from goose. (Draft PR in goose) WDYT?

Yes, agree! We should definitely handle setting logging verbosity etc. downstream in the application. If it's convenient, I do think it's fully reasonable to log at e.g. debug level here in exchange, but I don't see any need to add that at this point.
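That verbosity point can be sketched with stdlib logging: exchange would emit usage at debug level, and the downstream application decides what surfaces (illustrative only; this logging was not added to exchange in this PR):

```python
import logging

logger = logging.getLogger("exchange")


def record_usage(model: str, input_tokens: int, output_tokens: int) -> None:
    # The library logs fine-grained usage at debug level; whether it is
    # actually emitted is decided by the application's logging configuration.
    logger.debug("usage model=%s input=%d output=%d", model, input_tokens, output_tokens)


# Downstream, the application (goose) chooses its own verbosity:
logging.basicConfig(level=logging.INFO)  # debug-level usage lines stay hidden
```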

@lifeizhou-ap lifeizhou-ap marked this pull request as ready for review September 19, 2024 22:43
@lifeizhou-ap lifeizhou-ap merged commit 8139c74 into main Sep 19, 2024
@lifeizhou-ap lifeizhou-ap deleted the lifei/collect_total_token_usage branch September 19, 2024 22:44
codefromthecrypt pushed a commit to codefromthecrypt/exchange that referenced this pull request Oct 13, 2024
* adding in ability to provide per repo hints

* tidy up test