Adding extra values to the decode_result enum#1140
Adding extra values to the decode_result enum#1140martindevans merged 1 commit intoSciSharp:masterfrom
Conversation
|
Thanks for this. It looks like the doc comments in the header with the errors codes wasn't updated, so I missed these changes when upgrading. Do you have any idea how to handle these new return values? If possible I'd like to add something to the doc comments indicating what should be done when you receive this result. I just had a quick dig through llama.cpp and I can't see anything explaining it, so if you don't know either I'm happy to merge as-is. If you're interesting in contributing upstream, it would be great to submit a PR upstream which changes one of the two |
|
So I'm treating -3 with the ubatch like a NoKvSlot error in my code for batched execution, which is based heavily on some suggestions you've made to me on handling out of memory which was working great. I'm more or less just going wild adding conversations until I hit a KvSlot error, then rolling that last one back and requeuing it for later, treating that as a theoretical maximum rather than hardcoding a batch size. The issue was with this upgrade I was only checking for three paths - Error, NoKvSlot and Ok. The -3 left me in an endless loop of calling Infer and failing with the finding a slot for a new batch. Treating this the same, rollback and continue, got me moving forward. I obviously probably want to check my settings around slots better, but looks like that oversight at least found this. I'm just playing around right now, but here's my code with the current "solution" until I get the explicit enums to check |
There was a problem hiding this comment.
Pull Request Overview
This PR expands the DecodeResult enum to better capture different failure modes encountered during batch operations and decoding. It introduces new values for compute abortion, memory allocation failures, and general decode failures.
- Introduces ComputeAborted for canceled computations.
- Adds AllocationFailed to denote memory-related issues.
- Adds DecodeFailed to cover general decode errors.
Comments suppressed due to low confidence (2)
LLama/Native/DecodeResult.cs:26
- [nitpick] The new enum value 'ComputeAborted' is assigned a positive value while other error indicators use negatives; consider using a negative value or renaming it (e.g., 'OperationAborted') to clearly denote an error state.
ComputeAborted = 2,
LLama/Native/DecodeResult.cs:36
- [nitpick] The name 'DecodeFailed' is quite general; if possible, consider clarifying the context of the failure to help differentiate it from other potential decode issues.
DecodeFailed = -3,
|
Oops, I didn't mean to request that review from Copilot. That's a pretty cool new feature though :O |
I started getting -3 values when doing batch operations upon calling
Inferon the batched executor. I was checking for NoKvSlot, but it turns out-3aligns to "failed to prepare ubatch" when callingfind_slot. I believe these values first appeared with this commit two weeks ago, but that's with, admittedly, limited research.I'd like a better enum value, but unfortunately, they also are using -3 for
GGML_STATUS_FAILEDresults when callinggraph_compute, so the value has a double meaning. So, I'm definitely open to other names for the value than what I have here, but at a loss at what they could be.