Improve usability of --model-url & related flags#6930
Improve usability of --model-url & related flags#6930ochafik merged 13 commits intoggml-org:masterfrom
Conversation
…file (or else legacy models/7B/ggml-model-f16.gguf)
|
Great, please mind it will force people to download again files from the remote URL, kind of a breaking change. |
@phymbert ah I forgot, indeed! I initially planned on being backwards compatible (I felt it had low long-term usefulness but happy to add this code back, it's just a few lines) but I thought it's easier to provide a code snippet for people to create the JSON file out of the etag & lastModified. Added this as TODO before I undraft this PR. (also technically even without a migration snippet people can just use |
| n_items - strlen(last_modified_prefix) - 2); // Remove CRLF | ||
| std::string header(buffer, n_items); | ||
| std::smatch match; | ||
| if (std::regex_match(header, match, std::regex("([^:]+): (.*)\r\n", std::regex_constants::multiline))) { |
There was a problem hiding this comment.
the regex will be compiled at each header ? do we really need a regex to parse http headers ?
There was a problem hiding this comment.
do we really need a regex to parse http headers ?
Agree std::regex may seem overkill but it's simpler & safer than C string manipulations.
In the previous code for instance, just realized there's at least one buffer overflow bug (cc/ @ggerganov FYI): this strncpy will write beyond the stack-allocated etag's LLAMA_CURL_MAX_HEADER_LENGTH (=256) bytes and into stack-allocated etag_path if the ETag header value length is > 256 bytes, possibly giving HuggingFace (or anyone else you download from) write access to the system (a fix would be to turn the last arg of strncpy to MIN(sizeof(etag) - 1, n_items - strlen(etag_prefix) - 2), but it would make the code a bit harder to read & maintain).
the regex will be compiled at each header ?
Good point! I had opted to be slightly wasteful in CPU cycles here as the lifecyle management of regexes is trickier in the C callback context (e.g. can't allocate outside & pass through lambda capture), and the easiest alternatives (static alloc inside the callback or globally) were a bit wasteful in memory as these regex aren't useful afterwards. Let's go for potential shorter startup time? (now using local static allocs).
There was a problem hiding this comment.
You could always pass a pointer to the compiled regex in the userdata, however this is not performance sensitive code at all, and it's preferable to keep the code simple and easier to maintain.
| Given a server listening on localhost:8080 | ||
| And a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf | ||
| And a model file ggml-model-f16.gguf | ||
| And a model file bert-bge-small.gguf |
There was a problem hiding this comment.
Could you please explain why this change is now necessary?
There was a problem hiding this comment.
Sorry I missed this! I think there was another test case that was implicitly downloading another URL to ggml-model-f16.gguf, causing a collision
* args: default --model to models/ + filename from --model-url or --hf-file (or else legacy models/7B/ggml-model-f16.gguf) * args: main & server now call gpt_params_handle_model_default * args: define DEFAULT_MODEL_PATH + update cli docs * curl: check url of previous download (.json metadata w/ url, etag & lastModified) * args: fix update to quantize-stats.cpp * curl: support legacy .etag / .lastModified companion files * curl: rm legacy .etag file support * curl: reuse regex across headers callback calls * curl: unique_ptr to manage lifecycle of curl & outfile * curl: nit: no need for multiline regex flag * curl: update failed test (model file collision) + gitignore *.gguf.json
* args: default --model to models/ + filename from --model-url or --hf-file (or else legacy models/7B/ggml-model-f16.gguf) * args: main & server now call gpt_params_handle_model_default * args: define DEFAULT_MODEL_PATH + update cli docs * curl: check url of previous download (.json metadata w/ url, etag & lastModified) * args: fix update to quantize-stats.cpp * curl: support legacy .etag / .lastModified companion files * curl: rm legacy .etag file support * curl: reuse regex across headers callback calls * curl: unique_ptr to manage lifecycle of curl & outfile * curl: nit: no need for multiline regex flag * curl: update failed test (model file collision) + gitignore *.gguf.json




Fixes #6887
--modelis now inferred asmodels/$filenamewith the filename from--model-url/-muor--hf-file/-hffif set (it still defaults tomodels/7B/gguf-model-f16.ggufotherwise). Downloading different URLs will no longer overwrite previous downloads.URL model download now write a
.jsoncompanion metadata file (instead of the previous separate.etag&.lastModifiedfiles). This also contains the URL itself, which is useful to remember the exact origin of models & prevents accidental overwrites of files.Note: This is a breaking change wrt/ already downloaded models as
.etagand.lastModifiedfiles are now obsolete. If you're used to typing the following:Then you can avoid a re-download by migrating your
.etag&.lastModifiedfiles to a.jsonfile using a simple Python snippetShow `migrate_etag.py` & its usage
Smaller changes:
--hf-fileto--modelonserver(as was done onmain)TODO: