What happened?
When `HF_HUB_OFFLINE=1` is set, `download_model()` does not use the local HuggingFace cache. Instead, the `model_info()` API call fails immediately with an `EnvironmentError` (offline mode), and fastembed falls back to downloading ~83MB from `storage.googleapis.com`. In air-gapped environments where Google Cloud Storage is also unreachable, fastembed cannot load models that are already present in the local cache.
Related: #565
What is the expected behaviour?
With `HF_HUB_OFFLINE=1`, fastembed should resolve models from the local HF cache (`models--org--name/snapshots/...`) without any network calls.
Actual behavior

1. `download_model()` calls `download_files_from_huggingface()` with `local_files_only=False`.
2. Inside, `model_info(hf_source_repo)` is called, which is a network API call.
3. With `HF_HUB_OFFLINE=1`, `huggingface_hub` raises `EnvironmentError: offline mode is enabled`.
4. fastembed catches this and logs "Could not download model from HuggingFace... Falling back to other sources."
5. It falls back to `retrieve_model_gcs()` and downloads ~83MB from `storage.googleapis.com`.
6. In air-gapped environments, GCS is also unreachable, so loading fails completely.
Note: on current `main` (v0.7.x), there is a `local_files_only=True` first pass before the retry loop. However, if that pass fails for any reason (e.g. missing metadata file), the retry loop still hits the network path described above.
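To make the argument concrete, here is a sketch of an offline-safe resolution order. This is a hypothetical helper, not fastembed's `download_model()` and not the PR #614 diff; each callable is assumed to return a model path on success and raise on failure:

```python
import os

def resolve_model(repo_id, try_local, try_hf_remote, try_gcs):
    """Offline-safe resolution order (hypothetical sketch).
    try_local / try_hf_remote / try_gcs return a path or raise."""
    offline = os.environ.get("HF_HUB_OFFLINE", "") not in ("", "0")
    try:
        # Always try the local HF cache first; this needs no network.
        return try_local(repo_id)
    except Exception:
        if offline:
            # With HF_HUB_OFFLINE=1, both the HF API call and the GCS
            # fallback are guaranteed to fail, so fail fast and loudly
            # instead of attempting an ~83MB download.
            raise RuntimeError(
                f"{repo_id} is not in the local cache and HF_HUB_OFFLINE is set"
            )
    try:
        return try_hf_remote(repo_id)  # normal HF Hub download
    except Exception:
        return try_gcs(repo_id)  # last resort: GCS mirror
```

The key difference from the current behavior is that the remote fallbacks are skipped entirely when offline mode is set, so a cache miss produces a clear error rather than a doomed download attempt.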
A minimal reproducible example
```python
import os

from fastembed import TextEmbedding

# Step 1: Download the model (populates the HF cache)
TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")

# Step 2: Enable offline mode
os.environ["HF_HUB_OFFLINE"] = "1"

# Step 3: Loading the same model again triggers a GCS download
# instead of using the local cache
TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")
```
What Python version are you on?
Python 3.12 (pip)
FastEmbed version
- 0.6.0 (pinned in container image, confirmed affected)
- 0.7.4 (current main, confirmed affected)
What os are you seeing the problem on?
Linux (Red Hat UBI 9, running in OpenShift containers)
Relevant stack traces and/or logs
```
2026-03-16 04:47:03.565 | ERROR | fastembed.common.model_management:download_model:429 - Could not download model from HuggingFace: Cannot reach https://artifactory.example.com/api/models/qdrant/all-MiniLM-L6-v2-onnx: offline mode is enabled. To disable it, please unset the HF_HUB_OFFLINE environment variable. Falling back to other sources.
  0%|          | 0.00/83.2M [00:00<?, ?iB/s]  5%|▌         | 4.33M/83.2M [00:00<00:01, 43.2MiB/s] ...
100%|██████████| 83.2M/83.2M [00:00<00:00, 106MiB/s]
```
Fix
PR #614