What happened?
When `HF_HUB_OFFLINE=1` is set, `download_model()` does not use the local HuggingFace cache. Instead, the `model_info()` API call fails immediately with an `EnvironmentError` (offline mode), and fastembed falls back to downloading ~83MB from `storage.googleapis.com`. In air-gapped environments where Google Cloud Storage is also unreachable, fastembed cannot load models that are already present in the local cache.
Related: #565
What is the expected behaviour?
With `HF_HUB_OFFLINE=1`, fastembed should resolve models from the local HF cache (`models--org--name/snapshots/...`) without any network calls.
Actual behavior

1. `download_model()` calls `download_files_from_huggingface()` with `local_files_only=False`.
2. Inside, `model_info(hf_source_repo)` is called, which is a network API call.
3. With `HF_HUB_OFFLINE=1`, `huggingface_hub` raises `EnvironmentError: offline mode is enabled`.
4. fastembed catches this and logs "Could not download model from HuggingFace... Falling back to other sources."
5. It falls back to `retrieve_model_gcs()` and downloads ~83MB from `storage.googleapis.com`.
6. In air-gapped environments, GCS is also unreachable, so loading fails completely.
Note: on current `main` (v0.7.x), there is a `local_files_only=True` first pass before the retry loop. However, if that pass fails for any reason (e.g. missing metadata file), the retry loop still hits the network path described above.
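To make the argument concrete, here is a sketch of an offline-safe resolution order. This is a hypothetical helper, not fastembed's `download_model()` and not the PR #614 diff; each callable is assumed to return a model path on success and raise on failure:

```python
import os

def resolve_model(repo_id, try_local, try_hf_remote, try_gcs):
    """Offline-safe resolution order (hypothetical sketch).
    try_local / try_hf_remote / try_gcs return a path or raise."""
    offline = os.environ.get("HF_HUB_OFFLINE", "") not in ("", "0")
    try:
        # Always try the local HF cache first; this needs no network.
        return try_local(repo_id)
    except Exception:
        if offline:
            # With HF_HUB_OFFLINE=1, both the HF API call and the GCS
            # fallback are guaranteed to fail, so fail fast and loudly
            # instead of attempting an ~83MB download.
            raise RuntimeError(
                f"{repo_id} is not in the local cache and HF_HUB_OFFLINE is set"
            )
    try:
        return try_hf_remote(repo_id)  # normal HF Hub download
    except Exception:
        return try_gcs(repo_id)  # last resort: GCS mirror
```

The key difference from the current behavior is that the remote fallbacks are skipped entirely when offline mode is set, so a cache miss produces a clear error rather than a doomed download attempt.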
A minimal reproducible example
```python
import os

from fastembed import TextEmbedding

# Step 1: Download the model (populates the HF cache)
TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")

# Step 2: Enable offline mode
os.environ["HF_HUB_OFFLINE"] = "1"

# Step 3: Loading the same model again triggers a GCS download
# instead of using the local cache
TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")
```
What Python version are you on?
Python 3.12 (pip)
FastEmbed version
- 0.6.0 (pinned in container image, confirmed affected)
- 0.7.4 (current main, confirmed affected)
What os are you seeing the problem on?
Linux (Red Hat UBI 9, running in OpenShift containers)
Relevant stack traces and/or logs
```
2026-03-16 04:47:03.565 | ERROR | fastembed.common.model_management:download_model:429 - Could not download model from HuggingFace: Cannot reach https://artifactory.example.com/api/models/qdrant/all-MiniLM-L6-v2-onnx: offline mode is enabled. To disable it, please unset the HF_HUB_OFFLINE environment variable. Falling back to other sources.
  0%|          | 0.00/83.2M [00:00<?, ?iB/s]  5%|▌         | 4.33M/83.2M [00:00<00:01, 43.2MiB/s] ...
100%|██████████| 83.2M/83.2M [00:00<00:00, 106MiB/s]
```
Fix
PR #614