Title: qdrant-client: set_model attempts network connection despite HF_HUB_OFFLINE=1 and local cache
Current Behavior
When the HF_HUB_OFFLINE=1 environment variable is set, QdrantClient.set_model() still attempts to download the embedding model from Hugging Face. This fails in an offline environment, even when the model is already present in the local cache, preventing the client from initializing.
The logs paradoxically show fastembed reporting "offline mode is enabled" as the reason a download failed. This indicates that the flag is recognized, since the download request is rejected with that message, but the download attempt itself is not suppressed: fastembed still tries to fetch the model and then retries.
Steps to Reproduce
- Set up an environment with no internet access (e.g., a firewalled server or a Docker container).
- Set the environment variable HF_HUB_OFFLINE=1 (e.g., export HF_HUB_OFFLINE=1).
- Pre-download the embedding model into the cache directory (/app/.cache/fastembed).
- Confirm that the model files are present in the cache by checking the directory structure and size:
```
$ du -h -d 3 /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/
241M /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/blobs
4.0K /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/refs
20K /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/snapshots/faf4aa4225822f3bc6376869cb1164e8e3feedd0
20K /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/snapshots
241M /app/.cache/fastembed/models--qdrant--paraphrase-multilingual-MiniLM-L12-v2-onnx-Q/
```
- Run the following code:
```python
import os
from qdrant_client import QdrantClient

QDRANT_HOST = "localhost"
QDRANT_PORT = 6333
EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
CACHE_DIR = "/app/.cache/fastembed"  # Example cache path

# HF_HUB_OFFLINE=1 is exported in the shell before Python starts (see step 2).
os.environ["FASTEMBED_CACHE_PATH"] = CACHE_DIR

client = QdrantClient(host=QDRANT_HOST, port=QDRANT_PORT)
# This call fails with repeated download attempts, even though the model is cached.
client.set_model(EMBEDDING_MODEL, cache_dir=CACHE_DIR)
print("Client initialized successfully.")  # Never reached
```
- Observe the error logs, which show repeated attempts to reach huggingface.co.

Relevant Log Output:
```
2025-10-28 13:31:18.048 | ERROR | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: Cannot reach https://huggingface.co/...: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable. Falling back to other sources.
2025-10-28 13:31:18.048 | ERROR | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 3.0 seconds, 2 retries left.
```
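The du check in step 4 only shows sizes. In the Hugging Face cache layout visible in that output (blobs/, refs/, snapshots/<commit>/), snapshot entries are normally symlinks into blobs/, which is why snapshots/ reports 20K while blobs/ holds 241M. The following is a hypothetical helper, not part of qdrant-client or fastembed, that goes one step further: it resolves refs/main to a snapshot directory and lists any snapshot entries whose symlink target is missing. The repo id in the usage section is inferred from the cache directory name.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_snapshot(cache_dir: str, repo_id: str, revision: str = "main") -> Optional[Path]:
    """Return the cached snapshot directory for repo_id, or None if absent.

    Assumes the Hugging Face cache layout:
      models--<org>--<name>/refs/<revision>      text file holding a commit hash
      models--<org>--<name>/snapshots/<commit>/  files, usually symlinks into blobs/
    """
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    ref_file = repo_dir / "refs" / revision
    if not ref_file.is_file():
        return None
    snapshot = repo_dir / "snapshots" / ref_file.read_text().strip()
    return snapshot if snapshot.is_dir() else None


def dangling_links(snapshot: Path) -> List[Path]:
    """Return snapshot entries that are symlinks whose target does not exist."""
    broken = []
    # os.walk lists a dangling symlink among the non-directory names.
    for root, _dirs, files in os.walk(snapshot):
        for name in files:
            path = Path(root) / name
            # exists() follows the link, so it is False when the target is missing.
            if path.is_symlink() and not path.exists():
                broken.append(path)
    return sorted(broken)


if __name__ == "__main__":
    snap = find_snapshot(
        "/app/.cache/fastembed",
        "qdrant/paraphrase-multilingual-MiniLM-L12-v2-onnx-Q",  # inferred from the cache dir name
    )
    if snap is None:
        print("model not found in cache")
    else:
        print("snapshot:", snap)
        print("dangling links:", dangling_links(snap) or "none")
```

Run inside the container image: a non-None snapshot with no dangling links means the cached files are actually in place, which rules out a damaged cache as the cause of the download attempts.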
Expected Behavior
When HF_HUB_OFFLINE=1 is set, qdrant-client should first check the specified cache_dir for the model. If the model files exist locally, as confirmed above, it should load them directly without initiating any network connection, and initialization should succeed in an air-gapped environment.
Possible Solution
The issue appears to originate in the fastembed dependency; the errors above are logged from fastembed.common.model_management:download_model. The model management logic should check for a local model in the cache before running any download logic. When HF_HUB_OFFLINE=1 is set, the network download path should be bypassed entirely, and the client should rely solely on the cached files.
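A minimal sketch of the proposed ordering, written as a standalone function rather than a patch to fastembed, whose internals are not shown here. The names resolve_model_dir, download_fn and ModelNotCachedError are hypothetical. It looks in the cache first and, when HF_HUB_OFFLINE is set and nothing is cached, fails immediately with a clear error instead of attempting and retrying a download. Treating "1", "ON", "YES" and "TRUE" (case-insensitive) as true is an assumption modeled on how huggingface_hub parses boolean environment variables.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

# Assumption: mirrors how huggingface_hub interprets boolean environment variables.
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


class ModelNotCachedError(RuntimeError):
    """Hypothetical error: offline mode is on and the model is not cached."""


def is_offline(env: Mapping[str, str] = os.environ) -> bool:
    return env.get("HF_HUB_OFFLINE", "").strip().upper() in _TRUE_VALUES


def cached_snapshot(cache_dir: str, repo_id: str) -> Optional[Path]:
    """Return the cached snapshot for repo_id (Hugging Face cache layout), if any."""
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    ref_file = repo_dir / "refs" / "main"
    if not ref_file.is_file():
        return None
    snapshot = repo_dir / "snapshots" / ref_file.read_text().strip()
    return snapshot if snapshot.is_dir() else None


def resolve_model_dir(
    repo_id: str,
    cache_dir: str,
    download_fn: Callable[[str, str], Path],
    env: Mapping[str, str] = os.environ,
) -> Path:
    # 1. Cache first: a local snapshot is returned without touching the network.
    local = cached_snapshot(cache_dir, repo_id)
    if local is not None:
        return local
    # 2. Offline mode: fail fast with an actionable message instead of retrying.
    if is_offline(env):
        raise ModelNotCachedError(
            f"{repo_id} is not cached in {cache_dir} and HF_HUB_OFFLINE is set; "
            "pre-download the model or unset HF_HUB_OFFLINE."
        )
    # 3. Online: only now hand off to the (hypothetical) downloader.
    return download_fn(repo_id, cache_dir)
```

With the cache from step 4, and assuming refs/main points at the faf4aa4225822f3bc6376869cb1164e8e3feedd0 snapshot, step 1 alone would succeed and download_fn would never be called.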
Context (Environment)
We are deploying an application using qdrant-client in a secured, air-gapped production environment. All dependencies and models are pre-packaged into a container image. This bug blocks our deployment: the application fails to start because it cannot operate in a true offline mode.
- Python Version: 3.12.12
- Operating System: Linux (Docker)
- Key Environment Variables:
  - HF_HUB_OFFLINE=1
  - FASTEMBED_CACHE_PATH=/app/.cache/fastembed
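The environment list does not include package versions, which help match this behavior to a specific fastembed release. A small stdlib-only helper (hypothetical, not part of either library) that collects them together with the relevant variables; the distribution names listed are assumptions about how the packages are installed:

```python
import os
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable

PACKAGES = ("qdrant-client", "fastembed", "huggingface-hub")
ENV_VARS = ("HF_HUB_OFFLINE", "FASTEMBED_CACHE_PATH")


def collect_env(
    packages: Iterable[str] = PACKAGES, env_vars: Iterable[str] = ENV_VARS
) -> Dict[str, str]:
    """Gather Python/OS info, installed package versions and selected env vars."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    for var in env_vars:
        info[var] = os.environ.get(var, "<unset>")
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```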