Introduce whitelisted folders for external data validation by yuslepukhin · Pull Request #27352 · microsoft/onnxruntime

yuslepukhin · 2026-02-15T00:45:35Z

Description

Many customers reported that they prefer to store external data in locations other than the model folder.
A previous security change (PR #26776) disabled that possibility.
This PR introduces a new API that sets a whitelisted-folders option. Data stored under those folders or their subfolders would still be allowed.

Motivation and Context

qdrant/fastembed#603

Copilot

Pull request overview

Adds support for loading TensorProto external data from user-configured “whitelisted” directories (in addition to the model directory), addressing the prior security hardening that restricted external data to the model folder.

Changes:

Introduces a new C/C++ SessionOptions API to configure semicolon-separated whitelisted external-data directories.
Adds parsing/validation logic for whitelist paths and extends external data path validation to allow matches under whitelisted folders.
Updates graph/session code paths and expands unit tests for whitelist behavior.
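The parsing and validation steps listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not the C++ code in this PR: the function names `parse_whitelisted_folders` and `validate_external_data_path` are hypothetical, and the handling of empty entries and relative entries (resolved against the current working directory) is an assumption.

```python
from pathlib import Path


def parse_whitelisted_folders(spec):
    """Split a semicolon-separated string into resolved directory paths."""
    folders = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if entry:  # assumption: empty segments such as a trailing ';' are skipped
            folders.append(Path(entry).resolve())
    return folders


def _is_within(target, root):
    """True if target is root itself or lies somewhere beneath it."""
    return target == root or root in target.parents


def validate_external_data_path(model_dir, location, whitelisted=()):
    """Return the resolved data path, or raise ValueError if it is not allowed.

    `whitelisted` is expected to hold already-resolved paths, e.g. the
    result of parse_whitelisted_folders().
    """
    model_root = Path(model_dir).resolve()
    target = (model_root / location).resolve()  # collapses '..' and symlinks
    allowed_roots = [model_root, *whitelisted]
    if not any(_is_within(target, root) for root in allowed_roots):
        raise ValueError(
            f"external data path {location!r} is outside all allowed folders"
        )
    return target
```

In this sketch a location such as `../weights/w.bin` is rejected by default and accepted only when the `weights` folder (or one of its ancestors) appears in the whitelist string.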

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
onnxruntime/test/ir/graph_test.cc	Updates call site for new Graph::ConvertInitializersIntoOrtValues signature.
onnxruntime/test/framework/tensorutils_test.cc	Adds extensive tests for whitelist-aware path validation and whitelist parsing.
onnxruntime/core/session/provider_bridge_ort.cc	Removes shared-provider host bridge for external data path validation.
onnxruntime/core/session/ort_apis.h	Adds C API entrypoint declaration for SessionOptionsSetWhiteListedDataFolders.
onnxruntime/core/session/onnxruntime_c_api.cc	Registers the new C API function pointer and updates version asserts (currently problematic).
onnxruntime/core/session/inference_session.cc	Parses whitelist from SessionOptions and passes it into initializer conversion.
onnxruntime/core/session/abi_session_options.cc	Implements SessionOptionsSetWhiteListedDataFolders (currently rejects nullptr).
onnxruntime/core/providers/shared_library/provider_interfaces.h	Removes ProviderHost virtual for validating external data paths.
onnxruntime/core/providers/shared_library/provider_api.h	Removes shared-provider wrapper for external data path validation.
onnxruntime/core/graph/graph.cc	Extends initializer conversion to validate external paths against whitelist.
onnxruntime/core/framework/tensorprotoutils.h	Adds ParseWhiteListedPaths and extends ValidateExternalDataPath signature.
onnxruntime/core/framework/tensorprotoutils.cc	Implements whitelist parsing and whitelist-aware external data path validation.
onnxruntime/core/framework/session_options.h	Adds SessionOptions::whitelisted_data_folders storage.
include/onnxruntime/core/session/onnxruntime_cxx_inline.h	Adds C++ wrapper SessionOptions::SetWhiteListedDataFolders implementation.
include/onnxruntime/core/session/onnxruntime_cxx_api.h	Adds C++ wrapper SessionOptions::SetWhiteListedDataFolders declaration.
include/onnxruntime/core/session/onnxruntime_c_api.h	Adds public C API declaration/docs for SessionOptionsSetWhiteListedDataFolders (currently mismatched).
include/onnxruntime/core/graph/graph.h	Changes public Graph API signature for ConvertInitializersIntoOrtValues.

include/onnxruntime/core/session/onnxruntime_c_api.h

onnxruntime/core/session/abi_session_options.cc

include/onnxruntime/core/session/onnxruntime_c_api.h

include/onnxruntime/core/graph/graph.h

onnxruntime/core/session/provider_bridge_ort.cc

onnxruntime/test/framework/tensorutils_test.cc

onnxruntime/core/session/onnxruntime_c_api.cc

onnxruntime/core/providers/shared_library/provider_interfaces.h

include/onnxruntime/core/session/onnxruntime_c_api.h

xenova · 2026-02-17T16:26:57Z

I'd also like to note that I have been encountering more errors like this with the Hugging Face cache system.

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : External data path validation failed for initializer: vision_model.embeddings.patch_embedding.weight. Error: tensorprotoutils.cc:347 ValidateExternalDataPath External data path: "vision_encoder.onnx_data" escapes model directory: ".../.cache/huggingface/hub/models--onnx-community--granite-docling-258M-ONNX/snapshots/e8602580df77443fc3421cf3bae0601da601e5c6/onnx"

Since this is quite a popular way to use models, hopefully there is a way to fix this.
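A likely explanation, stated here as an assumption rather than a confirmed diagnosis, is that the Hugging Face cache stores the real files under a `blobs` folder and places symlinks to them in each `snapshots/<revision>` folder. If a validator resolves symlinks before comparing against the model directory, the data file appears to live outside it. The sketch below reproduces that effect with a temporary directory; `escapes_directory` and the file names are hypothetical and do not come from ONNX Runtime, and the demo needs a platform where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path


def escapes_directory(path, directory):
    """True if `path`, after resolving symlinks, lies outside `directory`."""
    resolved = Path(path).resolve()
    root = Path(directory).resolve()
    return resolved != root and root not in resolved.parents


def demo():
    """Build a cache-like layout and check where the symlinked file resolves."""
    with tempfile.TemporaryDirectory() as tmp:
        blobs = Path(tmp) / "blobs"
        model_dir = Path(tmp) / "snapshots" / "rev" / "onnx"
        blobs.mkdir()
        model_dir.mkdir(parents=True)

        # The real bytes live in blobs/; the model folder only holds a link.
        blob = blobs / "abc123"
        blob.write_bytes(b"weights")
        link = model_dir / "model.onnx_data"
        os.symlink(blob, link)

        outside_model_dir = escapes_directory(link, model_dir)
        outside_cache_root = escapes_directory(link, tmp)
        return outside_model_dir, outside_cache_root
```

Under these assumptions `demo()` reports that the link escapes the model folder but not the cache root, which is the situation a whitelisted parent folder would be meant to cover.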

Example reproduction (which used to work correctly before that update, from https://huggingface.co/onnx-community/granite-docling-258M-ONNX):

from transformers import AutoConfig, AutoProcessor
from transformers.image_utils import load_image
from huggingface_hub import hf_hub_download
import onnxruntime
import numpy as np


# 1. Load models
## Load config and processor
model_id = "onnx-community/granite-docling-258M-ONNX"
config = AutoConfig.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

## Download models from the Hugging Face Hub
vision_model_path = hf_hub_download(model_id, subfolder="onnx", filename="vision_encoder.onnx")         # graph
hf_hub_download(model_id, subfolder="onnx", filename="vision_encoder.onnx_data")                        # weights
embed_model_path = hf_hub_download(model_id, subfolder="onnx", filename="embed_tokens.onnx")            # graph
hf_hub_download(model_id, subfolder="onnx", filename="embed_tokens.onnx_data")                          # weights
decoder_model_path = hf_hub_download(model_id, subfolder="onnx", filename="decoder_model_merged.onnx")  # graph
hf_hub_download(model_id, subfolder="onnx", filename="decoder_model_merged.onnx_data")                  # weights

## Load sessions
vision_session = onnxruntime.InferenceSession(vision_model_path)
embed_session = onnxruntime.InferenceSession(embed_model_path)
decoder_session = onnxruntime.InferenceSession(decoder_model_path)

## Set config values
num_key_value_heads = config.text_config.num_key_value_heads
head_dim = config.text_config.head_dim
num_hidden_layers = config.text_config.num_hidden_layers
eos_token_id = config.text_config.eos_token_id
image_token_id = config.image_token_id


# 2. Prepare inputs
## Create input messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this page to docling."}
        ]
    },
]

## Load image and apply processor
image = load_image("https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/assets/new_arxiv.png")
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="np")

## Prepare decoder inputs
batch_size = inputs['input_ids'].shape[0]
past_key_values = {
    f'past_key_values.{layer}.{kv}': np.zeros([batch_size, num_key_value_heads, 0, head_dim], dtype=np.float32)
    for layer in range(num_hidden_layers)
    for kv in ('key', 'value')
}
image_features = None
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']


# 3. Generation loop
max_new_tokens = 4096
generated_tokens = np.array([[]], dtype=np.int64)
for i in range(max_new_tokens):
  inputs_embeds = embed_session.run(None, {'input_ids': input_ids})[0]

  if image_features is None:
    ## Only compute vision features if not already computed
    image_features = vision_session.run(None, dict(
        pixel_values=inputs['pixel_values'],
        pixel_attention_mask=inputs['pixel_attention_mask'].astype(np.bool_)
    ))[0]

    ## Merge text and vision embeddings
    inputs_embeds[inputs['input_ids'] == image_token_id] = image_features.reshape(-1, image_features.shape[-1])

  logits, *present_key_values = decoder_session.run(None, dict(
      inputs_embeds=inputs_embeds,
      attention_mask=attention_mask,
      **past_key_values,
  ))

  ## Update values for next generation loop
  input_ids = logits[:, -1].argmax(-1, keepdims=True)
  attention_mask = np.concatenate([attention_mask, np.ones((batch_size, 1), dtype=attention_mask.dtype)], axis=-1)
  for j, key in enumerate(past_key_values):
    past_key_values[key] = present_key_values[j]

  generated_tokens = np.concatenate([generated_tokens, input_ids], axis=-1)
  if (input_ids == eos_token_id).all():
    break

  ## (Optional) Streaming
  print(processor.decode(input_ids[0]), end='')
print()


# 4. Do something with the final output
print(processor.batch_decode(generated_tokens, skip_special_tokens=False)[0])

onnxruntime/core/providers/shared_library/provider_api.h

onnxruntime/core/session/onnxruntime_c_api.cc

onnxruntime/test/testdata/create_external_data_model.py

tianleiwu · 2026-02-18T02:15:26Z

Please do not merge. There is another simple fix that can solve the issue. See #27374.

yuslepukhin added 4 commits February 13, 2026 18:27

introduce parsing for whitelisted paths

a06caea

Add SessionOptionsSetWhiteListedDataFolders public API

0853aa8

Properly detect symlinks

4f70b56

Test validate path

0f9147c

yuslepukhin requested review from adrianlizarraga, Copilot and skottmckay February 15, 2026 00:45

yuslepukhin added the release:1.24.2 label Feb 15, 2026

Copilot started reviewing on behalf of yuslepukhin February 15, 2026 00:46

Copilot AI reviewed Feb 15, 2026

tianleiwu reviewed Feb 15, 2026

include/onnxruntime/core/session/onnxruntime_c_api.h (resolved)

istupakov mentioned this pull request Feb 15, 2026

onnxruntime 1.24 broke loading models from the huggingface_hub cache #27353

Closed

yuslepukhin marked this pull request as ready for review February 15, 2026 18:27

address copilot feedback

04c8087

tianleiwu requested review from edgchen1, fs-eire and hariharans29 February 17, 2026 16:12

xenova mentioned this pull request Feb 17, 2026

[Bug]: Loading models with additional files fails with onnxruntime 1.24.1 qdrant/fastembed#603

Open

tianleiwu added 2 commits February 17, 2026 17:28

fix training build

8f1df62

fix training build retry

5d19a23

hariharans29 reviewed Feb 17, 2026

onnxruntime/core/providers/shared_library/provider_api.h (resolved)

hariharans29 reviewed Feb 17, 2026

onnxruntime/core/session/onnxruntime_c_api.cc (resolved)

add tests

e8bdeba

tianleiwu requested a review from hariharans29 February 17, 2026 22:38

github-advanced-security bot found potential problems Feb 17, 2026

onnxruntime/test/testdata/create_external_data_model.py (fixed)

hariharans29 previously approved these changes Feb 17, 2026

update python api

5402cf0

tianleiwu dismissed hariharans29’s stale review via 5402cf0 February 17, 2026 23:28

tianleiwu requested a review from hariharans29 February 17, 2026 23:35

hariharans29 approved these changes Feb 17, 2026

tianleiwu marked this pull request as draft February 18, 2026 02:14

tianleiwu removed the release:1.24.2 label Feb 18, 2026

yuslepukhin wants to merge 9 commits into main from yuslepukhin/whitelist_data_folders

5 participants