huggingface · kashif · Apr 17, 2026
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
@@ -1407,6 +1407,8 @@
     - sections:
       - local: model_doc/autoformer
         title: Autoformer
+      - local: model_doc/ctsm
+        title: CTSM
       - local: model_doc/informer
         title: Informer
       - local: model_doc/patchtsmixer

diff --git a/docs/source/en/model_doc/ctsm.md b/docs/source/en/model_doc/ctsm.md
@@ -0,0 +1,122 @@
+<!--Copyright 2026 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+*This model was released on 2025-11-25 and added to Hugging Face Transformers on 2026-04-17.*
+
+<div style="float: right;">
+    <div class="flex flex-wrap space-x-1">
+        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
+        <img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
+        <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
+    </div>
+</div>
+
+# CTSM
+
+## Overview
+
+The Cisco Time Series Model (CTSM) was proposed in [Cisco Time Series Model Technical Report](https://huggingface.co/papers/2511.19841) by Liang Gou, Archit Khare, Praneet Pabolu, Prachi Patel, Joseph Ross, Hercy Shen, Yuhan (Ellen) Song, Jingze Sun, Kristal Curtis, Vedant Dharnidharka, Abhinav Mathur and Hao Yang.
+
+CTSM is a decoder-only univariate zero-shot forecasting foundation model. Its central idea is a **multi-resolution context**: instead of consuming a single-scale history, each forecast conditions on two aligned streams — a coarse low-frequency stream (e.g. 512 hourly points) and a fine high-frequency stream (e.g. 512 minutely points), with the resolution ratio fixed to 60. A learnable **special token** separates the two streams and learned **resolution embeddings** are added to the token stream to distinguish them. The coarse stream lets the model see week-over-week structure without giving up fine-grained recent detail; as the paper puts it, "more complex multiresolution architectures would require a context length of 30,720 (30 times as long as ours) to cover the same time range."
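The figures in the quotation follow from the default sizes. A quick check, assuming 512 points per stream and the fixed ratio of 60 (the variable names are illustrative and not part of the library):

```python
# Default sizes quoted above; the names are illustrative only.
coarse_points = 512      # coarse (e.g. hourly) context length
fine_points = 512        # fine (e.g. minutely) context length
resolution_ratio = 60    # one coarse point summarizes 60 fine points

# Fine-resolution points a single-resolution context would need to span the coarse window.
single_resolution_context = coarse_points * resolution_ratio

# Points the two-stream context actually holds (the special token is not counted).
multi_resolution_context = coarse_points + fine_points

print(single_resolution_context)                             # 30720
print(single_resolution_context / multi_resolution_context)  # 30.0
```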
+
+The abstract from the paper is the following:
+
+*We introduce the Cisco Time Series Model, a univariate zero-shot forecaster. This time series foundation model is the result of a general architectural innovation to a time series model enabling it to accept multiresolution input, applied to a popular decoder-only time series model (TimesFM). The resulting multiresolution decoder-only model is trained on over 300B unique data points, with more than half coming from the observability domain. Quantitative and qualitative evaluations demonstrate that the resulting model achieves superior performance on observability datasets while retaining very similar performance on a standard general-purpose forecasting benchmark (GIFT-Eval), and suggest that the multiresolution structure enables the model to make more accurate predictions on long context input.*
+
+### Architecture
+
+The backbone follows TimesFM 2.0: patching (patch length 32), a residual-block input tokenizer, decoder-only transformer layers with per-dimension learnable query scaling, and a residual-block horizon head. On top of this, CTSM adds:
+
+- A **special token** inserted between the coarse and fine patch streams, so the input is `[coarse₁, …, coarse₁₆, SPECIAL, fine₁, …, fine₁₆]`.
+- **Resolution embeddings** (3-way: coarse / special / fine) added to each token before the transformer stack.
+- **Stream-level normalization**: each stream is standardized independently over its non-padded context, and the fine-stream statistics are used to rescale the forecast.
+- A **frequency embedding** inherited from TimesFM, added to every token.
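The token layout and the stream-level normalization listed above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the model's code: padding is ignored, the 0/1/2 coding of the resolution ids is hypothetical, and `standardize` is a helper introduced here.

```python
import numpy as np

PATCH_LEN = 32     # patch length from the text above
CONTEXT_LEN = 512  # points per stream, giving 512 / 32 = 16 patches


def standardize(stream, eps=1e-6):
    """Standardize one stream with its own mean and standard deviation."""
    mu, sigma = stream.mean(), stream.std()
    return (stream - mu) / (sigma + eps), mu, sigma


rng = np.random.default_rng(0)
coarse = rng.normal(100.0, 5.0, CONTEXT_LEN)  # stand-in hourly values
fine = rng.normal(105.0, 2.0, CONTEXT_LEN)    # stand-in minutely values

# Each stream is normalized independently of the other.
coarse_norm, _, _ = standardize(coarse)
fine_norm, fine_mu, fine_sigma = standardize(fine)

# Split each stream into non-overlapping patches: (16, 32) per stream.
coarse_patches = coarse_norm.reshape(-1, PATCH_LEN)
fine_patches = fine_norm.reshape(-1, PATCH_LEN)

# Resolution ids for [coarse x16, SPECIAL, fine x16]; the 0/1/2 coding is an assumption.
resolution_ids = np.array([0] * len(coarse_patches) + [1] + [2] * len(fine_patches))

# A forecast in normalized units is mapped back with the fine-stream statistics.
normalized_forecast = np.zeros(128)
forecast = normalized_forecast * (fine_sigma + 1e-6) + fine_mu
```

An all-zero normalized forecast maps back to the fine-stream mean, which shows why the fine statistics, rather than the coarse ones, set the scale of the output.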
+
+The 250M **CTSM 1.0** release checkpoint differs from the 500M `1.0-preview` described in the paper in the following ways:
+
+- **Rotary position embeddings (RoPE)** applied to query/key inside attention.
+- **Bidirectional attention over the coarse block** — tokens in the coarse segment attend both ways within that segment, while the fine segment remains causal.
+- **15-quantile prediction** (levels 0.01–0.99) instead of 9.
+- **Short-context training** (1/3 of training samples drawn with `|fine| ∈ [10, 511]`) for better robustness when less history is available.
+- Trained from scratch (not continued pre-training from TimesFM 2.0) on ~2× more internal observability data.
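The mixed attention pattern described in the list above can be pictured as a boolean mask. The sketch below is an assumption-laden illustration: it treats the special token causally, which the text does not state, and it uses the default 16 patches per stream.

```python
import numpy as np

n_coarse, n_fine = 16, 16
n_tokens = n_coarse + 1 + n_fine  # coarse patches + special token + fine patches

# Start from a standard causal mask: position i may attend to positions j <= i.
allowed = np.tril(np.ones((n_tokens, n_tokens), dtype=bool))

# Open up the coarse block so its tokens attend in both directions within the block.
is_coarse = np.arange(n_tokens) < n_coarse
allowed |= is_coarse[:, None] & is_coarse[None, :]

print(allowed[0, 15])   # True: first coarse token sees the last coarse token
print(allowed[20, 21])  # False: fine tokens cannot look ahead
```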
+
+### Inference
+
+For `horizon_len > config.horizon_length`, [`CtsmModelForPrediction`] runs an autoregressive multi-resolution decode loop, using a [`DynamicCache`] by default (opt out with `use_cache=False`). Each step feeds only the newly-appended fine patches through the stack and attends to cached K/V for every earlier position. Stream-normalization statistics are frozen to their step-1 values so that cached K/V remains valid; the coarse block is pinned and the cache is rebuilt if the concatenated sequence would outgrow `max_position_embeddings`.
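The control flow of such a loop can be sketched in plain Python. This is a toy under loud assumptions: `predict_step` is a hypothetical stand-in that repeats the last observed value, and the coarse stream, the KV cache, and the frozen normalization statistics are all left out.

```python
import math

HORIZON_LENGTH = 128  # values produced per step (config.horizon_length default)
CONTEXT_LENGTH = 512  # fine context kept between steps


def predict_step(fine_context):
    """Hypothetical stand-in for one forward pass: repeat the last observed value."""
    return [fine_context[-1]] * HORIZON_LENGTH


def decode(fine_context, horizon_len):
    """Call predict_step repeatedly until horizon_len values are available."""
    num_steps = math.ceil(horizon_len / HORIZON_LENGTH)
    forecast = []
    for _ in range(num_steps):
        step = predict_step(fine_context)
        forecast.extend(step)
        # Append the new values and keep only the most recent window.
        fine_context = (fine_context + step)[-CONTEXT_LENGTH:]
    return forecast[:horizon_len]


forecast = decode([float(i) for i in range(512)], horizon_len=300)
print(len(forecast))  # 300
```

With the defaults, a request for 300 steps takes three passes (384 values), and the surplus is trimmed.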
+
+The checkpoint can be found at [`cisco-ai/cisco-time-series-model-1.0`](https://huggingface.co/cisco-ai/cisco-time-series-model-1.0). The original inference code is at [github.com/splunk/cisco-time-series-model](https://github.com/splunk/cisco-time-series-model).
+
+This model was contributed by [kashif](https://huggingface.co/kashif).
+
+## Usage
+
+Pass a list of fine-resolution time series (e.g. minute-level); the coarse stream is built automatically by mean-aggregating consecutive blocks of `config.agg_factor` points.
+
+```python
+import numpy as np
+import torch
+from transformers import CtsmModelForPrediction
+
+
+model = CtsmModelForPrediction.from_pretrained("cisco-ai/cisco-time-series-model-1.0", device_map="auto")
+
+# 512 hours (~21 days) of 1-minute data. The most recent 512 points (~8.5 hours) serve as the fine
+# context, and the model builds a 512-point hourly coarse context by aggregation.
+series = np.sin(np.linspace(0, 200, 512 * 60)).astype(np.float32)
+past_values = [torch.tensor(series, device=model.device)]
+
+with torch.no_grad():
+    outputs = model(past_values=past_values, horizon_len=128)
+
+point_forecast = outputs.mean_predictions       # (batch, horizon_len)
+quantile_forecast = outputs.full_predictions    # (batch, horizon_len, 1 + num_quantiles)
+```
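To make the automatic aggregation concrete, here is a rough NumPy equivalent. It is only a sketch: `build_streams` is a hypothetical helper, and aligning the blocks to the end of the series is an assumption about how the model groups points.

```python
import numpy as np

AGG_FACTOR = 60    # config.agg_factor default
CONTEXT_LEN = 512  # config.context_length default


def build_streams(series, agg_factor=AGG_FACTOR, context_len=CONTEXT_LEN):
    """Return (coarse, fine) contexts from one fine-resolution series."""
    series = np.asarray(series, dtype=np.float32)
    # Drop the oldest points so the remainder splits into whole blocks (alignment is an assumption).
    usable = len(series) - len(series) % agg_factor
    blocks = series[len(series) - usable:].reshape(-1, agg_factor)
    coarse = blocks.mean(axis=1)[-context_len:]  # one mean per block of agg_factor points
    fine = series[-context_len:]                 # most recent fine-resolution points
    return coarse, fine


series = np.sin(np.linspace(0, 200, 512 * 60)).astype(np.float32)
coarse, fine = build_streams(series)
print(coarse.shape, fine.shape)  # (512,) (512,)
```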
+
+If you already have a coarse stream (e.g. pre-computed 1-hour roll-ups that go further back than you have 1-minute data for), pass `(coarse, fine)` pairs directly:
+
+```python
+coarse = torch.tensor(hourly_series, dtype=torch.float32)    # up to 512 points
+fine = torch.tensor(minutely_series, dtype=torch.float32)    # up to 512 points
+outputs = model(past_values=[(coarse, fine)], horizon_len=128)
+```
+
+For `horizon_len > 128`, the model decodes autoregressively and extends the output accordingly.
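A prediction interval can be read from the quantile output by indexing the last axis. The sketch below uses a random array in place of `outputs.full_predictions` and assumes that channel 0 holds the mean and that the quantiles follow in config order; the channel order is an assumption to verify against the model output, and `select_quantile` is a hypothetical helper.

```python
import numpy as np

# Quantile levels copied from the CtsmConfig defaults.
QUANTILES = (0.01, 0.05, 0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9, 0.95, 0.99)


def select_quantile(full_predictions, level, quantiles=QUANTILES):
    """Pick one quantile channel, assuming channel 0 is the mean and quantiles follow in order."""
    return full_predictions[..., 1 + quantiles.index(level)]


# Random stand-in for outputs.full_predictions with shape (batch, horizon_len, 1 + num_quantiles).
full_predictions = np.random.default_rng(0).normal(size=(1, 128, 1 + len(QUANTILES)))

lower = select_quantile(full_predictions, 0.1)
median = select_quantile(full_predictions, 0.5)
upper = select_quantile(full_predictions, 0.9)
print(lower.shape, median.shape, upper.shape)  # (1, 128) (1, 128) (1, 128)
```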
+
+## CtsmConfig
+
+[[autodoc]] CtsmConfig
+
+## CtsmModel
+
+[[autodoc]] CtsmModel
+    - forward
+
+## CtsmModelForPrediction
+
+[[autodoc]] CtsmModelForPrediction
+    - forward
+
+## Citation
+
+```bibtex
+@misc{gou2025ciscotimeseriesmodel,
+      title={Cisco Time Series Model Technical Report},
+      author={Liang Gou and Archit Khare and Praneet Pabolu and Prachi Patel and Joseph Ross and Hercy Shen and Yuhan Song and Jingze Sun and Kristal Curtis and Vedant Dharnidharka and Abhinav Mathur and Hao Yang},
+      year={2025},
+      eprint={2511.19841},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2511.19841}
+}
+```
diff --git a/src/transformers/models/__init__.py b/src/transformers/models/__init__.py
@@ -81,6 +81,7 @@
     from .cpmant import *
     from .csm import *
     from .ctrl import *
+    from .ctsm import *
     from .cvt import *
     from .cwm import *
     from .d_fine import *

diff --git a/src/transformers/models/auto/auto_mappings.py b/src/transformers/models/auto/auto_mappings.py
@@ -108,6 +108,7 @@
         ("csm", "CsmConfig"),
         ("csm_depth_decoder_model", "CsmDepthDecoderConfig"),
         ("ctrl", "CTRLConfig"),
+        ("ctsm", "CtsmConfig"),
         ("cvt", "CvtConfig"),
         ("cwm", "CwmConfig"),
         ("d_fine", "DFineConfig"),

diff --git a/src/transformers/models/auto/modeling_auto.py b/src/transformers/models/auto/modeling_auto.py
@@ -99,6 +99,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
         ("cpmant", "CpmAntModel"),
         ("csm", "CsmForConditionalGeneration"),
         ("ctrl", "CTRLModel"),
+        ("ctsm", "CtsmModel"),
         ("cvt", "CvtModel"),
         ("cwm", "CwmModel"),
         ("d_fine", "DFineModel"),
@@ -1811,6 +1812,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
 
 MODEL_FOR_TIME_SERIES_PREDICTION_MAPPING_NAMES = OrderedDict(
     [
+        ("ctsm", "CtsmModelForPrediction"),
         ("timesfm", "TimesFmModelForPrediction"),
         ("timesfm2_5", "TimesFm2_5ModelForPrediction"),
     ]

diff --git a/src/transformers/models/ctsm/__init__.py b/src/transformers/models/ctsm/__init__.py
@@ -0,0 +1,28 @@
+# Copyright 2026 the HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import TYPE_CHECKING
+
+from ...utils import _LazyModule
+from ...utils.import_utils import define_import_structure
+
+
+if TYPE_CHECKING:
+    from .configuration_ctsm import *
+    from .modeling_ctsm import *
+else:
+    import sys
+
+    _file = globals()["__file__"]
+    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
diff --git a/src/transformers/models/ctsm/configuration_ctsm.py b/src/transformers/models/ctsm/configuration_ctsm.py
@@ -0,0 +1,118 @@
+#                🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+#           This file was automatically generated from src/transformers/models/ctsm/modular_ctsm.py.
+#               Do NOT edit this file manually as any edits will be overwritten by the generation of
+#             the file from the modular. If any change should be done, please apply the change to the
+#                          modular_ctsm.py file directly. One of our CI enforces this.
+#                🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+# Copyright 2026 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from huggingface_hub.dataclasses import strict
+
+from ...configuration_utils import PreTrainedConfig
+from ...modeling_rope_utils import RopeParameters
+from ...utils import auto_docstring
+
+
+@auto_docstring(checkpoint="cisco-ai/cisco-time-series-model-1.0")
+@strict
+class CtsmConfig(PreTrainedConfig):
+    r"""
+    patch_length (`int`, *optional*, defaults to 32):
+        Length of one patch in the input sequence for each resolution stream.
+    context_length (`int`, *optional*, defaults to 512):
+        Length of the input context for each resolution stream.
+    horizon_length (`int`, *optional*, defaults to 128):
+        Length of the prediction horizon produced per autoregressive step.
+    freq_size (`int`, *optional*, defaults to 3):
+        Number of frequency embeddings.
+    tolerance (`float`, *optional*, defaults to 1e-06):
+        Numerical tolerance used in normalization.
+    pad_val (`float`, *optional*, defaults to 1123581321.0):
+        Sentinel value marking padded positions in the input series.
+    num_hidden_layers (`int`, *optional*, defaults to 25):
+        Number of decoder layers.
+    quantiles (`list[float]`, *optional*, defaults to 15 values between 0.01 and 0.99):
+        Quantile levels predicted by the model.
+    use_positional_embedding (`bool`, *optional*, defaults to `False`):
+        Whether to add sinusoidal positional embeddings. CTSM uses rotary position embeddings instead, so this is
+        disabled by default.
+    use_resolution_embeddings (`bool`, *optional*, defaults to `True`):
+        Whether to add a learned embedding per resolution bucket (coarse / special / fine).
+    use_special_token (`bool`, *optional*, defaults to `True`):
+        Whether to insert a learned special token between the coarse and fine streams.
+    num_resolutions (`int`, *optional*, defaults to 3):
+        Number of resolution embeddings (coarse, special token, fine).
+    agg_factor (`int`, *optional*, defaults to 60):
+        Aggregation factor between fine and coarse resolutions (e.g. 60 minutes -> 1 hour).
+    max_position_embeddings (`int`, *optional*, defaults to 1025):
+        Maximum number of patches in the concatenated sequence (coarse + special + fine).
+    rope_parameters (`dict`, *optional*):
+        Rotary position embedding parameters. Defaults to `{"rope_type": "default", "rope_theta": 10000.0}`.
+
+    Example:
+
+    ```python
+    >>> from transformers import CtsmConfig, CtsmModelForPrediction
+
+    >>> configuration = CtsmConfig()
+    >>> model = CtsmModelForPrediction(configuration)
+    >>> configuration = model.config
+    ```
+    """
+
+    model_type = "ctsm"
+    keys_to_ignore_at_inference = []
+    is_encoder_decoder = False
+
+    patch_length: int = 32
+    context_length: int = 512
+    horizon_length: int = 128
+    freq_size: int = 3
+
+    num_hidden_layers: int = 25
+    hidden_size: int = 1280
+    intermediate_size: int = 1280
+    head_dim: int = 80
+    num_attention_heads: int = 16
+    tolerance: float = 1e-6
+    rms_norm_eps: float = 1e-6
+    quantiles: list[float] | tuple[float, ...] = (
+        0.01,
+        0.05,
+        0.1,
+        0.2,
+        0.25,
+        0.3,
+        0.4,
+        0.5,
+        0.6,
+        0.7,
+        0.75,
+        0.8,
+        0.9,
+        0.95,
+        0.99,
+    )
+    pad_val: float = 1123581321.0
+    attention_dropout: float | int = 0.0
+    use_positional_embedding: bool = False
+    initializer_range: float = 0.02
+    use_resolution_embeddings: bool = True
+    use_special_token: bool = True
+    num_resolutions: int = 3
+    agg_factor: int = 60
+    max_position_embeddings: int = 1025
+    rope_parameters: RopeParameters | dict | None = None
+
+
+__all__ = ["CtsmConfig"]
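A few quantities implied by these defaults can be checked by hand. The dictionary below is a stand-in holding values copied from the class above; it is not the real `CtsmConfig`, and the derived names are illustrative.

```python
# Defaults copied from CtsmConfig above; this dict is a stand-in, not the real class.
defaults = {
    "patch_length": 32,
    "context_length": 512,
    "hidden_size": 1280,
    "head_dim": 80,
    "num_attention_heads": 16,
    "use_special_token": True,
}

# Patches per resolution stream.
patches_per_stream = defaults["context_length"] // defaults["patch_length"]

# Tokens in the concatenated sequence: coarse patches, optional special token, fine patches.
tokens_per_sequence = 2 * patches_per_stream + int(defaults["use_special_token"])

# Total width of the attention heads, which here matches hidden_size.
attention_width = defaults["head_dim"] * defaults["num_attention_heads"]

print(patches_per_stream)   # 16
print(tokens_per_sequence)  # 33
print(attention_width)      # 1280
```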