7 changes: 4 additions & 3 deletions README.md
@@ -113,6 +113,7 @@ sh INSTALL_MEGATRON.sh
| Server startup scripts | transformers/megatron | [Script](cookbook/client/server) |

## Changelog
- 🎉2026-04-22 The ModelScope training service now serves [Qwen/Qwen3.6-27B](https://www.modelscope.cn/models/Qwen/Qwen3.6-27B), shipped as release 0.2.1.
- 🎉2026-04-14 The ModelScope training service now serves [Qwen/Qwen3.6-35B-A3B](https://www.modelscope.cn/models/Qwen/Qwen3.6-35B-A3B), shipped as release 0.2.0.
- 🎉2026-03-28 Support DPO training with both Transformers and Megatron backends. See [dpo_full.py](cookbook/rl/dpo_full.py) and [dpo_lora.py](cookbook/rl/dpo_lora.py).
- 🎉2026-03-24 The Twinkle website is now live at https://modelscope.github.io/twinkle-web/
@@ -143,7 +144,7 @@ supported on Twinkle✨ framework.
> The serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle` is
> currently provided through Tinker-compatible APIs. We will be rolling out services that support
> both the Tinker APIs and the full-fledged Twinkle✨ native APIs. The serverless endpoint is backed
> by one training base at a time, and currently it is [Qwen3.6-35B-A3B](https://modelscope.cn/models/Qwen/Qwen3.6-35B-A3B).
> by one training base at a time, and currently it is [Qwen3.6-27B](https://modelscope.cn/models/Qwen/Qwen3.6-27B).

| Model Type | Model ID on [ModelScope](https://modelscope.cn) | Model Size | Requires | Support Megatron | HF Model ID |
|---------------------|-----------------------------------------------------------------------------------------------------------------|:---------------------------------------:|----------------------|:----------------:|:---------------------------------------------------------------------------------------------------------:|
@@ -192,7 +193,7 @@ twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_me

def train():
# to load model from Hugging Face, use 'hf://...'
base_model = 'ms://Qwen/Qwen3.6-35B-A3B'
base_model = 'ms://Qwen/Qwen3.6-27B'
# 1000 samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
# Set template to prepare encoding
@@ -248,7 +249,7 @@ from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3.6-35B-A3B'
base_model = 'ms://Qwen/Qwen3.6-27B'
base_url='your-base-url'
api_key='your-api-key'

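The README snippets in this diff use prefixed model references: `ms://Qwen/Qwen3.6-27B` selects ModelScope as the model source, while an `hf://` prefix selects Hugging Face. A minimal sketch of how such a prefix might be resolved (the helper name and the default-to-ModelScope behavior are assumptions for illustration, not Twinkle's actual resolver):

```python
def resolve_model_source(model_ref: str) -> tuple[str, str]:
    """Split a reference like 'ms://Qwen/Qwen3.6-27B' into (source, model_id).

    Hypothetical helper; Twinkle's real resolver may differ.
    """
    prefixes = {"ms://": "modelscope", "hf://": "huggingface"}
    for prefix, source in prefixes.items():
        if model_ref.startswith(prefix):
            return source, model_ref[len(prefix):]
    # Assumption: an unprefixed reference defaults to ModelScope.
    return "modelscope", model_ref


print(resolve_model_source("ms://Qwen/Qwen3.6-27B"))
# → ('modelscope', 'Qwen/Qwen3.6-27B')
```

The same `base_model` string then works unchanged whether the weights come from ModelScope or Hugging Face.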
7 changes: 4 additions & 3 deletions README_ZH.md
@@ -105,6 +105,7 @@ sh INSTALL_MEGATRON.sh
Twinkle✨ supports the same algorithm interface across single-GPU, torchrun multi-node, Ray, and Client scenarios. Its algorithm flow is exposed, which makes it easy to modify and debug. For a full framework introduction, see [Quick Start](docs/source_zh/使用指引/快速开始.md)

## Changelog
🎉2026-04-22 The ModelScope training service now serves [Qwen/Qwen3.6-27B](https://www.modelscope.cn/models/Qwen/Qwen3.6-27B), shipped as release 0.2.1.
🎉2026-04-16 The ModelScope training service now serves [Qwen/Qwen3.6-35B-A3B](https://www.modelscope.cn/models/Qwen/Qwen3.6-35B-A3B), shipped as release 0.2.0.
🎉2026-03-28 DPO training is supported with both Transformers and Megatron backends. See [dpo_full.py](cookbook/rl/dpo_full.py) and [dpo_lora.py](cookbook/rl/dpo_lora.py).
🎉2026-03-24 The Twinkle website is now live at https://modelscope.github.io/twinkle-web/
@@ -129,7 +130,7 @@ Twinkle✨ supports the same algorithm interface across single-GPU, torchrun multi-node, Ray, and Client scenarios
As new models are released, we will add support for more of them. The table below lists the models currently supported by the Twinkle✨ framework.

>[!Note]
> The serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle` is currently provided through Tinker-compatible APIs. We will gradually roll out services supporting both the Tinker APIs and the full Twinkle✨ native APIs. The serverless endpoint is backed by one training base at a time, currently [Qwen3.6-35B-A3B](https://modelscope.cn/models/Qwen/Qwen3.6-35B-A3B).
> The serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle` is currently provided through Tinker-compatible APIs. We will gradually roll out services supporting both the Tinker APIs and the full Twinkle✨ native APIs. The serverless endpoint is backed by one training base at a time, currently [Qwen3.6-27B](https://modelscope.cn/models/Qwen/Qwen3.6-27B).

| Model Type | Example Model ID | Model Size | Requires | Support Megatron | HF Model ID |
|---------------------|-----------------------------------------------------------------------------------------------------------------|:---------------------------------------:|----------------------|:----------------:|:---------------------------------------------------------------------------------------------------------:|
@@ -177,7 +178,7 @@ twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_me

def train():
# to load model from Hugging Face, use 'hf://...'
base_model = 'ms://Qwen/Qwen3.6-35B-A3B'
base_model = 'ms://Qwen/Qwen3.6-27B'
# 1000 samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
# Set template to prepare encoding
@@ -233,7 +234,7 @@ from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3.6-35B-A3B'
base_model = 'ms://Qwen/Qwen3.6-27B'
base_url='your-base-url'
api_key='your-api-key'

26 changes: 12 additions & 14 deletions cookbook/client/server/megatron/server_config.yaml
@@ -39,15 +39,15 @@ applications:
# Used for generating text from the model (e.g., evaluating LoRA results).
# Config: TP=2 x DP=2 on 4 GPUs, ~27GB weights/GPU, ~37GB for KV cache + LoRA
- name: sampler-Qwen3.6-35B-A3B
route_prefix: /api/v1/sampler/Qwen/Qwen3.6-35B-A3B
route_prefix: /api/v1/sampler/Qwen/Qwen3.6-27B
import_path: sampler
args:
model_id: "ms://Qwen/Qwen3.6-35B-A3B" # ModelScope model identifier
model_id: "ms://Qwen/Qwen3.6-27B" # ModelScope model identifier
nproc_per_node: 4 # Number of GPU processes per node
sampler_type: vllm # Inference engine: 'vllm' (fast) or 'torch' (TorchSampler)
engine_args: # vLLM engine-specific settings
max_model_len: 32000 # Maximum sequence length the engine supports
gpu_memory_utilization: 0.80 # 80% utilization, ~64GB/GPU, leaves buffer for safety
max_model_len: 65536 # Maximum sequence length the engine supports
gpu_memory_utilization: 0.75 # 75% utilization (~60GB on an 80GB GPU), leaves buffer for safety
enable_lora: true # Allow loading LoRA adapters during inference
max_loras: 5 # Max allowed loras working on vLLM at the same time
max_lora_rank: 32 # Support up to rank 32 LoRA adapters
@@ -63,8 +63,8 @@ applications:
tp_size: 2 # 2 TP replicas for multi-tenant throughput
queue_config:
rps_limit: 20 # Max requests per second
tps_limit: 32000 # Max tokens per second
max_input_tokens: 32000
tps_limit: 131072 # Max tokens per second
max_input_tokens: 65536
deployments:
- name: SamplerManagement
autoscaling_config:
@@ -81,12 +81,12 @@
# 2. Model Service - Hosts the base model for training.
# Config: PP=2 x DP=2 on 4 GPUs, ~27GB weights/GPU, comfortable for LoRA training
- name: models-Qwen3.6-35B-A3B
route_prefix: /api/v1/model/Qwen/Qwen3.6-35B-A3B
route_prefix: /api/v1/model/Qwen/Qwen3.6-27B
import_path: model
args:
use_megatron: true # Use Megatron-LM backend
model_id: "ms://Qwen/Qwen3.6-35B-A3B" # ModelScope model identifier
max_length: 32000 # model max length
model_id: "ms://Qwen/Qwen3.6-27B" # ModelScope model identifier
max_length: 65536 # model max length
max_loras: 3 # model max loras
nproc_per_node: 4 # Number of GPU processes per node
device_group:
@@ -95,15 +95,13 @@
device_type: cuda
device_mesh:
device_type: cuda
tp_size: 2
ep_size: 2
dp_size: 2
pp_size: 2
sequence_parallel: True

queue_config:
rps_limit: 20 # Max requests per second
tps_limit: 32000 # Max tokens per second
max_input_tokens: 32000
tps_limit: 131072 # Max tokens per second
max_input_tokens: 65536
adapter_config:
adapter_timeout: 120 # Seconds before idle adapter unload
adapter_max_lifetime: 36000 # Maximum lifetime of an adapter in seconds (e.g., 10 hours)
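The `queue_config` blocks in the YAML above cap request rate (`rps_limit`), token throughput (`tps_limit`), and per-request input size (`max_input_tokens`). A minimal sketch of the kind of admission check these limits imply (names and reject-vs-queue behavior are illustrative assumptions, not Twinkle's server code):

```python
from dataclasses import dataclass


@dataclass
class QueueConfig:
    rps_limit: int = 20            # max requests per second
    tps_limit: int = 131072        # max tokens per second
    max_input_tokens: int = 65536  # max input tokens in one request


def admit(cfg: QueueConfig, requests_this_second: int,
          tokens_this_second: int, request_tokens: int) -> bool:
    """Return True if a new request fits within all three limits.

    Illustrative only; a production server would likely queue or
    backpressure rather than reject outright.
    """
    if request_tokens > cfg.max_input_tokens:
        return False
    if requests_this_second + 1 > cfg.rps_limit:
        return False
    if tokens_this_second + request_tokens > cfg.tps_limit:
        return False
    return True


cfg = QueueConfig()
print(admit(cfg, requests_this_second=5, tokens_this_second=1000, request_tokens=4096))   # True
print(admit(cfg, requests_this_second=5, tokens_this_second=1000, request_tokens=70000))  # False
```

Note that `max_input_tokens` matches the engine's `max_model_len`, so any admitted request also fits the sampler's context window.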
2 changes: 1 addition & 1 deletion cookbook/client/tinker/modelscope/dpo.py
@@ -39,7 +39,7 @@
# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
base_model = 'Qwen/Qwen3.6-35B-A3B'
base_model = 'Qwen/Qwen3.6-27B'
base_url = 'http://www.modelscope.cn/twinkle'
api_key = os.environ.get('MODELSCOPE_TOKEN')
dataset_id = 'ms://hjh0119/shareAI-Llama3-DPO-zh-en-emoji'
2 changes: 1 addition & 1 deletion cookbook/client/tinker/modelscope/sample.py
@@ -16,7 +16,7 @@

from tinker import ServiceClient

base_model = 'Qwen/Qwen3.6-35B-A3B'
base_model = 'Qwen/Qwen3.6-27B'
base_url = 'http://www.modelscope.cn/twinkle'

# Step 2: Define the base model and connect to the server
2 changes: 1 addition & 1 deletion cookbook/client/tinker/modelscope/self_cognition.py
@@ -23,7 +23,7 @@
from tinker import ServiceClient

# The base model to fine-tune / evaluate
base_model = 'Qwen/Qwen3.6-35B-A3B'
base_model = 'Qwen/Qwen3.6-27B'
base_url = 'http://www.modelscope.cn/twinkle'


2 changes: 1 addition & 1 deletion cookbook/client/tinker/modelscope/short_math_grpo.py
@@ -38,7 +38,7 @@
logger = get_logger()

# ========== Configuration ==========
BASE_MODEL = 'Qwen/Qwen3.6-35B-A3B'
BASE_MODEL = 'Qwen/Qwen3.6-27B'
NUM_GENERATIONS = 4
MAX_NEW_TOKENS = 4096
LEARNING_RATE = 2e-5
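`short_math_grpo.py` above samples `NUM_GENERATIONS = 4` completions per prompt; GRPO-style training then scores each completion relative to its own group. A minimal sketch of the group-relative advantage computation (this is the standard GRPO formulation, not code taken from this PR):

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its group's mean and std,
    as in GRPO: A_i = (r_i - mean(r)) / (std(r) + eps).

    `eps` guards against division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, NUM_GENERATIONS = 4 sampled answers, binary correctness rewards.
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advs])  # correct answers get positive advantage
```

Because advantages are centered within each group, no separate value network is needed; the group mean acts as the baseline.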
2 changes: 1 addition & 1 deletion cookbook/client/twinkle/modelscope/dpo.py
@@ -24,7 +24,7 @@
logger = get_logger()

# Configuration (direct values, not from env)
base_model = 'Qwen/Qwen3.6-35B-A3B'
base_model = 'Qwen/Qwen3.6-27B'
base_url = 'http://www.modelscope.cn/twinkle'
dataset_id = 'ms://hjh0119/shareAI-Llama3-DPO-zh-en-emoji'

2 changes: 1 addition & 1 deletion cookbook/client/twinkle/modelscope/multi_modal.py
@@ -24,7 +24,7 @@

logger = get_logger()

base_model = 'Qwen/Qwen3.6-35B-A3B'
base_model = 'Qwen/Qwen3.6-27B'
base_url = 'http://www.modelscope.cn/twinkle'

# Step 2: Initialize the Twinkle client to communicate with the remote server.
2 changes: 1 addition & 1 deletion cookbook/client/twinkle/modelscope/self_congnition.py
@@ -21,7 +21,7 @@

logger = get_logger()

base_model = 'Qwen/Qwen3.6-35B-A3B'
base_model = 'Qwen/Qwen3.6-27B'
base_url = 'http://www.modelscope.cn/twinkle'

# Step 2: Initialize the Twinkle client to communicate with the remote server.
8 changes: 4 additions & 4 deletions docs/source_en/Usage Guide/Train-as-a-Service.md
@@ -2,7 +2,7 @@

Alongside the open-source release of the Twinkle framework, we also provide a hosted model training service (Training as a Service) powered by ModelScope's backend infrastructure. Developers can use this service to experience Twinkle's training API for free.

The model currently running on the cluster is [Qwen/Qwen3.6-35B-A3B](https://www.modelscope.cn/models/Qwen/Qwen3.6-35B-A3B). Below are the detailed usage instructions:
The model currently running on the cluster is [Qwen/Qwen3.6-27B](https://www.modelscope.cn/models/Qwen/Qwen3.6-27B). Below are the detailed usage instructions:

## Step 1. Register a ModelScope Account and Obtain Your API Key

@@ -30,7 +30,7 @@ from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3.6-35B-A3B'
base_model = 'ms://Qwen/Qwen3.6-27B'
base_url='https://www.modelscope.cn/twinkle'
api_key=os.environ.get('MODELSCOPE_TOKEN')

@@ -64,7 +64,7 @@ for epoch in range(2):
print(f'Saved checkpoint for epoch {epoch} to {result.path}')
```

With the code above, you can train a self-cognition LoRA based on `Qwen/Qwen3.6-35B-A3B`. This LoRA will change the model's name and creator to the names specified during training. To perform inference using this LoRA:
With the code above, you can train a self-cognition LoRA based on `Qwen/Qwen3.6-27B`. This LoRA will change the model's name and creator to the names specified during training. To perform inference using this LoRA:

```python
import os
@@ -79,7 +79,7 @@ init_tinker_client()

from tinker import ServiceClient

base_model = 'Qwen/Qwen3.6-35B-A3B'
base_model = 'Qwen/Qwen3.6-27B'
base_url = 'https://www.modelscope.cn/twinkle'

# Step 2: Define the base model and connect to the server
8 changes: 4 additions & 4 deletions docs/source_zh/使用指引/训练服务.md
@@ -3,7 +3,7 @@
Alongside the open-source release of the Twinkle framework, we also provide a hosted model training service (Training as a Service) backed by ModelScope's infrastructure. Developers can use this service to try Twinkle's training API for free.

The model currently running on the cluster is [Qwen/Qwen3.6-35B-A3B](https://www.modelscope.cn/models/Qwen/Qwen3.6-35B-A3B). The detailed usage is described below:
The model currently running on the cluster is [Qwen/Qwen3.6-27B](https://www.modelscope.cn/models/Qwen/Qwen3.6-27B). The detailed usage is described below:

## Step 1. Register a ModelScope Account and Obtain an API Key

@@ -31,7 +31,7 @@ from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3.6-35B-A3B'
base_model = 'ms://Qwen/Qwen3.6-27B'
base_url='https://www.modelscope.cn/twinkle'
api_key=os.environ.get('MODELSCOPE_TOKEN')

@@ -65,7 +65,7 @@ for epoch in range(2):
print(f'Saved checkpoint for epoch {epoch} to {result.path}')
```

With the code above, you can train a self-cognition LoRA on top of `Qwen/Qwen3.6-35B-A3B`. This LoRA changes the model's reported name and creator to the ones specified during training. To run inference with this LoRA:
With the code above, you can train a self-cognition LoRA on top of `Qwen/Qwen3.6-27B`. This LoRA changes the model's reported name and creator to the ones specified during training. To run inference with this LoRA:

```python
import os
@@ -80,7 +80,7 @@ init_tinker_client()

from tinker import ServiceClient

base_model = 'Qwen/Qwen3.6-35B-A3B'
base_model = 'Qwen/Qwen3.6-27B'
base_url = 'https://www.modelscope.cn/twinkle'

# Step 2: Define the base model and connect to the server