Describe the bug
The --use-te-activation-func command-line flag is correctly parsed, but it is not propagated to the layer spec builder for non-MoE GPT models. As a result, the flag is silently ignored and Transformer Engine’s activation function is never enabled.
Steps/Code to reproduce bug
- Launch training on a non-MoE GPT model
- Enable Transformer Engine and set --use-te-activation-func
- Inspect the logs or the execution path of the MLP activation
- Observe that PyTorch GELU is used instead of the TE activation
Example Command
```bash
#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
export MASTER_ADDR=localhost
export MASTER_PORT=6105
torchrun --nnodes=1 --nproc-per-node=1 pretrain_gpt.py \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 1 \
    --expert-model-parallel-size 1 \
    --train-samples 200 \
    --tokenizer-type GPT2BPETokenizer \
    --split 1000,0,0 \
    --eval-iters 0 \
    --use-cpu-initialization \
    --num-layers 12 \
    --hidden-size 256 \
    --num-attention-heads 4 \
    --max-position-embeddings 256 \
    --seq-length 256 \
    --micro-batch-size 2 \
    --global-batch-size 2 \
    --lr 0.0001 \
    --distributed-backend nccl \
    --seed 42 \
    --no-bias-gelu-fusion \
    --use-te-activation-func \
    --data-path <your-data-path> \
    --vocab-file <your-vocab-file-path> \
    --merge-file <your-merge-file-path>
```
Expected behavior
When --use-te-activation-func is enabled, the model should use Transformer Engine’s activation function. When the flag is removed, the model should fall back to PyTorch’s GELU.
Therefore, running the example command with and without --use-te-activation-func is expected to produce small numerical differences. However, the two runs produce identical results.
No warning or error is emitted.
Additional context
A likely cause is that, in gpt_builders.py, the _get_transformer_layer_spec() function calls get_gpt_layer_with_transformer_engine_spec() without forwarding the use_te_activation_func argument, so the argument silently falls back to its default of False.
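The suspected failure mode can be illustrated with a self-contained sketch. Only the two function names above come from the report; the stub bodies, the buggy/fixed wrappers, and the Args class are hypothetical stand-ins for the real Megatron code:

```python
# Minimal sketch of the suspected flag-propagation bug: a keyword argument
# with a False default is silently dropped when the caller never forwards it.

def get_gpt_layer_with_transformer_engine_spec(use_te_activation_func=False):
    """Stub: report which activation implementation the spec would select."""
    return "te_activation" if use_te_activation_func else "pytorch_gelu"

def buggy_get_transformer_layer_spec(args):
    # Bug: args.use_te_activation_func is parsed but never forwarded,
    # so the spec silently falls back to the PyTorch GELU default.
    return get_gpt_layer_with_transformer_engine_spec()

def fixed_get_transformer_layer_spec(args):
    # Fix: forward the parsed flag into the spec builder.
    return get_gpt_layer_with_transformer_engine_spec(
        use_te_activation_func=args.use_te_activation_func
    )

class Args:
    use_te_activation_func = True  # what --use-te-activation-func sets

print(buggy_get_transformer_layer_spec(Args()))  # pytorch_gelu, despite the flag
print(fixed_get_transformer_layer_spec(Args()))  # te_activation
```

This also matches the observed symptom: with the buggy call path, runs with and without the flag take the identical default branch, so their outputs are bit-identical.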