
[NVIDIA] Add TRT 70B (FP8 and FP4) #2

Closed
kedarpotdar-nv wants to merge 42 commits into main from fp4-init

Conversation

@kedarpotdar-nv
Collaborator

Following up from #1.

Made these changes:

  1. Added a 70B TRT-LLM config to 70b-tmpl.yml.
  2. Added a new variable for precision (fp8 or fp4).
  3. Updated the collect and plot scripts to reflect both TRT and vLLM.
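As a rough illustration of change 2, a precision variable in the template might look like the sketch below. Every key name and value here is an assumption for illustration; the PR does not show the actual schema of 70b-tmpl.yml.

```yaml
# Hypothetical sketch of a 70B TRT-LLM entry in 70b-tmpl.yml.
# All keys below are assumptions, not the repository's real schema.
model: llama-70b
backend: trtllm
precision: fp8   # the new variable; set to fp8 or fp4
```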

See workflow run #134, which includes a sanity test.

Thanks!

@kedarpotdar-nv kedarpotdar-nv deleted the fp4-init branch September 17, 2025 18:07
jthomson04 pushed a commit to jthomson04/InferenceMAX that referenced this pull request Jan 21, 2026
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026
@cquil11 cquil11 changed the title Add TRT 70B (FP8 and FP4) [NVIDIA] Add TRT 70B (FP8 and FP4) Apr 8, 2026
