From b0ce4ae3e7f00cd4e8bb000cbcf4a8b6c853285e Mon Sep 17 00:00:00 2001
From: Insop <1240382+insop@users.noreply.github.com>
Date: Sun, 8 Jun 2025 16:02:41 -0700
Subject: [PATCH] docs: update tag reference

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index d4b1b80..fe3dc34 100644
--- a/README.md
+++ b/README.md
@@ -35,7 +35,7 @@ We design an RL training pipeline to train a base model for generating [Triton K
 
 We design the reward function with two components:
 
-1. ✅ Format Checking: Validate correct usage of `<thinking>` and `<answer>` tags.
+1. ✅ Format Checking: Validate correct usage of `<think>` and `<answer>` tags.
 2.	🔍 Similarity Score: Measure string similarity between generated and ground-truth Triton kernels using Python’s `difflib.SequenceMatcher`. This idea is inspired by [`SWE-RL`](https://arxiv.org/abs/2502.18449).
 
 ### 🧪 Evaluation