From 9c747ee26d28f4be5defc0663d9551ea11341284 Mon Sep 17 00:00:00 2001
From: Andrew Schilling
Date: Thu, 10 Apr 2025 17:42:14 +0000
Subject: [PATCH 1/5] Correcting file names

Signed-off-by: Andrew Schilling
---
 ...ding_new_models.md => adding-new-models.md} | 0
 .../chat-datasets.md} | 0
 .../checkpointing.md | 0
 .../design-and-philosophy.md} | 0
 .../{design_docs => design-docs}/generation.md | 0
 docs/design-docs/gpu-logger.md | 0
 docs/{design_docs => design-docs}/logger.md | 0
 docs/{design_docs => design-docs}/padding.md | 0
 docs/{design_docs => design-docs}/uv.md | 0
 docs/guides/sft.md | 2 +-
 docs/index.md | 18 +++++++++---------
 ...cal_workstation.md => local-workstation.md} | 0
 12 files changed, 10 insertions(+), 10 deletions(-)
 rename docs/{adding_new_models.md => adding-new-models.md} (100%)
 rename docs/{design_docs/chat_datasets.md => design-docs/chat-datasets.md} (100%)
 rename docs/{design_docs => design-docs}/checkpointing.md (100%)
 rename docs/{design_docs/design_and_philosophy.md => design-docs/design-and-philosophy.md} (100%)
 rename docs/{design_docs => design-docs}/generation.md (100%)
 create mode 100644 docs/design-docs/gpu-logger.md
 rename docs/{design_docs => design-docs}/logger.md (100%)
 rename docs/{design_docs => design-docs}/padding.md (100%)
 rename docs/{design_docs => design-docs}/uv.md (100%)
 rename docs/{local_workstation.md => local-workstation.md} (100%)

diff --git a/docs/adding_new_models.md b/docs/adding-new-models.md
similarity index 100%
rename from docs/adding_new_models.md
rename to docs/adding-new-models.md
diff --git a/docs/design_docs/chat_datasets.md b/docs/design-docs/chat-datasets.md
similarity index 100%
rename from docs/design_docs/chat_datasets.md
rename to docs/design-docs/chat-datasets.md
diff --git a/docs/design_docs/checkpointing.md b/docs/design-docs/checkpointing.md
similarity index 100%
rename from docs/design_docs/checkpointing.md
rename to docs/design-docs/checkpointing.md
diff --git a/docs/design_docs/design_and_philosophy.md b/docs/design-docs/design-and-philosophy.md
similarity index 100%
rename from docs/design_docs/design_and_philosophy.md
rename to docs/design-docs/design-and-philosophy.md
diff --git a/docs/design_docs/generation.md b/docs/design-docs/generation.md
similarity index 100%
rename from docs/design_docs/generation.md
rename to docs/design-docs/generation.md
diff --git a/docs/design-docs/gpu-logger.md b/docs/design-docs/gpu-logger.md
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/docs/design_docs/logger.md b/docs/design-docs/logger.md
similarity index 100%
rename from docs/design_docs/logger.md
rename to docs/design-docs/logger.md
diff --git a/docs/design_docs/padding.md b/docs/design-docs/padding.md
similarity index 100%
rename from docs/design_docs/padding.md
rename to docs/design-docs/padding.md
diff --git a/docs/design_docs/uv.md b/docs/design-docs/uv.md
similarity index 100%
rename from docs/design_docs/uv.md
rename to docs/design-docs/uv.md
diff --git a/docs/guides/sft.md b/docs/guides/sft.md
index 4d452b109d..8a67da85e8 100644
--- a/docs/guides/sft.md
+++ b/docs/guides/sft.md
@@ -29,7 +29,7 @@ SFT datasets in Reinforcer are encapsulated using classes. Each SFT data class i
 1. `formatted_ds`: The dictionary of formatted datasets. This dictionary should contain `train` and `validation` splits, and each split should conform to the format described below.
 2. `task_spec`: The `TaskDataSpec` for this dataset. This should specify the name you choose for this dataset as well as the `custom_template` for this dataset. More on custom templates below.

-SFT datasets are expected to follow the HuggingFace chat format. Refer to the [chat dataset document](../design_docs/chat_datasets.md) for details. If your data is not in the correct format, simply write a preprocessing script to convert the data into this format. [data/hf_datasets/squad.py](../../nemo_reinforcer/data/hf_datasets/squad.py) has an example:
+SFT datasets are expected to follow the HuggingFace chat format. Refer to the [chat dataset document](../design-docs/chat-datasets.md) for details. If your data is not in the correct format, simply write a preprocessing script to convert the data into this format. [data/hf_datasets/squad.py](../../nemo_reinforcer/data/hf_datasets/squad.py) has an example:

 ```python
 def format_squad(data):
diff --git a/docs/index.md b/docs/index.md
index 553778ff98..0b802b0ce2 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -6,7 +6,7 @@
 :caption: 🖥️ Environment Start
 :hidden:

-local_workstation.md
+local-workstation.md
 cluster.md
 ```

@@ -15,7 +15,7 @@ cluster.md
 :caption: 📚 Guides
 :hidden:

-adding_new_models.md
+adding-new-models.md
 guides/sft.md
 guides/grpo.md
 guides/eval.md
@@ -41,11 +41,11 @@ apidocs/index.rst
 :caption: 📐 Design Docs
 :hidden:

-design_docs/design_and_philosophy.md
-design_docs/padding.md
-design_docs/logger.md
-design_docs/uv.md
-design_docs/chat_datasets.md
-design_docs/generation.md
-design_docs/checkpointing.md
+design-docs/design-and-philosophy.md
+design-docs/padding.md
+design-docs/logger.md
+design-docs/uv.md
+design-docs/chat-datasets.md
+design-docs/generation.md
+design-docs/checkpointing.md
 ```
diff --git a/docs/local_workstation.md b/docs/local-workstation.md
similarity index 100%
rename from docs/local_workstation.md
rename to docs/local-workstation.md

From 19e3d85c0452977d7c633f59ba8ad126dc66a312 Mon Sep 17 00:00:00 2001
From: Andrew Schilling
Date: Thu, 10 Apr 2025 17:59:27 +0000
Subject: [PATCH 2/5] Applying SEO Fixes

Signed-off-by: Andrew Schilling
---
 docs/design-docs/index.md | 12 ++++++++++++
 docs/guides/index.md | 9 +++++++++
 2 files changed, 21 insertions(+)
 create mode 100644 docs/design-docs/index.md
 create mode 100644 docs/guides/index.md

diff --git a/docs/design-docs/index.md b/docs/design-docs/index.md
new file mode 100644
index 0000000000..e178a61002
--- /dev/null
+++ b/docs/design-docs/index.md
@@ -0,0 +1,12 @@
+```{toctree}
+:caption: 📐 Design Docs
+:hidden:
+
+design-and-philosophy.md
+padding.md
+logger.md
+uv.md
+chat-datasets.md
+generation.md
+checkpointing.md
+```
\ No newline at end of file
diff --git a/docs/guides/index.md b/docs/guides/index.md
new file mode 100644
index 0000000000..4276cc8d22
--- /dev/null
+++ b/docs/guides/index.md
@@ -0,0 +1,9 @@
+```{toctree}
+:caption: 📚 Guides
+:hidden:
+
+adding-new-models.md
+sft.md
+grpo.md
+eval.md
+```
\ No newline at end of file

From f25343778b5d8cffa58cd82c61e0538a40b0939f Mon Sep 17 00:00:00 2001
From: Andrew Schilling
Date: Thu, 10 Apr 2025 18:46:11 +0000
Subject: [PATCH 3/5] Trying to sign commit

Signed-off-by: Andrew Schilling
---
 docs/adding-new-models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/adding-new-models.md b/docs/adding-new-models.md
index c39642ea69..a682b4c3a7 100644
--- a/docs/adding-new-models.md
+++ b/docs/adding-new-models.md
@@ -20,7 +20,7 @@ $$
 \frac{1}{n}\sum_{i=1}^{n\text{(tokens)}}\exp\left(\left\|\text{logprobs-train-fwk}_i - \text{logprobs-sampling-fwk}_i\right\|\right)
 $$

-where samples are drawn as $x \sim \pi_{\text{sampling-framework}}$
+Where samples are drawn as $x \sim \pi_{\text{sampling-framework}}$

 as a measure of multiplicative probability error for sampled tokens. Note that this is not exhaustive (the sampling framework could lack distribution support and we wouldn't catch it here, as $x \sim \pi_{\text{sampling-framework}}$). To get a much stricter guarantee on correctness, you should run this metric twice and average the results, where in the second run, you sample $x \sim \pi_{\text{training-framework}}$. In practice, we use just the former in our tests and find it sufficient.


From 3a878c516dbd80574d417ec181cef35307c3c9e0 Mon Sep 17 00:00:00 2001
From: Andrew Schilling
Date: Thu, 10 Apr 2025 19:10:28 +0000
Subject: [PATCH 4/5] Fixing Typo

Signed-off-by: Andrew Schilling
---
 docs/adding-new-models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/adding-new-models.md b/docs/adding-new-models.md
index a682b4c3a7..41aac4d07b 100644
--- a/docs/adding-new-models.md
+++ b/docs/adding-new-models.md
@@ -22,7 +22,7 @@ $$

 Where samples are drawn as $x \sim \pi_{\text{sampling-framework}}$

-as a measure of multiplicative probability error for sampled tokens. Note that this is not exhaustive (the sampling framework could lack distribution support and we wouldn't catch it here, as $x \sim \pi_{\text{sampling-framework}}$). To get a much stricter guarantee on correctness, you should run this metric twice and average the results, where in the second run, you sample $x \sim \pi_{\text{training-framework}}$. In practice, we use just the former in our tests and find it sufficient.
+As a measure of multiplicative probability error for sampled tokens. Note that this is not exhaustive (the sampling framework could lack distribution support and we wouldn't catch it here, as $x \sim \pi_{\text{sampling-framework}}$). To get a much stricter guarantee on correctness, you should run this metric twice and average the results, where in the second run, you sample $x \sim \pi_{\text{training-framework}}$. In practice, we use just the former in our tests and find it sufficient.

 ## Understanding Discrepancies Between Backends


From 9d89cb5228020a55655e149920502a2b5a7b49db Mon Sep 17 00:00:00 2001
From: Andrew Schilling
Date: Tue, 15 Apr 2025 17:02:24 +0000
Subject: [PATCH 5/5] Reverting text edits

Signed-off-by: Andrew Schilling
---
 docs/adding-new-models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/adding-new-models.md b/docs/adding-new-models.md
index 41aac4d07b..34aaaaf3b0 100644
--- a/docs/adding-new-models.md
+++ b/docs/adding-new-models.md
@@ -20,7 +20,7 @@ $$
 \frac{1}{n}\sum_{i=1}^{n\text{(tokens)}}\exp\left(\left\|\text{logprobs-train-fwk}_i - \text{logprobs-sampling-fwk}_i\right\|\right)
 $$

-Where samples are drawn as $x \sim \pi_{\text{sampling-framework}}$
+where samples are drawn as $x \sim \pi_{\text{sampling-framework}}$

 As a measure of multiplicative probability error for sampled tokens. Note that this is not exhaustive (the sampling framework could lack distribution support and we wouldn't catch it here, as $x \sim \pi_{\text{sampling-framework}}$). To get a much stricter guarantee on correctness, you should run this metric twice and average the results, where in the second run, you sample $x \sim \pi_{\text{training-framework}}$. In practice, we use just the former in our tests and find it sufficient.
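
For reference, the metric in the context lines of patches 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the function name `multiplicative_prob_error` and its list-of-floats inputs are hypothetical, and the `\|...\|` in the formula is read here as a per-token absolute value.

```python
import math
from typing import Sequence


def multiplicative_prob_error(
    train_logprobs: Sequence[float],
    sampling_logprobs: Sequence[float],
) -> float:
    """Mean of exp(|logprob_train_i - logprob_sampling_i|) over sampled tokens.

    Hypothetical helper: both sequences hold per-token log-probabilities for
    the same tokens, which were sampled from the sampling framework.
    """
    if len(train_logprobs) != len(sampling_logprobs):
        raise ValueError("both sequences must cover the same tokens")
    if not train_logprobs:
        raise ValueError("at least one token is required")
    # exp(|a - b|) is the ratio of the larger probability to the smaller one,
    # so each term is >= 1 and equals 1 only when the frameworks agree.
    total = sum(
        math.exp(abs(t - s)) for t, s in zip(train_logprobs, sampling_logprobs)
    )
    return total / len(train_logprobs)


if __name__ == "__main__":
    # Identical log-probabilities give the minimum possible value of 1.0.
    print(multiplicative_prob_error([-0.1, -2.0], [-0.1, -2.0]))
```

A value of 1.0 means the two frameworks assign identical probabilities to every sampled token; larger values indicate a larger average multiplicative gap. Under the same assumptions, the stricter variant the text mentions would call the helper a second time on tokens sampled from the training framework and average the two results.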