5 changes: 3 additions & 2 deletions README.md
@@ -27,7 +27,7 @@ Practice implementing operators and architectures from scratch — the exact ski
[![GitHub stars](https://img.shields.io/github/stars/duoan/TorchCode?style=social)](https://github.com/duoan/TorchCode)
[![GitHub Container Registry](https://img.shields.io/badge/ghcr.io-TorchCode-blue?style=flat-square&logo=github)](https://ghcr.io/duoan/torchcode)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Spaces-TorchCode-blue?style=flat-square)](https://huggingface.co/spaces/duoan/TorchCode)
![Problems](https://img.shields.io/badge/problems-40-orange?style=flat-square)
![Problems](https://img.shields.io/badge/problems-41-orange?style=flat-square)
![GPU](https://img.shields.io/badge/GPU-not%20required-brightgreen?style=flat-square)

[![Star History Chart](https://api.star-history.com/svg?repos=duoan/TorchCode&type=Date)](https://star-history.com/#duoan/TorchCode&Date)
@@ -44,7 +44,7 @@ TorchCode gives you a **structured practice environment** with:

| | Feature | |
|---|---|---|
| 🧩 | **40 curated problems** | The most frequently asked PyTorch interview topics |
| 🧩 | **41 curated problems** | The most frequently asked PyTorch interview topics |
| ⚖️ | **Automated judge** | Correctness checks, gradient verification, and timing |
| 🎨 | **Instant feedback** | Colored pass/fail per test case, just like competitive programming |
| 💡 | **Hints when stuck** | Nudges without full spoilers |
@@ -114,6 +114,7 @@ The bread and butter of ML coding interviews. You'll be asked to write these wit
| 17 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/17_dropout.ipynb" target="_blank">Dropout</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/17_dropout.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `MyDropout` (nn.Module) | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | 🔥 | Train/eval mode, inverted scaling |
| 18 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/18_embedding.ipynb" target="_blank">Embedding</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/18_embedding.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `MyEmbedding` (nn.Module) | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | 🔥 | Lookup table, `weight[indices]` |
| 19 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/19_gelu.ipynb" target="_blank">GELU</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/19_gelu.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `my_gelu(x)` | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | ⭐ | Gaussian error linear unit, `torch.erf` |
| 41 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/41_tanh.ipynb" target="_blank">Tanh</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/41_tanh.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `my_tanh(x)`, `tanh_backward(...)`, `soft_cap_logits(...)` | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | ⭐ | Activation functions, backprop, logit soft-capping |
| 20 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/20_weight_init.ipynb" target="_blank">Kaiming Init</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/20_weight_init.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `kaiming_init(weight)` | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | ⭐ | `std = sqrt(2/fan_in)`, variance scaling |
| 21 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/21_gradient_clipping.ipynb" target="_blank">Gradient Clipping</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/21_gradient_clipping.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `clip_grad_norm(params, max_norm)` | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | ⭐ | Norm-based clipping, direction preservation |
| 31 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/31_gradient_accumulation.ipynb" target="_blank">Gradient Accumulation</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/31_gradient_accumulation.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `accumulated_step(model, opt, ...)` | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | 💡 | Micro-batching, loss scaling |
83 changes: 83 additions & 0 deletions solutions/41_tanh_solution.ipynb
@@ -0,0 +1,83 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duoan/TorchCode/blob/master/solutions/41_tanh_solution.ipynb)\n\n",
"# Solution: Tanh, Backward & Soft-Capping\n",
"\n",
"$$\\text{tanh}(x) = \\frac{e^x - e^{-x}}{e^x + e^{-x}}$$\n",
"\n",
"$$\\frac{d}{dx}\\text{tanh}(x) = 1 - \\text{tanh}^2(x)$$\n",
"\n",
"$$\\text{soft\\_cap}(x) = \\text{cap} \\cdot \\text{tanh}\\left(\\frac{x}{\\text{cap}}\\right)$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install torch-judge in Colab (no-op in JupyterLab/Docker)\n",
"try:\n",
" import google.colab\n",
" get_ipython().run_line_magic('pip', 'install -q torch-judge')\n",
"except ImportError:\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# ✅ SOLUTION\n\ndef my_tanh(x: torch.Tensor) -> torch.Tensor:\n # Equivalent to (e^x - e^-x)/(e^x + e^-x), but numerically stable\n # Divide numerator & denominator by e^x → 2·sigmoid(2x) - 1\n return 2.0 / (1.0 + torch.exp(-2.0 * x)) - 1.0\n\n\ndef tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor:\n return grad_output * (1 - tanh_output ** 2)\n\n\ndef soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor:\n return cap * my_tanh(logits / cap)"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify\n",
"x = torch.tensor([-2., -1., 0., 1., 2.])\n",
"print('my_tanh:', my_tanh(x))\n",
"print('ref: ', torch.tanh(x))\n",
"\n",
"t = my_tanh(x)\n",
"print('backward:', tanh_backward(torch.ones_like(t), t))\n",
"\n",
"logits = torch.tensor([-50., 0., 50.])\n",
"print('soft_cap(cap=30):', soft_cap_logits(logits, 30.0))\n",
"\n",
"# Run judge\n",
"from torch_judge import check\n",
"check('tanh')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
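The solution above computes tanh via the identity $\tanh(x) = 2\sigma(2x) - 1$ (divide numerator and denominator of $(e^x - e^{-x})/(e^x + e^{-x})$ by $e^x$), which avoids computing `exp` of large positive arguments. A standalone sketch, outside the notebook, checking both the forward identity and the output-based manual backward against PyTorch's references:

```python
import torch

def my_tanh(x: torch.Tensor) -> torch.Tensor:
    # tanh(x) = (1 - e^{-2x}) / (1 + e^{-2x}) = 2*sigmoid(2x) - 1
    # Only exp(-2x) appears, so large positive x never overflows;
    # for large negative x, exp(-2x) -> inf and the result saturates at -1.
    return 2.0 / (1.0 + torch.exp(-2.0 * x)) - 1.0

def tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor:
    # d/dx tanh(x) = 1 - tanh(x)^2 -- depends only on the forward *output*
    return grad_output * (1 - tanh_output ** 2)

x = torch.randn(4, 8, requires_grad=True)
out = my_tanh(x)
out.sum().backward()  # autograd gradient via the exp/division graph

manual = tanh_backward(torch.ones_like(out), out.detach())
assert torch.allclose(my_tanh(x.detach()), torch.tanh(x.detach()), atol=1e-5)
assert torch.allclose(manual, x.grad, atol=1e-5)
print("forward matches torch.tanh; manual backward matches autograd")
```

Note that expressing the derivative in terms of the output is exactly why the judge's backward test only needs the forward result, not the original input.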
144 changes: 144 additions & 0 deletions templates/41_tanh.ipynb
@@ -0,0 +1,144 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/41_tanh.ipynb)\n\n",
"# 🟢 Easy: Tanh, Backward & Soft-Capping\n",
"\n",
"Implement the **tanh** activation function, its **backward pass**, and a **logit soft-capping** function.\n",
"\n",
"### Part 1 — Forward\n",
"\n",
"$$\\text{tanh}(x) = \\frac{e^x - e^{-x}}{e^x + e^{-x}}$$\n",
"\n",
"```python\n",
"def my_tanh(x: torch.Tensor) -> torch.Tensor: ...\n",
"```\n",
"\n",
"### Part 2 — Backward\n",
"\n",
"The derivative of tanh has an elegant property — it can be expressed in terms of its own output:\n",
"\n",
"$$\\frac{d}{dx}\\text{tanh}(x) = 1 - \\text{tanh}^2(x)$$\n",
"\n",
"```python\n",
"def tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor: ...\n",
"```\n",
"\n",
"### Part 3 — Soft-Capping (Gemma 2)\n",
"\n",
"Modern models like Gemma 2 use tanh to **soft-cap** logits, smoothly bounding them to $(-\\text{cap}, +\\text{cap})$:\n",
"\n",
"$$\\text{soft\\_cap}(x) = \\text{cap} \\cdot \\text{tanh}\\left(\\frac{x}{\\text{cap}}\\right)$$\n",
"\n",
"```python\n",
"def soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor: ...\n",
"```\n",
"\n",
"### Rules\n",
"- Do **NOT** use `torch.tanh`, `F.tanh`, `torch.nn.Tanh`, or any built-in tanh\n",
"- Must support autograd (gradients should flow through `my_tanh`)\n",
"- `tanh_backward` should be a **manual** computation, not using autograd\n",
"- `soft_cap_logits` should use your `my_tanh`\n",
"\n",
"### Example\n",
"```\n",
"Input: tensor([-2., -1., 0., 1., 2.])\n",
"tanh: tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640])\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install torch-judge in Colab (no-op in JupyterLab/Docker)\n",
"try:\n",
" import google.colab\n",
" get_ipython().run_line_magic('pip', 'install -q torch-judge')\n",
"except ImportError:\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ✏️ YOUR IMPLEMENTATION HERE\n",
"\n",
"def my_tanh(x: torch.Tensor) -> torch.Tensor:\n",
" \"\"\"Part 1: Implement tanh from scratch using exp.\"\"\"\n",
" pass # Replace this\n",
"\n",
"\n",
"def tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor:\n",
" \"\"\"Part 2: Manual backward — compute gradient given upstream grad and tanh output.\"\"\"\n",
" pass # Replace this\n",
"\n",
"\n",
"def soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor:\n",
" \"\"\"Part 3: Soft-cap logits to (-cap, +cap) using your my_tanh.\"\"\"\n",
" pass # Replace this"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 🧪 Debug\n",
"x = torch.tensor([-2., -1., 0., 1., 2.])\n",
"print('my_tanh:', my_tanh(x))\n",
"print('ref: ', torch.tanh(x))\n",
"\n",
"# Test backward\n",
"t = my_tanh(x)\n",
"print('backward:', tanh_backward(torch.ones_like(t), t))\n",
"\n",
"# Test soft-capping\n",
"logits = torch.tensor([-50., 0., 50.])\n",
"print('soft_cap(cap=30):', soft_cap_logits(logits, 30.0))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ✅ SUBMIT\n",
"from torch_judge import check\n",
"check('tanh')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
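Part 3's soft-capping can be sanity-checked in isolation. A minimal sketch using `torch.tanh` as the reference (the exercise itself forbids the built-in, so this is only for illustrating the behavior): the map is a near-identity for $|x| \ll \text{cap}$ and smoothly saturates toward $\pm\text{cap}$.

```python
import torch

def soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # cap * tanh(x / cap): odd, strictly bounded in (-cap, +cap),
    # and approximately x when |x| is small relative to cap
    return cap * torch.tanh(logits / cap)

logits = torch.tensor([-100.0, -1.0, 0.0, 1.0, 100.0])
capped = soft_cap_logits(logits, 30.0)

assert (capped.abs() < 30.0).all()                      # strictly inside the cap
assert torch.allclose(capped[3], torch.tensor(1.0), atol=1e-3)  # near-identity for small x
assert torch.allclose(capped, -capped.flip(0))          # odd function
```

Because tanh is smooth and monotonic, gradients still flow through extreme logits (unlike a hard clamp, whose gradient is zero outside the bounds), which is the motivation for its use in Gemma 2.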
77 changes: 77 additions & 0 deletions torch_judge/tasks/tanh.py
@@ -0,0 +1,77 @@
"""Tanh activation, backward, and soft-capping task."""

TASK = {
"title": "Tanh, Backward & Soft-Capping",
"difficulty": "Easy",
"function_name": "my_tanh",
"hint": "tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). The derivative is 1 - tanh(x)^2 — note it depends on the *output*, not the input. For soft-capping: cap * tanh(logits / cap).",
"tests": [
{
"name": "Matches torch.tanh",
"code": """
import torch
torch.manual_seed(0)
x = torch.randn(4, 8)
out = {fn}(x)
ref = torch.tanh(x)
assert torch.allclose(out, ref, atol=1e-5), f'Does not match torch.tanh'
""",
},
{
"name": "tanh(0) = 0 and bounded output",
"code": """
import torch
out_zero = {fn}(torch.tensor([0.0]))
assert torch.allclose(out_zero, torch.tensor([0.0]), atol=1e-7), f'tanh(0) should be 0, got {out_zero.item()}'
x_large = torch.tensor([100., -100.])
out_large = {fn}(x_large)
assert (out_large.abs() <= 1.0 + 1e-5).all(), f'Output should be bounded in (-1, 1), got {out_large}'
""",
},
{
"name": "Shape preservation",
"code": """
import torch
x = torch.randn(2, 3, 4)
assert {fn}(x).shape == x.shape, 'Shape mismatch'
""",
},
{
"name": "Gradient flow",
"code": """
import torch
x = torch.randn(4, 8, requires_grad=True)
{fn}(x).sum().backward()
assert x.grad is not None and x.grad.shape == x.shape, 'Gradient issue'
""",
},
{
"name": "Manual backward (tanh_backward)",
"code": """
import torch
x = torch.randn(4, 8, requires_grad=True)
out = {fn}(x)
out.sum().backward()
autograd_grad = x.grad.clone()

tanh_out = {fn}(x.detach())
grad_output = torch.ones_like(tanh_out)
manual_grad = tanh_backward(grad_output, tanh_out)
assert torch.allclose(manual_grad, autograd_grad, atol=1e-5), f'tanh_backward does not match autograd'
""",
},
{
"name": "Soft-capping bounds logits",
"code": """
import torch
logits = torch.tensor([-50., -10., 0., 10., 50.])
cap = 30.0
capped = soft_cap_logits(logits, cap)
assert (capped.abs() < cap).all(), f'Soft-capped output should be within (-{cap}, {cap}), got {capped}'
assert torch.allclose(capped[2], torch.tensor(0.0), atol=1e-7), f'soft_cap(0) should be 0, got {capped[2]}'
ref = cap * torch.tanh(logits / cap)
assert torch.allclose(capped, ref, atol=1e-5), f'Does not match cap * tanh(logits / cap)'
""",
},
],
}
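The `{fn}` placeholder in each test's source suggests the judge substitutes the submitted function's name before executing the snippet. A hypothetical minimal runner under that assumption (the real `torch_judge` internals may differ; `run_test` and its signature are illustrative, not the library's API). Note it uses `str.replace` rather than `str.format`, so literal braces in assertion messages like `{out_zero.item()}` survive substitution:

```python
import torch

TEST_SRC = """
import torch
out = {fn}(torch.tensor([0.0]))
assert torch.allclose(out, torch.tensor([0.0]), atol=1e-7)
"""

def run_test(code: str, fn_name: str, namespace: dict) -> bool:
    # Substitute the placeholder, then exec in a copy of the user's namespace
    src = code.replace("{fn}", fn_name)
    try:
        exec(src, dict(namespace))
        return True
    except AssertionError:
        return False

ns = {"my_tanh": torch.tanh}       # stand-in submission for the sketch
assert run_test(TEST_SRC, "my_tanh", ns)
```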