diff --git a/README.md b/README.md
index e2b566b..af88402 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ Practice implementing operators and architectures from scratch — the exact ski
[](https://github.com/duoan/TorchCode)
[](https://ghcr.io/duoan/torchcode)
[](https://huggingface.co/spaces/duoan/TorchCode)
-
+

[](https://star-history.com/#duoan/TorchCode&Date)
@@ -44,7 +44,7 @@ TorchCode gives you a **structured practice environment** with:
| | Feature | |
|---|---|---|
-| 🧩 | **40 curated problems** | The most frequently asked PyTorch interview topics |
+| 🧩 | **41 curated problems** | The most frequently asked PyTorch interview topics |
| ⚖️ | **Automated judge** | Correctness checks, gradient verification, and timing |
| 🎨 | **Instant feedback** | Colored pass/fail per test case, just like competitive programming |
| 💡 | **Hints when stuck** | Nudges without full spoilers |
@@ -114,6 +114,7 @@ The bread and butter of ML coding interviews. You'll be asked to write these wit
 | 17 | Dropout | `MyDropout` (nn.Module) |  | 🔥 | Train/eval mode, inverted scaling |
 | 18 | Embedding | `MyEmbedding` (nn.Module) |  | 🔥 | Lookup table, `weight[indices]` |
 | 19 | GELU | `my_gelu(x)` |  | ⭐ | Gaussian error linear unit, `torch.erf` |
+| 41 | Tanh | `my_tanh(x)`, `tanh_backward(...)`, `soft_cap_logits(...)` |  | ⭐ | Activation functions, backprop, logit soft-capping |
 | 20 | Kaiming Init | `kaiming_init(weight)` |  | ⭐ | `std = sqrt(2/fan_in)`, variance scaling |
 | 21 | Gradient Clipping | `clip_grad_norm(params, max_norm)` |  | ⭐ | Norm-based clipping, direction preservation |
 | 31 | Gradient Accumulation | `accumulated_step(model, opt, ...)` |  | 💡 | Micro-batching, loss scaling |
diff --git a/solutions/41_tanh_solution.ipynb b/solutions/41_tanh_solution.ipynb
new file mode 100644
index 0000000..2c97e24
--- /dev/null
+++ b/solutions/41_tanh_solution.ipynb
@@ -0,0 +1,83 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "[](https://colab.research.google.com/github/duoan/TorchCode/blob/master/solutions/41_tanh_solution.ipynb)\n\n",
+ "# Solution: Tanh, Backward & Soft-Capping\n",
+ "\n",
+ "$$\\text{tanh}(x) = \\frac{e^x - e^{-x}}{e^x + e^{-x}}$$\n",
+ "\n",
+ "$$\\frac{d}{dx}\\text{tanh}(x) = 1 - \\text{tanh}^2(x)$$\n",
+ "\n",
+ "$$\\text{soft\\_cap}(x) = \\text{cap} \\cdot \\text{tanh}\\left(\\frac{x}{\\text{cap}}\\right)$$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Install torch-judge in Colab (no-op in JupyterLab/Docker)\n",
+ "try:\n",
+ " import google.colab\n",
+ " get_ipython().run_line_magic('pip', 'install -q torch-judge')\n",
+ "except ImportError:\n",
+ " pass"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": "# ✅ SOLUTION\n\ndef my_tanh(x: torch.Tensor) -> torch.Tensor:\n # Equivalent to (e^x - e^-x)/(e^x + e^-x), but numerically stable\n # Divide numerator & denominator by e^x → 2·sigmoid(2x) - 1\n return 2.0 / (1.0 + torch.exp(-2.0 * x)) - 1.0\n\n\ndef tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor:\n return grad_output * (1 - tanh_output ** 2)\n\n\ndef soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor:\n return cap * my_tanh(logits / cap)"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Verify\n",
+ "x = torch.tensor([-2., -1., 0., 1., 2.])\n",
+ "print('my_tanh:', my_tanh(x))\n",
+ "print('ref: ', torch.tanh(x))\n",
+ "\n",
+ "t = my_tanh(x)\n",
+ "print('backward:', tanh_backward(torch.ones_like(t), t))\n",
+ "\n",
+ "logits = torch.tensor([-50., 0., 50.])\n",
+ "print('soft_cap(cap=30):', soft_cap_logits(logits, 30.0))\n",
+ "\n",
+ "# Run judge\n",
+ "from torch_judge import check\n",
+ "check('tanh')"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.11.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
\ No newline at end of file
diff --git a/templates/41_tanh.ipynb b/templates/41_tanh.ipynb
new file mode 100644
index 0000000..82356dc
--- /dev/null
+++ b/templates/41_tanh.ipynb
@@ -0,0 +1,144 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "[](https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/41_tanh.ipynb)\n\n",
+ "# 🟢 Easy: Tanh, Backward & Soft-Capping\n",
+ "\n",
+ "Implement the **tanh** activation function, its **backward pass**, and a **logit soft-capping** function.\n",
+ "\n",
+ "### Part 1 — Forward\n",
+ "\n",
+ "$$\\text{tanh}(x) = \\frac{e^x - e^{-x}}{e^x + e^{-x}}$$\n",
+ "\n",
+ "```python\n",
+ "def my_tanh(x: torch.Tensor) -> torch.Tensor: ...\n",
+ "```\n",
+ "\n",
+ "### Part 2 — Backward\n",
+ "\n",
+ "The derivative of tanh has an elegant property — it can be expressed in terms of its own output:\n",
+ "\n",
+ "$$\\frac{d}{dx}\\text{tanh}(x) = 1 - \\text{tanh}^2(x)$$\n",
+ "\n",
+ "```python\n",
+ "def tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor: ...\n",
+ "```\n",
+ "\n",
+ "### Part 3 — Soft-Capping (Gemma 2)\n",
+ "\n",
+ "Modern models like Gemma 2 use tanh to **soft-cap** logits, smoothly bounding them to $(-\\text{cap}, +\\text{cap})$:\n",
+ "\n",
+ "$$\\text{soft\\_cap}(x) = \\text{cap} \\cdot \\text{tanh}\\left(\\frac{x}{\\text{cap}}\\right)$$\n",
+ "\n",
+ "```python\n",
+ "def soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor: ...\n",
+ "```\n",
+ "\n",
+ "### Rules\n",
+ "- Do **NOT** use `torch.tanh`, `F.tanh`, `torch.nn.Tanh`, or any built-in tanh\n",
+ "- Must support autograd (gradients should flow through `my_tanh`)\n",
+ "- `tanh_backward` should be a **manual** computation, not using autograd\n",
+ "- `soft_cap_logits` should use your `my_tanh`\n",
+ "\n",
+ "### Example\n",
+ "```\n",
+ "Input: tensor([-2., -1., 0., 1., 2.])\n",
+ "tanh: tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640])\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Install torch-judge in Colab (no-op in JupyterLab/Docker)\n",
+ "try:\n",
+ " import google.colab\n",
+ " get_ipython().run_line_magic('pip', 'install -q torch-judge')\n",
+ "except ImportError:\n",
+ " pass"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# ✏️ YOUR IMPLEMENTATION HERE\n",
+ "\n",
+ "def my_tanh(x: torch.Tensor) -> torch.Tensor:\n",
+ " \"\"\"Part 1: Implement tanh from scratch using exp.\"\"\"\n",
+ " pass # Replace this\n",
+ "\n",
+ "\n",
+ "def tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor:\n",
+ " \"\"\"Part 2: Manual backward — compute gradient given upstream grad and tanh output.\"\"\"\n",
+ " pass # Replace this\n",
+ "\n",
+ "\n",
+ "def soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor:\n",
+ " \"\"\"Part 3: Soft-cap logits to (-cap, +cap) using your my_tanh.\"\"\"\n",
+ " pass # Replace this"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 🧪 Debug\n",
+ "x = torch.tensor([-2., -1., 0., 1., 2.])\n",
+ "print('my_tanh:', my_tanh(x))\n",
+ "print('ref: ', torch.tanh(x))\n",
+ "\n",
+ "# Test backward\n",
+ "t = my_tanh(x)\n",
+ "print('backward:', tanh_backward(torch.ones_like(t), t))\n",
+ "\n",
+ "# Test soft-capping\n",
+ "logits = torch.tensor([-50., 0., 50.])\n",
+ "print('soft_cap(cap=30):', soft_cap_logits(logits, 30.0))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# ✅ SUBMIT\n",
+ "from torch_judge import check\n",
+ "check('tanh')"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.11.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/torch_judge/tasks/tanh.py b/torch_judge/tasks/tanh.py
new file mode 100644
index 0000000..506ecba
--- /dev/null
+++ b/torch_judge/tasks/tanh.py
@@ -0,0 +1,77 @@
+"""Tanh activation, backward, and soft-capping task."""
+
+TASK = {
+ "title": "Tanh, Backward & Soft-Capping",
+ "difficulty": "Easy",
+ "function_name": "my_tanh",
+ "hint": "tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). The derivative is 1 - tanh(x)^2 — note it depends on the *output*, not the input. For soft-capping: cap * tanh(logits / cap).",
+ "tests": [
+ {
+ "name": "Matches torch.tanh",
+ "code": """
+import torch
+torch.manual_seed(0)
+x = torch.randn(4, 8)
+out = {fn}(x)
+ref = torch.tanh(x)
+assert torch.allclose(out, ref, atol=1e-5), 'Does not match torch.tanh'
+""",
+ },
+ {
+ "name": "tanh(0) = 0 and bounded output",
+ "code": """
+import torch
+out_zero = {fn}(torch.tensor([0.0]))
+assert torch.allclose(out_zero, torch.tensor([0.0]), atol=1e-7), f'tanh(0) should be 0, got {out_zero.item()}'
+x_large = torch.tensor([100., -100.])
+out_large = {fn}(x_large)
+assert (out_large.abs() <= 1.0 + 1e-5).all(), f'Output should be bounded in (-1, 1), got {out_large}'
+""",
+ },
+ {
+ "name": "Shape preservation",
+ "code": """
+import torch
+x = torch.randn(2, 3, 4)
+assert {fn}(x).shape == x.shape, 'Shape mismatch'
+""",
+ },
+ {
+ "name": "Gradient flow",
+ "code": """
+import torch
+x = torch.randn(4, 8, requires_grad=True)
+{fn}(x).sum().backward()
+assert x.grad is not None and x.grad.shape == x.shape, 'Gradient issue'
+""",
+ },
+ {
+ "name": "Manual backward (tanh_backward)",
+ "code": """
+import torch
+x = torch.randn(4, 8, requires_grad=True)
+out = {fn}(x)
+out.sum().backward()
+autograd_grad = x.grad.clone()
+
+tanh_out = {fn}(x.detach())
+grad_output = torch.ones_like(tanh_out)
+manual_grad = tanh_backward(grad_output, tanh_out)
+assert torch.allclose(manual_grad, autograd_grad, atol=1e-5), 'tanh_backward does not match autograd'
+""",
+ },
+ {
+ "name": "Soft-capping bounds logits",
+ "code": """
+import torch
+logits = torch.tensor([-50., -10., 0., 10., 50.])
+cap = 30.0
+capped = soft_cap_logits(logits, cap)
+assert (capped.abs() < cap).all(), f'Soft-capped output should be within (-{cap}, {cap}), got {capped}'
+assert torch.allclose(capped[2], torch.tensor(0.0), atol=1e-7), f'soft_cap(0) should be 0, got {capped[2]}'
+ref = cap * torch.tanh(logits / cap)
+assert torch.allclose(capped, ref, atol=1e-5), 'Does not match cap * tanh(logits / cap)'
+""",
+ },
+ ],
+}
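
The stable formulation used in the solution cell can also be sanity-checked standalone, outside the notebooks and the judge. A minimal sketch (assumes only PyTorch; mirrors the `my_tanh` / `tanh_backward` definitions above):

```python
import torch


def my_tanh(x: torch.Tensor) -> torch.Tensor:
    # Rewrite (e^x - e^-x)/(e^x + e^-x) as 2*sigmoid(2x) - 1 so that
    # exp() never receives a large positive argument for positive x
    return 2.0 / (1.0 + torch.exp(-2.0 * x)) - 1.0


def tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor:
    # d/dx tanh(x) = 1 - tanh(x)^2, expressed via the forward output
    return grad_output * (1 - tanh_output ** 2)


x = torch.randn(4, 8, requires_grad=True)
my_tanh(x).sum().backward()  # autograd gradient lands in x.grad

t = my_tanh(x.detach())
manual = tanh_backward(torch.ones_like(t), t)
assert torch.allclose(manual, x.grad, atol=1e-5)

# Saturation: a huge input lands exactly on the bound instead of overflowing
assert my_tanh(torch.tensor([1000.0])).item() == 1.0
```

Note the design choice the manual backward exploits: because the derivative depends only on the *output* of the forward pass, no extra forward recomputation is needed at backward time.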