5 changes: 3 additions & 2 deletions README.md
@@ -27,7 +27,7 @@ Practice implementing operators and architectures from scratch — the exact ski
[![GitHub stars](https://img.shields.io/github/stars/duoan/TorchCode?style=social)](https://github.com/duoan/TorchCode)
[![GitHub Container Registry](https://img.shields.io/badge/ghcr.io-TorchCode-blue?style=flat-square&logo=github)](https://ghcr.io/duoan/torchcode)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Spaces-TorchCode-blue?style=flat-square)](https://huggingface.co/spaces/duoan/TorchCode)
![Problems](https://img.shields.io/badge/problems-40-orange?style=flat-square)
![Problems](https://img.shields.io/badge/problems-41-orange?style=flat-square)
![GPU](https://img.shields.io/badge/GPU-not%20required-brightgreen?style=flat-square)

[![Star History Chart](https://api.star-history.com/svg?repos=duoan/TorchCode&type=Date)](https://star-history.com/#duoan/TorchCode&Date)
@@ -44,7 +44,7 @@ TorchCode gives you a **structured practice environment** with:

| | Feature | |
|---|---|---|
| 🧩 | **40 curated problems** | The most frequently asked PyTorch interview topics |
| 🧩 | **41 curated problems** | The most frequently asked PyTorch interview topics |
| ⚖️ | **Automated judge** | Correctness checks, gradient verification, and timing |
| 🎨 | **Instant feedback** | Colored pass/fail per test case, just like competitive programming |
| 💡 | **Hints when stuck** | Nudges without full spoilers |
@@ -114,6 +114,7 @@ The bread and butter of ML coding interviews. You'll be asked to write these wit
| 17 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/17_dropout.ipynb" target="_blank">Dropout</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/17_dropout.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `MyDropout` (nn.Module) | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | 🔥 | Train/eval mode, inverted scaling |
| 18 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/18_embedding.ipynb" target="_blank">Embedding</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/18_embedding.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `MyEmbedding` (nn.Module) | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | 🔥 | Lookup table, `weight[indices]` |
| 19 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/19_gelu.ipynb" target="_blank">GELU</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/19_gelu.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `my_gelu(x)` | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | ⭐ | Gaussian error linear unit, `torch.erf` |
| 41 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/41_tanh.ipynb" target="_blank">Tanh</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/41_tanh.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `my_tanh(x)`, `tanh_backward(...)`, `soft_cap_logits(...)` | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | ⭐ | Activation functions, backprop, logit soft-capping |
| 20 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/20_weight_init.ipynb" target="_blank">Kaiming Init</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/20_weight_init.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `kaiming_init(weight)` | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | ⭐ | `std = sqrt(2/fan_in)`, variance scaling |
| 21 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/21_gradient_clipping.ipynb" target="_blank">Gradient Clipping</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/21_gradient_clipping.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `clip_grad_norm(params, max_norm)` | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | ⭐ | Norm-based clipping, direction preservation |
| 31 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/31_gradient_accumulation.ipynb" target="_blank">Gradient Accumulation</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/31_gradient_accumulation.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `accumulated_step(model, opt, ...)` | ![Easy](https://img.shields.io/badge/Easy-4CAF50?style=flat-square) | 💡 | Micro-batching, loss scaling |
83 changes: 83 additions & 0 deletions solutions/41_tanh_solution.ipynb
@@ -0,0 +1,83 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duoan/TorchCode/blob/master/solutions/41_tanh_solution.ipynb)\n\n",
"# Solution: Tanh, Backward & Soft-Capping\n",
"\n",
"$$\\text{tanh}(x) = \\frac{e^x - e^{-x}}{e^x + e^{-x}}$$\n",
"\n",
"$$\\frac{d}{dx}\\text{tanh}(x) = 1 - \\text{tanh}^2(x)$$\n",
"\n",
"$$\\text{soft\\_cap}(x) = \\text{cap} \\cdot \\text{tanh}\\left(\\frac{x}{\\text{cap}}\\right)$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install torch-judge in Colab (no-op in JupyterLab/Docker)\n",
"try:\n",
" import google.colab\n",
" get_ipython().run_line_magic('pip', 'install -q torch-judge')\n",
"except ImportError:\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# ✅ SOLUTION\n\ndef my_tanh(x: torch.Tensor) -> torch.Tensor:\n # Equivalent to (e^x - e^-x)/(e^x + e^-x), but numerically stable\n # Divide numerator & denominator by e^x → 2·sigmoid(2x) - 1\n return 2.0 / (1.0 + torch.exp(-2.0 * x)) - 1.0\n\n\ndef tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor:\n return grad_output * (1 - tanh_output ** 2)\n\n\ndef soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor:\n return cap * my_tanh(logits / cap)"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify\n",
"x = torch.tensor([-2., -1., 0., 1., 2.])\n",
"print('my_tanh:', my_tanh(x))\n",
"print('ref: ', torch.tanh(x))\n",
"\n",
"t = my_tanh(x)\n",
"print('backward:', tanh_backward(torch.ones_like(t), t))\n",
"\n",
"logits = torch.tensor([-50., 0., 50.])\n",
"print('soft_cap(cap=30):', soft_cap_logits(logits, 30.0))\n",
"\n",
"# Run judge\n",
"from torch_judge import check\n",
"check('tanh')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
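The solution above computes tanh via the identity $\tanh(x) = 2\sigma(2x) - 1$ (divide numerator and denominator of $(e^x - e^{-x})/(e^x + e^{-x})$ by $e^x$), which avoids computing `exp` of large positive arguments. A standalone sketch, outside the notebook, checking both the forward identity and the output-based manual backward against PyTorch's references:

```python
import torch

def my_tanh(x: torch.Tensor) -> torch.Tensor:
    # tanh(x) = (1 - e^{-2x}) / (1 + e^{-2x}) = 2*sigmoid(2x) - 1
    # Only exp(-2x) appears, so large positive x never overflows;
    # for large negative x, exp(-2x) -> inf and the result saturates at -1.
    return 2.0 / (1.0 + torch.exp(-2.0 * x)) - 1.0

def tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor:
    # d/dx tanh(x) = 1 - tanh(x)^2 -- depends only on the forward *output*
    return grad_output * (1 - tanh_output ** 2)

x = torch.randn(4, 8, requires_grad=True)
out = my_tanh(x)
out.sum().backward()  # autograd gradient via the exp/division graph

manual = tanh_backward(torch.ones_like(out), out.detach())
assert torch.allclose(my_tanh(x.detach()), torch.tanh(x.detach()), atol=1e-5)
assert torch.allclose(manual, x.grad, atol=1e-5)
print("forward matches torch.tanh; manual backward matches autograd")
```

Note that expressing the derivative in terms of the output is exactly why the judge's backward test only needs the forward result, not the original input.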
144 changes: 144 additions & 0 deletions templates/41_tanh.ipynb
@@ -0,0 +1,144 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/41_tanh.ipynb)\n\n",
"# 🟢 Easy: Tanh, Backward & Soft-Capping\n",
"\n",
"Implement the **tanh** activation function, its **backward pass**, and a **logit soft-capping** function.\n",
"\n",
"### Part 1 — Forward\n",
"\n",
"$$\\text{tanh}(x) = \\frac{e^x - e^{-x}}{e^x + e^{-x}}$$\n",
"\n",
"```python\n",
"def my_tanh(x: torch.Tensor) -> torch.Tensor: ...\n",
"```\n",
"\n",
"### Part 2 — Backward\n",
"\n",
"The derivative of tanh has an elegant property — it can be expressed in terms of its own output:\n",
"\n",
"$$\\frac{d}{dx}\\text{tanh}(x) = 1 - \\text{tanh}^2(x)$$\n",
"\n",
"```python\n",
"def tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor: ...\n",
"```\n",
"\n",
"### Part 3 — Soft-Capping (Gemma 2)\n",
"\n",
"Modern models like Gemma 2 use tanh to **soft-cap** logits, smoothly bounding them to $(-\\text{cap}, +\\text{cap})$:\n",
"\n",
"$$\\text{soft\\_cap}(x) = \\text{cap} \\cdot \\text{tanh}\\left(\\frac{x}{\\text{cap}}\\right)$$\n",
"\n",
"```python\n",
"def soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor: ...\n",
"```\n",
"\n",
"### Rules\n",
"- Do **NOT** use `torch.tanh`, `F.tanh`, `torch.nn.Tanh`, or any built-in tanh\n",
"- Must support autograd (gradients should flow through `my_tanh`)\n",
"- `tanh_backward` should be a **manual** computation, not using autograd\n",
"- `soft_cap_logits` should use your `my_tanh`\n",
"\n",
"### Example\n",
"```\n",
"Input: tensor([-2., -1., 0., 1., 2.])\n",
"tanh: tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640])\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install torch-judge in Colab (no-op in JupyterLab/Docker)\n",
"try:\n",
" import google.colab\n",
" get_ipython().run_line_magic('pip', 'install -q torch-judge')\n",
"except ImportError:\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ✏️ YOUR IMPLEMENTATION HERE\n",
"\n",
"def my_tanh(x: torch.Tensor) -> torch.Tensor:\n",
" \"\"\"Part 1: Implement tanh from scratch using exp.\"\"\"\n",
" pass # Replace this\n",
"\n",
"\n",
"def tanh_backward(grad_output: torch.Tensor, tanh_output: torch.Tensor) -> torch.Tensor:\n",
" \"\"\"Part 2: Manual backward — compute gradient given upstream grad and tanh output.\"\"\"\n",
" pass # Replace this\n",
"\n",
"\n",
"def soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor:\n",
" \"\"\"Part 3: Soft-cap logits to (-cap, +cap) using your my_tanh.\"\"\"\n",
" pass # Replace this"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 🧪 Debug\n",
"x = torch.tensor([-2., -1., 0., 1., 2.])\n",
"print('my_tanh:', my_tanh(x))\n",
"print('ref: ', torch.tanh(x))\n",
"\n",
"# Test backward\n",
"t = my_tanh(x)\n",
"print('backward:', tanh_backward(torch.ones_like(t), t))\n",
"\n",
"# Test soft-capping\n",
"logits = torch.tensor([-50., 0., 50.])\n",
"print('soft_cap(cap=30):', soft_cap_logits(logits, 30.0))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ✅ SUBMIT\n",
"from torch_judge import check\n",
"check('tanh')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
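Part 3's soft-capping can be sanity-checked in isolation. A minimal sketch using `torch.tanh` as the reference (the exercise itself forbids the built-in, so this is only for illustrating the behavior): the map is a near-identity for $|x| \ll \text{cap}$ and smoothly saturates toward $\pm\text{cap}$.

```python
import torch

def soft_cap_logits(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # cap * tanh(x / cap): odd, strictly bounded in (-cap, +cap),
    # and approximately x when |x| is small relative to cap
    return cap * torch.tanh(logits / cap)

logits = torch.tensor([-100.0, -1.0, 0.0, 1.0, 100.0])
capped = soft_cap_logits(logits, 30.0)

assert (capped.abs() < 30.0).all()                      # strictly inside the cap
assert torch.allclose(capped[3], torch.tensor(1.0), atol=1e-3)  # near-identity for small x
assert torch.allclose(capped, -capped.flip(0))          # odd function
```

Because tanh is smooth and monotonic, gradients still flow through extreme logits (unlike a hard clamp, whose gradient is zero outside the bounds), which is the motivation for its use in Gemma 2.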
77 changes: 77 additions & 0 deletions torch_judge/tasks/tanh.py
@@ -0,0 +1,77 @@
"""Tanh activation, backward, and soft-capping task."""

TASK = {
"title": "Tanh, Backward & Soft-Capping",
"difficulty": "Easy",
"function_name": "my_tanh",
"hint": "tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). The derivative is 1 - tanh(x)^2 — note it depends on the *output*, not the input. For soft-capping: cap * tanh(logits / cap).",
"tests": [
{
"name": "Matches torch.tanh",
"code": """
import torch
torch.manual_seed(0)
x = torch.randn(4, 8)
out = {fn}(x)
ref = torch.tanh(x)
assert torch.allclose(out, ref, atol=1e-5), f'Does not match torch.tanh'
""",
},
{
"name": "tanh(0) = 0 and bounded output",
"code": """
import torch
out_zero = {fn}(torch.tensor([0.0]))
assert torch.allclose(out_zero, torch.tensor([0.0]), atol=1e-7), f'tanh(0) should be 0, got {out_zero.item()}'
x_large = torch.tensor([100., -100.])
out_large = {fn}(x_large)
assert (out_large.abs() <= 1.0 + 1e-5).all(), f'Output should be bounded in (-1, 1), got {out_large}'
""",
},
{
"name": "Shape preservation",
"code": """
import torch
x = torch.randn(2, 3, 4)
assert {fn}(x).shape == x.shape, 'Shape mismatch'
""",
},
{
"name": "Gradient flow",
"code": """
import torch
x = torch.randn(4, 8, requires_grad=True)
{fn}(x).sum().backward()
assert x.grad is not None and x.grad.shape == x.shape, 'Gradient issue'
""",
},
{
"name": "Manual backward (tanh_backward)",
"code": """
import torch
x = torch.randn(4, 8, requires_grad=True)
out = {fn}(x)
out.sum().backward()
autograd_grad = x.grad.clone()

tanh_out = {fn}(x.detach())
grad_output = torch.ones_like(tanh_out)
manual_grad = tanh_backward(grad_output, tanh_out)
assert torch.allclose(manual_grad, autograd_grad, atol=1e-5), f'tanh_backward does not match autograd'
""",
},
{
"name": "Soft-capping bounds logits",
"code": """
import torch
logits = torch.tensor([-50., -10., 0., 10., 50.])
cap = 30.0
capped = soft_cap_logits(logits, cap)
assert (capped.abs() < cap).all(), f'Soft-capped output should be within (-{cap}, {cap}), got {capped}'
assert torch.allclose(capped[2], torch.tensor(0.0), atol=1e-7), f'soft_cap(0) should be 0, got {capped[2]}'
ref = cap * torch.tanh(logits / cap)
assert torch.allclose(capped, ref, atol=1e-5), f'Does not match cap * tanh(logits / cap)'
""",
},
],
}
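The `{fn}` placeholder in each test's source suggests the judge substitutes the submitted function's name before executing the snippet. A hypothetical minimal runner under that assumption (the real `torch_judge` internals may differ; `run_test` and its signature are illustrative, not the library's API). Note it uses `str.replace` rather than `str.format`, so literal braces in assertion messages like `{out_zero.item()}` survive substitution:

```python
import torch

TEST_SRC = """
import torch
out = {fn}(torch.tensor([0.0]))
assert torch.allclose(out, torch.tensor([0.0]), atol=1e-7)
"""

def run_test(code: str, fn_name: str, namespace: dict) -> bool:
    # Substitute the placeholder, then exec in a copy of the user's namespace
    src = code.replace("{fn}", fn_name)
    try:
        exec(src, dict(namespace))
        return True
    except AssertionError:
        return False

ns = {"my_tanh": torch.tanh}       # stand-in submission for the sketch
assert run_test(TEST_SRC, "my_tanh", ns)
```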