A PyTorch extension that replaces numeric matrix multiplication with LLM-powered text computation inside neural network training loops. Each tensor element stores arbitrary text (code, translations, etc.) in files on disk, while numeric coefficients flow through standard autograd. An ExperienceTensor — a key-value store shaped as [N, 3] — serves as the learnable "weight" of the model, starting empty and being built up entirely by a patch-based optimizer across training iterations. The LLM acts as the compute kernel: during the forward pass it reads experience entries to produce outputs; during the backward pass it computes diffs; during the optimizer step, the `patch` tool applies those diffs to experience entries incrementally.
The result is a model that can learn to perform tasks it was never trained on by accumulating experience at runtime, demonstrated by translating Python into Viba — a novel DSL that does not exist in the training corpus of any existing LLM.
experience/
├── symbolic_tensor/
│ ├── function/ # Autograd Functions: st_moe, st_attention, st_stack, slice_*, merge, fork, copy, loss, etc.
│ ├── tensor_util/ # Symbolic tensor primitives: make, slice, assign, diff, patch, dense/sparse
│ ├── module/ # nn.Module wrappers: StMoeModule, WithDenseView
│ ├── optimizer/ # StSGD: dual-channel (numeric + symbolic patch) optimizer
│ ├── data_loader/ # Batch data loading from files
│ └── test/ # Integration tests
├── llm_client/ # LLM backends: raw API (OpenAI-compatible) and coding agent (Claude SDK)
├── sparse_util/ # Sparse coordinate operations (transpose, convert)
├── fs_util/ # File system utilities (directory packing, path enumeration, text merger)
├── test/ # End-to-end tests and benchmarks
└── example/ # Training demos
    ├── naive_symbolic_transform_model/  # Python-to-Viba translation
    └── auto-encoder/                    # Auto-encoder baseline stability tests
Gradients propagate through two channels simultaneously:
| Channel | What it carries | How it's computed |
|---|---|---|
| Numeric (coefficient) | Float values (bfloat16) | Standard autograd / SGD arithmetic |
| Symbolic (text) | Unified diffs stored in files | LLM computes diff -u between actual and expected |
The symbolic_grad_registry (thread-local dictionary) passes symbolic gradient metadata between autograd Function backward calls, since PyTorch autograd strips custom tensor attributes (st_relative_to, st_tensor_uid) when propagating gradients between Function nodes.
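A minimal sketch of that pattern, assuming a thread-local dict keyed by tensor uid (the concrete structure of the real registry is not shown here):

```python
import threading

# Sketch of a thread-local registry keyed by tensor uid: a Function's backward
# records where its symbolic gradient lives, and the upstream Function's backward
# looks it up, since autograd drops custom attributes from grad tensors.
_local = threading.local()

def get_symbolic_grad_registry() -> dict:
    if not hasattr(_local, "registry"):
        _local.registry = {}
    return _local.registry

def publish_symbolic_grad(tensor_uid: str, relative_to: str) -> None:
    get_symbolic_grad_registry()[tensor_uid] = {
        "st_relative_to": relative_to,
        "st_tensor_uid": tensor_uid,
    }

def lookup_symbolic_grad(tensor_uid: str) -> dict | None:
    return get_symbolic_grad_registry().get(tensor_uid)
```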
Two backends are supported:
- `raw_llm_api` (default): OpenAI-compatible API (`LLM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`). Packs directory contents into a prompt, finds files containing the TODO placeholder, and replaces them with LLM responses. Lightweight, no tool access.
- `coding_agent`: Claude Agent SDK with Read, Edit, Write tool access. The agent can directly read and modify files in the workspace. Best for complex tasks requiring file system interaction.
Both are dispatched through TaskHandler, which takes AgentTask objects and runs them concurrently via asyncio.gather.
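The dispatch pattern looks roughly like the following sketch; the `AgentTask` fields and the `run_one` helper are illustrative assumptions, not the actual API:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class AgentTask:          # illustrative shape only; the real fields may differ
    workspace: str        # directory packed into the prompt / opened by the agent
    prompt: str

async def run_one(task: AgentTask) -> str:
    # Placeholder for a call into the raw-API or coding-agent backend.
    await asyncio.sleep(0)
    return f"done: {task.workspace}"

async def run_all(tasks: list[AgentTask]) -> list[str]:
    # Tasks are independent, so they are awaited concurrently.
    return await asyncio.gather(*(run_one(t) for t in tasks))

results = asyncio.run(run_all([AgentTask("/tmp/a", "translate"),
                               AgentTask("/tmp/b", "translate")]))
```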
An ExperienceTensor is a symbolic tensor of shape [N, 3] where each row is a (query, key, value) triple:
- Query (position 0): Semantic keywords (one per line) used for Jaccard similarity retrieval
- Key (position 1): Source domain content (e.g., Python code)
- Value (position 2): Target domain content (e.g., code in another language)
It acts as the learnable "weight" of the model — it starts empty and is populated during training. The backward pass computes diffs against the expected output, and the optimizer applies those diffs to experience entries via the `patch` CLI.
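For example, an empty two-entry experience tensor can be created with `make_tensor` (the same call the training example below uses):

```python
from experience.symbolic_tensor.tensor_util.make_tensor import make_tensor

# Each row is (query, key, value); before training every cell is empty text.
experience_tensor = make_tensor([[""] * 3 for _ in range(2)], "/tmp/experience")
print(experience_tensor.shape)  # torch.Size([2, 3])
```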
| Function | Purpose |
|---|---|
| `StMoe` (`st_moe`) | Mixture-of-Experts: query gen → retrieval → LLM translation → copy-back |
| `st_attention` | Composes `slice_attention` + `merge` into a full attention operation |
| `slice_attention` | Attention-style forward/backward with causal masks |
| `st_stack` | Stack symbolic tensors along a new axis (like `torch.stack`) |
| `slice_view` | Autograd-aware slice creating symlinked views (shared storage) |
| `slice_tensor` | Autograd-aware slice creating independent copies |
| `merge` | Merges symbolic tensor elements along an axis using `TextMerger` |
| `ForkTensor` | Replicates input into N identical views; backward merges all grads |
| `Copy` | Independent copy with autograd support |
| `GetEditDistanceRatio` | Loss function: Levenshtein edit distance ratio |
| `dense_to_sparse` / `sparse_to_dense` | Sparse/dense symbolic tensor conversion pair |
| `with_dense_view` | Temporarily provides a dense view of sparse tensors |
`dense_to_sparse` extracts nonzero elements (via `torch.nonzero`) into a 1D sparse tensor with coordinate indexes. `sparse_to_dense` reconstructs a dense tensor from the sparse representation. They form a forward/backward pair as `torch.autograd.Function` — each one's backward calls the other.
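As a purely numeric analogue (the real pair also moves symbolic storage around), the mutually inverse forward/backward pattern looks roughly like this sketch:

```python
import torch

class DenseToSparseSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, dense):
        idx = torch.nonzero(dense, as_tuple=False)   # coordinate indexes of nonzeros
        ctx.dense_shape = dense.shape
        ctx.save_for_backward(idx)
        return dense[tuple(idx.t())]                 # 1D tensor of nonzero values

    @staticmethod
    def backward(ctx, grad_values):
        # The backward direction is the sparse-to-dense scatter, mirroring how
        # the real pair's forward/backward passes invert each other.
        (idx,) = ctx.saved_tensors
        grad_dense = torch.zeros(ctx.dense_shape, dtype=grad_values.dtype)
        grad_dense[tuple(idx.t())] = grad_values
        return grad_dense
```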
Forward pass:
- Query Generation: LLM extracts semantic keywords from each input element
- Experience Retrieval: Jaccard similarity (with Gaussian noise for exploration) selects top-k relevant experience entries. Cold-start: returns random indexes when all queries are empty (see the sketch after this list).
- Context Assembly: Dump symlink views of experience and input
- Task Dispatch: `TaskHandler` dispatches `AgentTask` objects to the LLM backend
- Copy-back: Results propagate through symlinks to parent storage
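A minimal sketch of the retrieval step above, assuming one keyword set per element; the noise scale and tie-breaking behavior are assumptions:

```python
import random
import torch

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

def retrieve(query_keywords: list[set[str]], experience_queries: list[set[str]],
             topk: int, noise_std: float = 0.1) -> list[list[int]]:
    # Cold start: if every stored query is empty, fall back to random entries.
    if all(not q for q in experience_queries):
        return [random.sample(range(len(experience_queries)), topk)
                for _ in query_keywords]
    selected = []
    for q in query_keywords:
        scores = torch.tensor([jaccard(q, e) for e in experience_queries])
        scores = scores + noise_std * torch.randn_like(scores)  # exploration noise
        selected.append(scores.topk(topk).indices.tolist())
    return selected
```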
Backward pass computes gradients for both input and experience through numeric + symbolic channels:
- Grad Input: Numeric pass-through. Symbolically, the LLM reads grad_output diffs alongside original input/output/experience and writes improved input files.
- Grad Experience: Forward index list transposed to group by experience entry. Cold-start padding randomly samples empty entries so they still receive gradients. Backward runs twice — once for key, once for value — with domain-specific prompts.
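The index transpose in the experience-gradient step amounts to regrouping the forward selections by experience entry, roughly:

```python
from collections import defaultdict

# forward_indexes[i] = experience entries selected for input element i
forward_indexes = [[3, 7], [7, 0], [3, 5]]

# Regroup by experience entry -> the input elements that used it, so each
# entry's gradient is computed from every output it influenced.
by_entry: dict[int, list[int]] = defaultdict(list)
for input_idx, entries in enumerate(forward_indexes):
    for entry in entries:
        by_entry[entry].append(input_idx)

print(dict(by_entry))  # {3: [0, 2], 7: [0, 1], 0: [1], 5: [2]}
```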
Both slice_view and slice_tensor support autograd:
- `slice_view`: Creates views via symlinks (shared storage). Backward scatters gradients back to original positions.
- `slice_tensor`: Creates independent copies. Backward still tracks which input positions contributed to each output.
Both support standard Python indexing syntax:
# Via tensor attributes
row = t.st_view_slicer[0, :] # Symlinked view
col = t.st_value_slicer[:, 1] # Independent copy
# Via function calls
from experience.symbolic_tensor.function.slice_view import slice_view
sub = slice_view(t, [0, slice(None)])

| Utility | Purpose |
|---|---|
| `make_tensor` | Create symbolic tensor from nested lists of strings/Paths |
| `make_none_tensor` | Create zero-filled tensor with `st_relative_to` and `st_tensor_uid` |
| `none_tensor_like` | Create None-filled tensor matching input shape |
| `empty_tensor_like` | Create ""-filled tensor matching input shape |
| `todo_tensor_like` | Create TODO-filled tensor matching input shape |
| `slice_view` | Create symbolic tensor view via symlinks (shared storage) |
| `slice_tensor` | Create independent copy (not symlinked) |
| `assign_tensor` | Assign content from one tensor to another (copy) |
| `assign_view` | Assign via symlinks (view) |
| `get_diff_tensor` | Element-wise `diff -u` between two symbolic tensors |
| `patch_tensor` | Apply unified diffs via the `patch` CLI (fuzz=3). Cold-start: extracts `+` lines when the target is empty. |
| `dump_tensor` / `dump_view` | Serialize tensor content to a directory |
| `load_tensor` | Deserialize tensor from a directory |
| `pack_tensor` | Pack tensor into nested list of file contents |
| `st_patched` | Check whether a tensor has been patched |
| `dense_to_sparse` / `sparse_to_dense` | Sparse/dense conversion implementations |
| `register_tensor_ops` | Monkey-patches `torch.Tensor` with symbolic tensor methods |
register_tensor_ops adds the following methods to torch.Tensor:
| Method | Purpose |
|---|---|
| `st_pack()` | Pack tensor into nested list of file contents |
| `st_assign(rvalue)` | Copy assignment |
| `st_assign_view(rvalue)` | Symlink assignment (view) |
| `st_get_diff(rvalue)` | Compute unified diff |
| `st_patch(rvalue)` | Apply patches |
| `st_file_paths()` | List all storage paths |
| `st_fork(n)` | Fork into N views |
| `st_view_slicer[...]` | Pythonic slicing with symlink views |
| `st_value_slicer[...]` | Pythonic slicing with independent copies |
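Monkey-patching here simply attaches plain functions to `torch.Tensor`. A simplified sketch of the idea (the real `register_tensor_ops` installs all of the methods above, and the body below is illustrative only):

```python
import os
import torch

def _st_file_paths(self: torch.Tensor) -> list[str]:
    # Illustrative body: enumerate every element's data file under this
    # tensor's storage directory (st_relative_to / st_tensor_uid are the
    # custom attributes used throughout this README).
    root = os.path.join(self.st_relative_to, self.st_tensor_uid, "storage")
    return [os.path.join(dirpath, name)
            for dirpath, _, names in os.walk(root)
            for name in names]

def register_tensor_ops_sketch() -> None:
    # Assigning the function onto the class makes it callable on any tensor.
    torch.Tensor.st_file_paths = _st_file_paths
```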
Two-channel update per step:
- Numeric: `param.data = (1 - lr) * param.data + lr * grad.data`
- Symbolic: Applies unified diff patches from grad storage to param storage via `patch_tensor` (uses the `patch` CLI with fuzz=3). Only patches elements where `grad.data != 0` (key+value dims).
- Query auto-update: After patching key+value, derives query content by running `get_query_tensor` on the updated kv, merging LLM-generated keywords, then sorting and deduplicating.
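A condensed sketch of what a single `StSGD.step()` does per parameter, using the `st_patch` method registered above; the helper name and the omission of the query auto-update are simplifications:

```python
import torch

def st_sgd_step_sketch(param: torch.Tensor, lr: float) -> None:
    grad = param.grad
    if grad is None:
        return
    # Numeric channel: blend the coefficient values toward the gradient values.
    param.data = (1 - lr) * param.data + lr * grad.data
    # Symbolic channel: apply the unified-diff patches carried by the gradient's
    # storage onto the parameter's storage (only where grad.data != 0).
    param.st_patch(grad)
    # (The real optimizer then re-derives the query column from the patched
    # key/value content via get_query_tensor.)
```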
{relative_to}/{tensor_uid}/
├── shape # JSON: [2, 3]
├── storage/
│   ├── 0/data       # Element at flat index 0
│   ├── 1/data       # Element at flat index 1
│   └── 1/1/data     # Multi-digit index 11
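The layout suggests that each decimal digit of the flat index becomes one directory level (index 11 → `storage/1/1/data`). A small helper illustrating that assumption:

```python
from pathlib import Path

def element_path(relative_to: str, tensor_uid: str, flat_index: int) -> Path:
    # Assumption based on the layout above: every decimal digit of the flat
    # index becomes one directory level, so index 11 maps to storage/1/1/data.
    return Path(relative_to, tensor_uid, "storage", *str(flat_index), "data")

print(element_path("/tmp/tensors", "a3f2", 11))  # /tmp/tensors/a3f2/storage/1/1/data
```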
This example trains a model from scratch to translate Python code into Viba — a novel domain-specific language invented for this project that does not exist in the training corpus of any existing LLM. The LLM must learn to generate syntactically and semantically correct code in a language it has never seen, purely from the experience entries built up during training.
Existing LLMs have been trained on billions of lines of Python, JavaScript, Haskell, etc. Translating between such languages merely tests whether the model can recall patterns from pre-training. Viba eliminates this confound — any correct translation demonstrates genuine generalization driven by the experience mechanism.
Viba is an algebraic data type (ADT) definition language with these core constructs:
name := <type_expr> # type/function definition
<type_expr> := <binding> ... # sequence of bindings
$var <type> # typed variable declaration
<- $var <type> # input binding (argument)
<- $expr # return binding (result)
# inline # inline expansion hint
Type expressions use four ADT combinators:
| Combinator | Syntax | Meaning | Example |
|---|---|---|---|
| Sum | `\|` | Tagged union (branching) | `'positive' \| 'zero' \| 'negative'` |
| Product | adjacency / `*` | Tuple composition | `str * int` |
| Exponent | `<-` | Function type | `int <- int` (unary), `str <- int <- float` (curried) |
| Tag | `` ` `` | Type-level tag/annotation | `` `JSON `` |
Additional constructs:
- Match: `Match[$x > 0 -> 'positive', $x == 0 -> 'zero', _ -> 'negative']`
- Generics: `list[$elem int]`
- Dict literal: `dict['first' = $a, 'second' = $b]`
- Function type value: `(int <- int)`
- Literals: `INT`, `FLOAT`, `BOOLEAN`, `STRING` (`"..."`), `SINGLE_STRING` (`'...'`), `TRIPLE_STRING` (`'''...'''`), `CODE_BLOCK` (`{...}`)
- Special types: `void` / `()` (unit), `never` (bottom), `...` (ellipsis)
- Tuple syntax: `(A, B)` desugars to a product type
- Import: `Import[path/to/file.viba]` references another Viba definition
12 Python-to-Viba translation pairs covering fundamental patterns:
| Pattern | Python | Viba |
|---|---|---|
| Sequential | `seq.py` | `seq.viba` |
| Branching | `branch.py` | `branch.viba` |
| Loop | `loop.py` | `loop.viba` |
| Recursion | `recursion.py` | `recursion.viba` |
| Higher-order | `higher_order.py` | `higher_order.viba` |
| Data structures | `data_struct.py` | `data_struct.viba` |
| Default args | `default_arg.py` | `default_arg.viba` |
| List comprehension | `list_comp.py` | `list_comp.viba` |
| String formatting | `format_str.py` | `format_str.viba` |
| Guard clauses | `guard.py` | `guard.viba` |
| Accumulator | `accumulator.py` | `accumulator.viba` |
| Closure | `closure.py` | `closure.viba` |
import tempfile
from pathlib import Path
from experience.symbolic_tensor.tensor_util.make_tensor import make_tensor
from experience.symbolic_tensor.function.get_edit_distance_ratio import get_edit_distance_ratio
from experience.symbolic_tensor.optimizer.st_sgd import StSGD
from experience.example.naive_symbolic_transform_model.model import NaiveModel
DATASET_PAIRS = [
    "seq", "branch", "loop",
    "recursion", "higher_order", "data_struct",
    "default_arg", "list_comp", "format_str",
    "guard", "accumulator", "closure",
]

with tempfile.TemporaryDirectory() as tmpdir:
    # Input: symlinks to Python source files
    py_paths = [Path("dataset") / f"{name}.py" for name in DATASET_PAIRS]
    input_tensor = make_tensor(py_paths, tmpdir, symlink=True)

    # Expected: Viba code as strings
    viba_contents = [(Path("dataset") / f"{name}.viba").read_text() for name in DATASET_PAIRS]
    expected_tensor = make_tensor(viba_contents, tmpdir)

    # Experience: starts EMPTY — learned during training
    n = len(DATASET_PAIRS)
    experience_tensor = make_tensor([[""] * 3 for _ in range(n)], tmpdir)

    model = NaiveModel(task_prompt="Translate Python To Viba", topk=1)
    model.load_experience(experience_tensor)
    optimizer = StSGD(model.parameters(), lr=1.0)

    for iteration in range(1, 6):
        optimizer.zero_grad()

        # Forward: LLM translates each Python file to Viba using experience
        output, selected_indexes = model(input_tensor)

        # Loss: Levenshtein edit distance ratio
        loss = get_edit_distance_ratio(output, expected_tensor)
        mean_loss = loss.mean().item()
        print(f"Iteration {iteration} loss: {mean_loss:.4f}")

        # Backward: computes symbolic gradients (diffs) via autograd
        loss.mean().backward()

        # Optimizer step: applies patches to experience via patch
        optimizer.step()

Run the demo with:

python -m experience.example.naive_symbolic_transform_model.train

Loss trajectory: ['0.6641', '0.5469', '0.4668', '0.4473', '0.4219'] — converges (36.5% reduction).
Dataset: 12 pairs
Experience: [24, 3] # 24 entries (2x dataset size for exploration headroom)
Iteration 1/5 Mean loss: 0.6641
output[0]: 'fun greet(name: str) -> str:\n let greeting = "Hello"...' # random language
output[1]: 'func classify(x: int) -> str:\n if x > 0:...' # Python-like
Patches: applied=18 rejected=0 fuzzed=0 skipped=0
Iteration 3/5 Mean loss: 0.4668
output[0]: 'greet :=\n $message str\n <- $name str\n ...' # Viba syntax!
output[5]: 'make_pair :=\n dict\n <- $a str\n <- $b str\n ...' # Viba syntax!
Patches: applied=6 rejected=0 fuzzed=0 skipped=0
Iteration 5/5 Mean loss: 0.4219
output[1]: "classify :=\n | 'positive'\n | 'zero'\n | 'negative'..." # exact Viba
output[3]: 'factorial :=\n <- $n int\n # recursive\n <- Match[...]' # exact Viba
output[11]: 'make_adder :=\n $adder (int <- int)\n <- $x int\n ...' # exact Viba
Patch stats (all 5 iterations): 34/34 applied, 0 rejected, 100% success rate.
Key observations:
- The model starts with no Viba knowledge — iteration 1 outputs are in random languages (Go, Rust, TypeScript-like).
- By iteration 3, the LLM begins using Viba syntax (`<-`, `$var`, `:=`) for some outputs.
- By iteration 5, several outputs match expected Viba code exactly (loss < 0.01 for branch, closure, guard).
- Experience entries accumulate correct Python-to-Viba mappings during training — e.g., entry `[22,23]` stores `classify(x) -> classify := | 'positive' | ...`.
- All 34 patches applied cleanly with 0% rejection rate across 5 iterations.
The example/auto-encoder/ directory contains a baseline stability test that runs the auto-encoder experiment multiple times to measure variance:
python -m example.auto-encoder.loop_run

This runs 10 experiments and reports the mean, min, max, and standard deviation of the loss, useful for establishing baseline metrics and debugging reproducibility issues.
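The reported statistics boil down to something like the following sketch, shown with placeholder loss values (this is not the script's actual code):

```python
import statistics

def summarize(final_losses: list[float]) -> dict[str, float]:
    return {
        "mean": statistics.mean(final_losses),
        "min": min(final_losses),
        "max": max(final_losses),
        "stdev": statistics.stdev(final_losses),  # sample standard deviation
    }

print(summarize([0.42, 0.45, 0.40, 0.47, 0.43, 0.44, 0.41, 0.46, 0.42, 0.44]))
```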
# Unit tests (tensor_util inline tests)
python -m experience.symbolic_tensor.tensor_util.make_tensor
python -m experience.symbolic_tensor.tensor_util.slice_view
python -m experience.symbolic_tensor.tensor_util.patch_tensor
# Integration tests
python -m experience.test.test_gain_st_sgd
python -m experience.test.test_attention_vs_traditional
python -m experience.test.test_st_attention_followed_by_st_moe
# Individual function tests
python -m experience.symbolic_tensor.function.slice_attention_backward
python -m experience.symbolic_tensor.function.st_copy
python -m experience.symbolic_tensor.function.fork_tensor
python -m experience.symbolic_tensor.function.get_edit_distance_ratio
python -m experience.symbolic_tensor.function.st_stack
python -m experience.symbolic_tensor.function.slice_view
python -m experience.symbolic_tensor.function.slice_tensor
# Full training demo
python -m experience.example.naive_symbolic_transform_model.train

- LLM as compute kernel: Replaces matrix multiplication with semantic reasoning
- Patch-based Optimizer: `diff`/`patch` for efficient incremental experience updates
- Two LLM backends: `raw_llm_api` (default, lightweight) and `coding_agent` (tool access)
- Symlinks for views, copies for mutations: Shared storage for context, independent copies for LLM writes
- Experience starts empty: Learned entirely at runtime, not pre-seeded
- Cold-start support: Random retrieval and direct `+`-line extraction handle empty experience
- Append-only experience: Zeros out gradients for non-empty rows so the optimizer skips them
- Pythonic slicing: `st_view_slicer` and `st_value_slicer` provide NumPy-like indexing syntax
- Python 3.13+
- PyTorch
- `openai` (default LLM backend)
- `claude-agent-sdk` (alternative LLM backend)
- `Levenshtein` (edit distance loss)

pip install torch openai claude-agent-sdk Levenshtein

from experience.symbolic_tensor import tensor, none
# Create a symbolic tensor
t = tensor(["hello world", "bonjour le monde"], "/tmp/my_tensors")
print(t.shape) # torch.Size([2])
print(t.data) # tensor([1., 1.], dtype=torch.bfloat16)
print(t.st_relative_to) # '/tmp/my_tensors'
print(t.st_tensor_uid) # 'a3f2...'
# Read text content
import os
path = os.path.join(t.st_relative_to, t.st_tensor_uid, "storage", "0", "data")
with open(path) as f:
print(f.read()) # "hello world"
# Tensor ops (registered on torch.Tensor)
diff = t.st_get_diff(expected) # unified diff
t.st_patch(grad) # apply patch
paths = t.st_file_paths() # list all storage paths
t.st_assign(new_value) # copy assignment
t.st_assign_view(new_value) # symlink assignment
forks = t.st_fork(num_outputs=3) # fork into N views
# Pythonic slicing
row_view = t.st_view_slicer[0, :] # symlinked view
row_copy = t.st_value_slicer[0, :] # independent copy
# Stack tensors
from experience.symbolic_tensor.function.st_stack import st_stack
stacked = st_stack([t1, t2, t3], dim=0)