5 changes: 3 additions & 2 deletions mkdocs.yaml
@@ -10,8 +10,6 @@ nav:
- explanation/index.md
- Overview:
- Data Pipelines: explanation/data-pipelines.md
- What's New in 2.0: explanation/whats-new-2.md
- What's New in 2.2: explanation/whats-new-22.md
- FAQ: explanation/faq.md
- Data Model:
- Relational Workflow Model: explanation/relational-workflow-model.md
@@ -127,6 +125,9 @@ nav:
- API: api/ # Auto-generated via gen-files + literate-nav
- About:
- about/index.md
- What's New in 2.2: about/whats-new-22.md
- What's New in 2.1: about/whats-new-21.md
- What's New in 2.0: about/whats-new-2.md
- History: about/history.md
- Documentation Versioning: about/versioning.md
- Platform: https://www.datajoint.com/sign-up
2 changes: 1 addition & 1 deletion src/.overrides/partials/announce.html
@@ -1,5 +1,5 @@
{% if config.extra.datajoint_version %}
<a href="{{ 'explanation/whats-new-2/' | url }}">
<a href="{{ 'about/whats-new-2/' | url }}">
Documentation for DataJoint {{ config.extra.datajoint_version }}
</a>
{% endif %}
2 changes: 1 addition & 1 deletion src/about/versioning.md
@@ -87,7 +87,7 @@ print(dj.__version__)

If you're upgrading from legacy DataJoint (pre-2.0):

1. **Review** the [What's New in 2.0](../explanation/whats-new-2.md) page to understand major changes
1. **Review** the [What's New in 2.0](whats-new-2.md) page to understand major changes
2. **Follow** the [Migration Guide](../how-to/migrate-to-v20.md) for step-by-step upgrade instructions
3. **Reference** this documentation for updated syntax and APIs

26 changes: 9 additions & 17 deletions src/explanation/whats-new-2.md → src/about/whats-new-2.md
@@ -274,20 +274,12 @@ Most users complete Phases 1-2 in a single session. Phases 3-4 only apply if you

## See Also

### Migration
- **[Migration Guide](../how-to/migrate-to-v20.md/)** — Complete upgrade instructions
- [Configuration](../how-to/configure-database.md/) — Setup new configuration system

### Core Concepts
- [Type System](type-system.md) — Understand the three-tier type architecture
- [Computation Model](computation-model.md) — Jobs 2.0 and AutoPopulate
- [Query Algebra](query-algebra.md) — Semantic matching and operators

### Getting Started
- [Installation](../how-to/installation.md/) — Install DataJoint 2.0
- [Tutorials](../tutorials/index.md/) — Learn by example

### Reference
- [Type System Specification](../reference/specs/type-system.md/) — Complete type system details
- [Codec API](../reference/specs/codec-api.md/) — Build custom codecs
- [AutoPopulate Specification](../reference/specs/autopopulate.md/) — Jobs 2.0 reference
- [What's New in 2.1](whats-new-21.md) — Next release
- [Release Notes (v2.0.0)](https://github.com/datajoint/datajoint-python/releases/tag/v2.0.0) — GitHub changelog
- **[Migration Guide](../how-to/migrate-to-v20.md)** — Complete upgrade instructions
- [Configuration](../how-to/configure-database.md) — Set up the new configuration system
- [Type System](../explanation/type-system.md) — Understand the three-tier type architecture
- [Computation Model](../explanation/computation-model.md) — Jobs 2.0 and AutoPopulate
- [Query Algebra](../explanation/query-algebra.md) — Semantic matching and operators
- [Installation](../how-to/installation.md) — Install DataJoint 2.0
- [Tutorials](../tutorials/index.md) — Learn by example
125 changes: 125 additions & 0 deletions src/about/whats-new-21.md
@@ -0,0 +1,125 @@
# What's New in DataJoint 2.1

DataJoint 2.1 adds **PostgreSQL as a production backend**, **enhanced diagram visualization**, and **singleton tables**.

> **Upgrading from 2.0?** No breaking changes. All existing code continues to work. New features are purely additive.

> **Citation:** Yatsenko D, Nguyen TT. *DataJoint 2.0: A Computational Substrate for Agentic Scientific Workflows.* arXiv:2602.16585. 2026. [doi:10.48550/arXiv.2602.16585](https://doi.org/10.48550/arXiv.2602.16585)

## PostgreSQL Backend

DataJoint now supports PostgreSQL 15+ as a production database backend alongside MySQL 8+. The adapter architecture generates backend-specific SQL while maintaining a consistent API — the same table definitions, queries, and pipeline logic work on both backends.

```bash
export DJ_BACKEND=postgresql
export DJ_HOST=localhost
export DJ_PORT=5432
```

Or configure programmatically:

```python
dj.config['database.backend'] = 'postgresql'
```

All core types (`int32`, `float64`, `varchar`, `uuid`, `json`), codec types (`<blob>`, `<attach>`, `<object@>`), query operations, foreign keys, indexes, and auto-populate work identically across backends. Backend-specific differences are handled internally by the adapter layer.
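The adapter idea can be illustrated with a minimal sketch: one core type name, two SQL dialects. The names and type mappings below are hypothetical, not DataJoint's actual adapter code.

```python
# Hypothetical sketch of backend-specific SQL generation from core types.
# The mapping entries are illustrative, not DataJoint's real adapter tables.
CORE_TO_SQL = {
    "mysql": {"int32": "INT", "float64": "DOUBLE", "varchar(100)": "VARCHAR(100)"},
    "postgresql": {"int32": "INTEGER", "float64": "DOUBLE PRECISION", "varchar(100)": "VARCHAR(100)"},
}

def column_sql(backend: str, name: str, core_type: str) -> str:
    """Render one column definition for the chosen backend."""
    return f"{name} {CORE_TO_SQL[backend][core_type]}"
```

The table definition never mentions a dialect; only the final SQL rendering differs per backend.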

See [Database Backends](../reference/specs/database-backends.md) for the full specification.

## Diagram Enhancements

`dj.Diagram` gains several visualization features for working with complex, multi-schema pipelines.

### Layout Direction

Control the flow direction of diagrams:

```python
# Horizontal layout
dj.config.display.diagram_direction = "LR"

# Or temporarily
with dj.config.override(display__diagram_direction="LR"):
    dj.Diagram(schema).draw()
```

| Value | Description |
|-------|-------------|
| `"TB"` | Top to bottom (default) |
| `"LR"` | Left to right |

### Mermaid Output

Generate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown, GitHub, or web documentation:

```python
print(dj.Diagram(schema).make_mermaid())
```

Save directly to `.mmd` or `.mermaid` files:

```python
dj.Diagram(schema).save("pipeline.mmd")
```

### Schema Grouping

Multi-schema diagrams automatically group tables into visual clusters by database schema. The cluster label shows the Python module name when available, following the DataJoint convention of one module per schema.

```python
combined = dj.Diagram(schema1) + dj.Diagram(schema2)
combined.draw() # tables grouped by schema
```

### Collapsing Schemas

For high-level pipeline views, collapse entire schemas into single nodes:

```python
# Show schema1 expanded, schema2 as a single node with table count
dj.Diagram(schema1) + dj.Diagram(schema2).collapse()
```

The **"expanded wins" rule** applies: if a table appears in both a collapsed and non-collapsed diagram, it stays expanded. This allows showing specific tables while collapsing the rest:

```python
# Subject is expanded, rest of analysis schema is collapsed
dj.Diagram(Subject) + dj.Diagram(analysis).collapse()
```

See [Diagram Specification](../reference/specs/diagram.md) for the full reference.

## Singleton Tables

A **singleton table** holds at most one row. Declare it with no attributes in the primary key section:

```python
@schema
class Config(dj.Lookup):
    definition = """
    # Global configuration
    ---
    setting1 : varchar(100)
    setting2 : int32
    """
```

| Operation | Result |
|-----------|--------|
| Insert | Works without specifying a key |
| Second insert | Raises `DuplicateError` |
| `fetch1()` | Returns the single row |

Useful for global configuration, pipeline parameters, and summary statistics.
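The at-most-one-row contract can be sketched in plain Python. This is a conceptual model of the behavior in the table above, not DataJoint's implementation:

```python
class SingletonStore:
    """Minimal sketch of singleton-table semantics: at most one row."""

    class DuplicateError(Exception):
        pass

    def __init__(self):
        self._row = None

    def insert1(self, row):
        # No key is needed: the (empty) primary key identifies the one row
        if self._row is not None:
            raise self.DuplicateError("singleton already holds a row")
        self._row = dict(row)

    def fetch1(self):
        if self._row is None:
            raise LookupError("no row to fetch")
        return dict(self._row)
```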

See [Table Declaration](../reference/specs/table-declaration.md#25-singleton-tables-empty-primary-keys) for details.

## See Also

- [Database Backends](../reference/specs/database-backends.md) — Full backend specification
- [Diagram Specification](../reference/specs/diagram.md) — Diagram reference
- [Table Declaration](../reference/specs/table-declaration.md) — Singleton tables
- [Configure Database](../how-to/configure-database.md) — Connection setup for both backends
- [What's New in 2.0](whats-new-2.md) — Previous release
- [What's New in 2.2](whats-new-22.md) — Next release
- [Release Notes (v2.1.0)](https://github.com/datajoint/datajoint-python/releases/tag/v2.1.0) — GitHub changelog
64 changes: 57 additions & 7 deletions src/explanation/whats-new-22.md → src/about/whats-new-22.md
@@ -213,7 +213,7 @@ In prior versions, `dj.Diagram` existed solely for visualization — drawing the
- **PostgreSQL** aborts the entire transaction on any error, requiring `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips for each failed delete attempt.
- **Fragile error parsing** across MySQL versions and privilege levels, where different configurations produce different error message formats.

In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `preview()` methods are available as a public inspection API for understanding cascade impact before executing.
In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `counts()` methods are available as a public inspection API for understanding cascade impact before executing.

### The Preview-Then-Execute Pattern

@@ -225,7 +225,7 @@ diag = dj.Diagram(schema)
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# Inspect: what tables and how many rows would be affected?
counts = restricted.preview()
counts = restricted.counts()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# Execute via Table.delete() after reviewing the blast radius
@@ -238,11 +238,11 @@ This is valuable when working with unfamiliar pipelines, large datasets, or mult

The diagram supports two restriction propagation modes designed for fundamentally different tasks.

**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `preview()` or `delete()`.
**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `counts()` or `delete()`.

When the cascade encounters a part table whose master is not yet included in the cascade, the behavior depends on the `part_integrity` setting. With `"enforce"` (the default), `delete()` raises an error if part rows would be deleted without their master — preventing orphaned master rows. With `"cascade"`, the restriction propagates *upward* from the part to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction then propagates back downstream to all sibling parts — deleting the entire compositional unit, not just the originally matched part rows.

**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result.
**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `counts()` to inspect the result.

The two modes are mutually exclusive on the same diagram — DataJoint raises an error if you attempt to mix `cascade()` and `restrict()`, or if you call `cascade()` more than once. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting, and vice versa.

@@ -256,15 +256,63 @@ export = (dj.Diagram(schema)
          .restrict(Session & 'session_date > "2024-01-01"')
          .prune())

export.preview() # only tables with matching rows
export.counts() # only tables with matching rows
export # visualize the export subgraph
```

Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated.

### Restriction Propagation Rules

When `cascade()` or `restrict()` propagates a restriction from a parent to a child, one of three rules applies depending on the foreign key relationship:

| Rule | Condition | Child restriction |
|------|-----------|-------------------|
| **Direct copy** | Non-aliased FK, restriction attributes are a subset of child's primary key | Restriction copied directly |
| **Aliased projection** | FK uses attribute renaming (e.g., `subject_id` → `animal_id`) | Parent projected with attribute mapping |
| **Full projection** | Non-aliased FK, restriction uses attributes not in child's primary key | Parent projected (all attributes) as restriction |

When a child has multiple restricted ancestors, convergence depends on the mode: `cascade()` uses OR (any path marks a row for deletion), `restrict()` uses AND (all conditions must match).

When a child references the same parent through multiple foreign keys (e.g., `source_mouse` and `target_mouse` both referencing `Mouse`), these paths always combine with OR regardless of the mode — each FK path is an independent reason for the child row to be affected.
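The two convergence rules amount to set operations over the rows each ancestor path reaches. A sketch (hypothetical helper, not library code):

```python
def converge(paths, mode):
    """Combine the row-key sets reached via each restricted ancestor path.

    cascade(): OR  — any path marks a row for deletion.
    restrict(): AND — all restricted ancestors must match.
    """
    sets = [set(p) for p in paths]
    if mode == "cascade":
        return set().union(*sets)
    if mode == "restrict":
        return set.intersection(*sets)
    raise ValueError(f"unknown mode: {mode}")
```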

### Dry Run

`Table.delete()` and `Table.drop()` accept a `dry_run` parameter that returns affected row counts without modifying data:

```python
# Preview what would be deleted
(Session & {'subject_id': 'M001'}).delete(dry_run=True)
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# Preview what would be dropped
Session.drop(dry_run=True)
# {'`lab`.`session`': 100, '`lab`.`trial`': 5000}
```

### Unloaded Schema Detection

If a descendant table lives in a schema that hasn't been activated, the graph-driven delete won't know about it. When the final `DELETE` fails with a foreign key error, DataJoint catches it and produces an actionable error message identifying which schema needs to be activated — rather than the opaque crash of the prior implementation.

### Iteration API

Diagrams support Python's iteration protocol, yielding `FreeTable` objects in topological order:

```python
# Forward iteration (parents first) — useful for export/inspection
for ft in diagram:
print(ft.full_table_name, len(ft))

# Reverse iteration (leaves first) — used by delete and drop
for ft in reversed(diagram):
ft.delete_quick()
```

Each yielded `FreeTable` carries any cascade or restrict conditions that have been applied. `Table.delete()` and `Table.drop()` use `reversed(diagram)` internally, replacing the manual `topo_sort()` loops from prior implementations.
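The ordering principle can be illustrated with the standard library's topological sorter. This is a sketch of the idea with a made-up three-table chain, not DataJoint's internals:

```python
from graphlib import TopologicalSorter

# Dependency graph: child -> set of parents (foreign key targets)
deps = {
    "session": set(),
    "trial": {"session"},
    "processed_data": {"trial"},
}

forward = list(TopologicalSorter(deps).static_order())  # parents first
reverse = forward[::-1]  # leaves first: a safe delete order
```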

### Architecture

`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then executes the delete itself in reverse topological order. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `preview()`, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.
`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then iterates `reversed(diagram)` to delete leaves first. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `counts()` and iteration, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.

### Advantages over Error-Driven Cascade

@@ -278,10 +278,12 @@ The graph-driven approach resolves every known limitation of the prior error-dri
| Part integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) |
| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" |
| Reusability | Delete-only | Delete, drop, export, prune |
| Inspectability | Opaque recursive cascade | `preview()` / `dry_run` before executing |
| Inspectability | Opaque recursive cascade | `counts()` / `dry_run` before executing |

## See Also

- [What's New in 2.1](whats-new-21.md) — Previous release
- [Release Notes (v2.2.0)](https://github.com/datajoint/datajoint-python/releases) — GitHub changelog
- [Use Isolated Instances](../how-to/use-instances.md) — Task-oriented guide
- [Working with Instances](../tutorials/advanced/instances.ipynb) — Step-by-step tutorial
- [Configuration Reference](../reference/configuration.md) — Thread-safe mode settings
2 changes: 1 addition & 1 deletion src/explanation/index.md
@@ -53,7 +53,7 @@ and scalable.

How DataJoint ensures safe joins through attribute lineage tracking.

- :material-new-box: **[What's New in 2.0](whats-new-2.md)**
- :material-new-box: **[What's New in 2.0](../about/whats-new-2.md)**

Major changes, new features, and migration guidance for DataJoint 2.0.

2 changes: 1 addition & 1 deletion src/explanation/type-system.md
@@ -269,7 +269,7 @@ result = np.mean(ref) # Downloads automatically
Schema-addressed storage for files and folders. Path mirrors the database structure: `{schema}/{table}/{pk}/{attribute}`.

```python
class ProcessedData(dj.Computed):
class RecordingAnalysis(dj.Computed):
    definition = """
    -> Recording
    ---
4 changes: 2 additions & 2 deletions src/how-to/alter-tables.md
@@ -199,10 +199,10 @@ For tables created before enabling job metadata:
from datajoint.migrate import add_job_metadata_columns

# Dry run
add_job_metadata_columns(ProcessedData, dry_run=True)
add_job_metadata_columns(SessionAnalysis, dry_run=True)

# Apply
add_job_metadata_columns(ProcessedData, dry_run=False)
add_job_metadata_columns(SessionAnalysis, dry_run=False)
```

## Best Practices
6 changes: 3 additions & 3 deletions src/how-to/delete-data.md
@@ -175,7 +175,7 @@ with dj.conn().transaction:
    Session.Trial.insert(corrected_trials)

    # 3. Recompute derived data
    ProcessedData.populate()
    SessionAnalysis.populate()
```

This ensures all derived data remains consistent with source data.
@@ -212,7 +212,7 @@ diag = dj.Diagram(schema)
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# 2. Preview: see affected tables and row counts
counts = restricted.preview()
counts = restricted.counts()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# 3. Visualize the cascade subgraph (in Jupyter)
@@ -226,7 +226,7 @@ restricted

- **Preview blast radius**: Understand what a cascade delete will affect before committing
- **Multi-schema inspection**: Build a diagram spanning multiple schemas to visualize cascade impact
- **Programmatic control**: Use `preview()` return values to make decisions in automated workflows
- **Programmatic control**: Use `counts()` return values to make decisions in automated workflows

For simple single-table deletes, `(Table & restriction).delete()` remains the simplest approach. The diagram API is for when you need more visibility before executing.
