From 0198472dd8fc597cddf797a905cedd677734f5e2 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Thu, 8 Jan 2026 14:05:29 -0600
Subject: [PATCH 1/3] docs: add virtual-schemas specification

Add comprehensive specification for virtual schema infrastructure:
- Schema-module convention (1:1 mapping)
- Schema introspection API (get_table, __getitem__, __iter__)
- dj.virtual_schema() function
- Table class generation
- Use cases and examples

Co-Authored-By: Claude Opus 4.5
---
 src/reference/specs/index.md           |   1 +
 src/reference/specs/virtual-schemas.md | 299 +++++++++++++++++++++++++
 2 files changed, 300 insertions(+)
 create mode 100644 src/reference/specs/virtual-schemas.md

diff --git a/src/reference/specs/index.md b/src/reference/specs/index.md
index b8fe11ff..b9d89634 100644
--- a/src/reference/specs/index.md
+++ b/src/reference/specs/index.md
@@ -29,6 +29,7 @@ Each specification follows a consistent structure:
 |---------------|-------------|
 | [Table Declaration](table-declaration.md) | Table definition syntax, tiers, foreign keys, and indexes |
 | [Master-Part Relationships](master-part.md) | Compositional data modeling, integrity, and cascading operations |
+| [Virtual Schemas](virtual-schemas.md) | Accessing schemas without Python source, introspection API |
 
 ### Query Operations
 
diff --git a/src/reference/specs/virtual-schemas.md b/src/reference/specs/virtual-schemas.md
new file mode 100644
index 00000000..6aaac910
--- /dev/null
+++ b/src/reference/specs/virtual-schemas.md
@@ -0,0 +1,299 @@
+# Virtual Schemas Specification
+
+Version: 1.0
+Status: Stable
+Last Updated: 2026-01-08
+
+## Overview
+
+Virtual schemas provide a way to access existing database schemas without the original Python source code. This is useful for:
+
+- Exploring schemas created by other users
+- Accessing legacy schemas
+- Quick data inspection and queries
+- Schema migration and maintenance
+
+---
+
+## 1.
Schema-Module Convention
+
+DataJoint maintains a **1:1 mapping** between database schemas and Python modules:
+
+| Database | Python |
+|----------|--------|
+| Schema | Module |
+| Table | Class |
+
+This convention reduces conceptual complexity: **modules are schemas, classes are tables**.
+
+When you define tables in Python:
+```python
+# lab.py module
+import datajoint as dj
+schema = dj.Schema('lab')
+
+@schema
+class Subject(dj.Manual):  # Subject class → `lab`.`subject` table
+    ...
+
+@schema
+class Session(dj.Manual):  # Session class → `lab`.`session` table
+    ...
+```
+
+Virtual schemas recreate this mapping when the Python source isn't available:
+```python
+# Creates module-like object with table classes
+lab = dj.virtual_schema('lab')
+lab.Subject  # Subject class for `lab`.`subject`
+lab.Session  # Session class for `lab`.`session`
+```
+
+---
+
+## 2. Schema Introspection API
+
+### 2.1 Direct Table Access
+
+Access individual tables by name using bracket notation:
+
+```python
+schema = dj.Schema('my_schema')
+
+# By CamelCase class name
+experiment = schema['Experiment']
+
+# By snake_case SQL name
+experiment = schema['experiment']
+
+# Query the table
+experiment.fetch()
+```
+
+### 2.2 `get_table()` Method
+
+Explicit method for table access:
+
+```python
+table = schema.get_table('Experiment')
+table = schema.get_table('experiment')  # also works
+```
+
+**Parameters:**
+- `name` (str): Table name in CamelCase or snake_case
+
+**Returns:** `FreeTable` instance
+
+**Raises:** `DataJointError` if table doesn't exist
+
+### 2.3 Iteration
+
+Iterate over all tables in dependency order:
+
+```python
+for table in schema:
+    print(table.full_table_name, len(table))
+```
+
+Tables are yielded as `FreeTable` instances in topological order (dependencies before dependents).
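The "dependencies before dependents" ordering in section 2.3 can be illustrated without a database. The following is a minimal sketch using Python's standard `graphlib` module. The table names and the `dependencies` map are hypothetical, and this is not a claim about how DataJoint computes the order internally.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each table name maps to the set of tables
# it references through foreign keys (its dependencies).
dependencies = {
    "subject": set(),
    "session": {"subject"},
    "trial": {"session"},
    "analysis": {"session", "subject"},
}


def dependency_order(deps):
    """Return table names so that every table follows the tables it references."""
    return list(TopologicalSorter(deps).static_order())


order = dependency_order(dependencies)
print(order)
```

Any order that places `subject` before `session`, and `session` before both `trial` and `analysis`, is a valid topological order, so the exact sequence printed may differ between equally valid results.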
+
+### 2.4 Containment Check
+
+Check if a table exists:
+
+```python
+if 'Experiment' in schema:
+    print("Table exists")
+
+if 'nonexistent' not in schema:
+    print("Table doesn't exist")
+```
+
+---
+
+## 3. Virtual Schema Function
+
+### 3.1 `dj.virtual_schema()`
+
+The recommended way to access existing schemas as modules:
+
+```python
+lab = dj.virtual_schema('my_lab_schema')
+
+# Access tables as attributes (classes)
+lab.Subject.fetch()
+lab.Session & 'subject_id="M001"'
+
+# Full query algebra supported
+(lab.Session * lab.Subject).fetch()
+```
+
+This maintains the module-class convention: `lab` behaves like a Python module with table classes as attributes.
+
+**Parameters:**
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `schema_name` | str | required | Database schema name |
+| `connection` | Connection | None | Database connection (uses default) |
+| `create_schema` | bool | False | Create schema if missing |
+| `create_tables` | bool | False | Allow new table declarations |
+| `add_objects` | dict | None | Additional objects for namespace |
+
+**Returns:** `VirtualModule` instance
+
+### 3.2 VirtualModule Class
+
+The underlying class (prefer `virtual_schema()` function):
+
+```python
+module = dj.VirtualModule('lab', 'my_lab_schema')
+module.Subject.fetch()
+```
+
+The first argument is the module display name, second is the schema name.
+
+### 3.3 Accessing the Schema Object
+
+Virtual modules expose the underlying Schema:
+
+```python
+lab = dj.virtual_schema('my_lab_schema')
+lab.schema.database  # 'my_lab_schema'
+lab.schema.list_tables()  # ['subject', 'session', ...]
+```
+
+---
+
+## 4.
Table Class Generation
+
+### 4.1 `spawn_missing_classes()`
+
+Create Python classes for all tables in a schema:
+
+```python
+schema = dj.Schema('existing_schema')
+schema.spawn_missing_classes(context=locals())
+
+# Now table classes are available in local namespace
+Subject.fetch()
+Session & 'date > "2024-01-01"'
+```
+
+**Parameters:**
+- `context` (dict): Namespace to populate. Defaults to caller's locals.
+
+### 4.2 Generated Class Types
+
+Classes are created based on table naming conventions:
+
+| Table Name Pattern | Generated Class |
+|-------------------|-----------------|
+| `subject` | `dj.Manual` |
+| `#lookup_table` | `dj.Lookup` |
+| `_imported_table` | `dj.Imported` |
+| `__computed_table` | `dj.Computed` |
+| `master__part` | `dj.Part` |
+
+### 4.3 Part Table Handling
+
+Part tables are attached to their master classes:
+
+```python
+lab = dj.virtual_schema('my_lab')
+
+# Part tables are nested attributes
+lab.Session.Trial.fetch()  # Session.Trial is a Part table
+```
+
+---
+
+## 5.
Use Cases
+
+### 5.1 Data Exploration
+
+```python
+# Quick exploration of unknown schema
+lab = dj.virtual_schema('collaborator_lab')
+
+# List all tables
+print(lab.schema.list_tables())
+
+# Check table structure
+print(lab.Subject.describe())
+
+# Preview data
+lab.Subject.fetch(limit=5)
+```
+
+### 5.2 Cross-Schema Queries
+
+```python
+my_schema = dj.Schema('my_analysis')
+external = dj.virtual_schema('external_lab')
+
+# Reference external tables in queries
+@my_schema
+class Analysis(dj.Computed):
+    definition = """
+    -> external.Session
+    ---
+    result : float
+    """
+```
+
+### 5.3 Schema Migration
+
+```python
+old = dj.virtual_schema('old_schema')
+new = dj.Schema('new_schema')
+
+# Copy data
+for table in old:
+    new_table = new.get_table(table.table_name)
+    new_table.insert(table.fetch(as_dict=True))
+```
+
+### 5.4 Garbage Collection
+
+```python
+from datajoint.gc import scan_content_references
+
+schema = dj.Schema('my_schema')
+
+# Scan all tables for content references
+refs = scan_content_references(schema, verbose=True)
+```
+
+---
+
+## 6. Comparison of Methods
+
+| Method | Use Case | Returns |
+|--------|----------|---------|
+| `schema['Name']` | Quick single table access | `FreeTable` |
+| `schema.get_table('name')` | Explicit table access | `FreeTable` |
+| `for t in schema` | Iterate all tables | `FreeTable` generator |
+| `'Name' in schema` | Check existence | `bool` |
+| `dj.virtual_schema(name)` | Module-like access | `VirtualModule` |
+| `spawn_missing_classes()` | Populate namespace | None (side effect) |
+
+---
+
+## 7. Implementation Reference
+
+| File | Purpose |
+|------|---------|
+| `schemas.py` | Schema, VirtualModule, virtual_schema |
+| `table.py` | FreeTable class |
+| `gc.py` | Uses get_table() for scanning |
+
+---
+
+## 8.
Error Messages
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| "Table does not exist" | `get_table()` on missing table | Check table name spelling |
+| "Schema must be activated" | Operations on unactivated schema | Call `schema.activate(name)` |
+| "Schema does not exist" | Schema name not in database | Check schema name, create if needed |

From c229598daaf19ce24dd10ec1f507f0a8c5a24b3e Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Thu, 8 Jan 2026 16:19:54 -0600
Subject: [PATCH 2/3] docs: add CLI how-to guide

- Create use-cli.md with comprehensive CLI documentation
- Cover database credentials, schema loading, common workflows
- Add to how-to index under Setup section

Co-Authored-By: Claude Opus 4.5
---
 src/how-to/index.md   |   1 +
 src/how-to/use-cli.md | 138 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 139 insertions(+)
 create mode 100644 src/how-to/use-cli.md

diff --git a/src/how-to/index.md b/src/how-to/index.md
index 19a1477f..e89dab02 100644
--- a/src/how-to/index.md
+++ b/src/how-to/index.md
@@ -10,6 +10,7 @@ they assume you understand the basics and focus on getting things done.
 - [Installation](installation.md) — Installing DataJoint
 - [Configure Database Connection](configure-database.md) — Connection settings
 - [Configure Object Storage](configure-storage.md) — S3, MinIO, file stores
+- [Use the Command-Line Interface](use-cli.md) — Interactive REPL
 
 ## Schema Design
 
diff --git a/src/how-to/use-cli.md b/src/how-to/use-cli.md
new file mode 100644
index 00000000..ee080851
--- /dev/null
+++ b/src/how-to/use-cli.md
@@ -0,0 +1,138 @@
+# Use the Command-Line Interface
+
+Start an interactive Python REPL with DataJoint pre-loaded.
+
+The `dj` command provides quick access to DataJoint for exploring schemas, running queries, and testing connections without writing scripts.
+
+## Start the REPL
+
+```bash
+dj
+```
+
+This opens a Python REPL with `dj` (DataJoint) already imported:
+
+```
+DataJoint 2.0.0 REPL
+Type 'dj.'
and press Tab for available functions.
+
+>>> dj.conn()  # Connect to database
+>>> dj.list_schemas()  # List available schemas
+```
+
+## Specify Database Credentials
+
+Override config file settings from the command line:
+
+```bash
+dj --host localhost:3306 --user root --password secret
+```
+
+| Option | Description |
+|--------|-------------|
+| `--host HOST` | Database host as `host:port` |
+| `-u`, `--user USER` | Database username |
+| `-p`, `--password PASS` | Database password |
+
+Credentials from command-line arguments override values in config files.
+
+## Load Schemas as Virtual Modules
+
+Load database schemas directly into the REPL namespace:
+
+```bash
+dj -s my_lab:lab -s my_analysis:analysis
+```
+
+The format is `schema_name:alias` where:
+- `schema_name` is the database schema name
+- `alias` is the variable name in the REPL
+
+This outputs:
+
+```
+DataJoint 2.0.0 REPL
+Type 'dj.' and press Tab for available functions.
+
+Loaded schemas:
+  lab -> my_lab
+  analysis -> my_analysis
+
+>>> lab.Subject.to_dicts()  # Query Subject table
+>>> dj.Diagram(lab.schema)  # View schema diagram
+```
+
+## Common Workflows
+
+### Explore an Existing Schema
+
+```bash
+dj -s production_db:db
+```
+
+```python
+>>> list(db.schema)  # List all tables
+>>> db.Experiment().to_dicts()[:5]  # Preview data
+>>> dj.Diagram(db.schema)  # Visualize structure
+```
+
+### Quick Data Check
+
+```bash
+dj --host db.example.com -s my_lab:lab
+```
+
+```python
+>>> len(lab.Session())  # Count sessions
+>>> lab.Session.describe()  # Show table definition
+```
+
+### Test Connection
+
+```bash
+dj --host localhost:3306 --user testuser --password testpass
+```
+
+```python
+>>> dj.conn()  # Verify connection works
+>>> dj.list_schemas()  # Check accessible schemas
+```
+
+## Version Information
+
+Display DataJoint version:
+
+```bash
+dj --version
+```
+
+## Help
+
+Display all options:
+
+```bash
+dj --help
+```
+
+## Entry Points
+
+The CLI is available as both `dj` and `datajoint`:
+
+```bash
+dj --version
+datajoint --version  # Same command
+```
+
+## Programmatic Usage
+
+The CLI function can also be called from Python:
+
+```python
+from datajoint.cli import cli
+
+# Show version and exit
+cli(["--version"])
+
+# Start REPL with schemas
+cli(["-s", "my_lab:lab"])
+```

From f7353e329f3980be1e7e1e1a8dd28e2e9957ac38 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Thu, 8 Jan 2026 23:30:31 -0600
Subject: [PATCH 3/3] docs: add Connection Lifecycle section to configure-database

Document persistent connection (singleton) vs context manager patterns for different use cases (interactive vs serverless).

Co-Authored-By: Claude Opus 4.5
---
 src/how-to/configure-database.md | 57 ++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/src/how-to/configure-database.md b/src/how-to/configure-database.md
index 0a230f15..11245378 100644
--- a/src/how-to/configure-database.md
+++ b/src/how-to/configure-database.md
@@ -122,3 +122,60 @@ For local development without TLS:
 }
 ```
 
+## Connection Lifecycle
+
+### Persistent Connection (Default)
+
+DataJoint uses a persistent singleton connection by default:
+
+```python
+import datajoint as dj
+
+# First call establishes connection
+conn = dj.conn()
+
+# Subsequent calls return the same connection
+conn2 = dj.conn()  # Same as conn
+
+# Reset to create a new connection
+conn3 = dj.conn(reset=True)  # New connection
+```
+
+This is ideal for interactive sessions and notebooks.
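The singleton behavior described for `dj.conn()` can be sketched in plain Python without a database. This is a minimal sketch under stated assumptions: `FakeConnection`, the module-level `_connection` cache, and this `conn()` function are hypothetical stand-ins, not DataJoint's implementation.

```python
class FakeConnection:
    """Hypothetical stand-in for a database connection; performs no real I/O."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


# Module-level cache holding the single shared connection (assumed design).
_connection = None


def conn(reset=False):
    """Return the cached connection, creating a new one on first use or on reset."""
    global _connection
    if _connection is None or reset:
        _connection = FakeConnection()
    return _connection


first = conn()             # creates the connection
second = conn()            # returns the cached object
third = conn(reset=True)   # replaces the cached object with a new one

print(first is second)
print(third is first)
```

The sketch covers only the caching rule. Whether a reset also closes the previous connection is not stated in the text above, so the sketch leaves the old object untouched.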
+
+### Context Manager (Explicit Cleanup)
+
+For serverless environments (AWS Lambda, Cloud Functions) or when you need explicit connection lifecycle control, use the context manager:
+
+```python
+import datajoint as dj
+
+with dj.Connection(host, user, password) as conn:
+    schema = dj.schema('my_schema', connection=conn)
+    MyTable().insert(data)
+# Connection automatically closed when exiting the block
+```
+
+The connection closes automatically even if an exception occurs:
+
+```python
+try:
+    with dj.Connection(**creds) as conn:
+        schema = dj.schema('my_schema', connection=conn)
+        MyTable().insert(data)
+        raise SomeError()
+except SomeError:
+    pass
+# Connection is still closed properly
+```
+
+### Manual Close
+
+You can also close a connection explicitly:
+
+```python
+conn = dj.conn()
+# ... do work ...
+conn.close()
+```
+
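The cleanup guarantee described under "Context Manager (Explicit Cleanup)" follows from Python's context-manager protocol and can be checked without a database. This is a minimal sketch, assuming a hypothetical `FakeConnection` class. It shows the general `__enter__`/`__exit__` pattern, not the code of `dj.Connection`.

```python
class FakeConnection:
    """Hypothetical stand-in for a connection that supports the `with` statement."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and when an exception propagates out of the block.
        self.close()
        return False  # do not suppress the exception


try:
    with FakeConnection() as connection:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

print(connection.is_open)
```

Returning `False` from `__exit__` lets the exception continue to the caller after cleanup, which matches the `try`/`except` example above, where the error is caught outside the `with` block.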