-
Notifications
You must be signed in to change notification settings - Fork 0
Closed
Labels
P0Critical: Blocking issue requiring immediate attentionCritical: Blocking issue requiring immediate attentionepicLarge initiative containing multiple related issuesLarge initiative containing multiple related issues
Description
PyWorkflow Architecture Evolution
This issue outlines the roadmap to transform PyWorkflow into a flexible framework supporting multiple execution environments and durability modes.
Terminology
| Term | Definition |
|---|---|
| Durable | Workflow with event sourcing, persistence, replay capability, fault-tolerant |
| Transient | Simple task execution without persistence overhead |
Vision
┌─────────────────────────────────────────────────────────────┐
│ User Code │
│ @workflow / @step / sleep() / hooks │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Runtime Layer │
├─────────────┬─────────────┬─────────────┬───────────────────┤
│ Local │ Celery │ AWS │ AWS Durable │
│ (in-proc) │ (workers) │ Lambda │ Lambda │
└─────────────┴─────────────┴─────────────┴───────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Durability │
├─────────────────────────────┬───────────────────────────────┤
│ Transient (durable=False) │ Durable (durable=True) │
│ No storage, no replay │ Event-sourced, resumable │
└─────────────────────────────┴───────────────────────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌────────┐ ┌─────────┐ ┌──────────┐
│In-Mem │ │ SQLite │ │ DynamoDB │
└────────┘ └─────────┘ └──────────┘
API Design
Configuration
import pyworkflow
pyworkflow.configure(
# Infrastructure (app-level only)
storage=SQLiteStorage("./workflows.db"),
celery_broker="redis://localhost:6379",
aws_region="us-east-1",
# Defaults (can be overridden per-workflow)
default_runtime="local",
default_durable=False,
default_retries=3,
)Workflow Definition
# Uses configured defaults
@workflow
async def simple_task():
...
# Explicitly durable
@workflow(durable=True)
async def payment_flow(order_id: str):
payment = await charge_card(order_id)
await sleep("1h") # Suspends, resumes later
await send_receipt(payment)
# Explicitly transient (no persistence)
@workflow(durable=False)
async def quick_task():
...
# Specify runtime
@workflow(runtime="celery")
async def distributed_task():
...
# Celery workers but transient (no persistence)
@workflow(runtime="celery", durable=False)
async def distributed_transient_task():
...
# AWS Lambda (transient only)
@workflow(runtime="lambda")
async def serverless_task():
...
# AWS Durable Lambda (durable only)
@workflow(runtime="durable-lambda")
async def long_running_serverless():
await sleep("2h")
...Steps
# Steps inherit durability from workflow
@step
async def process_data():
...
# Configurable retries
@step(retries=3, retry_delay="5s")
async def call_external_api():
...
# No retries - fail immediately
@step(retries=0)
async def validate_input(data):
...Runtime + Durability Validation
| Runtime | durable=True |
durable=False |
|---|---|---|
local |
✅ Valid | ✅ Valid (default) |
celery |
✅ Valid (default) | ✅ Valid |
lambda |
❌ Error | ✅ Valid (only option) |
durable-lambda |
✅ Valid (only option) | ❌ Error |
Error messages:
@workflow(runtime="lambda", durable=True)→"Lambda runtime cannot be durable. Use 'durable-lambda' for durable serverless workflows."@workflow(runtime="durable-lambda", durable=False)→"Durable Lambda runtime is always durable. Use 'lambda' for transient serverless tasks."
Implementation Phases
Phase 1: Core Abstractions (P0)
- Runtime Abstraction Layer - Pluggable
Runtimeinterface - Durable/Transient Mode -
durableboolean flag, conditional event recording - Configurable Retry Behavior - Per-step retry configuration
Phase 2: Runtime Implementations (P1)
- Local Runtime - In-process execution for CI/dev
- Refactor Celery as Runtime - Make Celery pluggable, not hardcoded
- In-Memory Storage - Lightweight storage for tests
Phase 3: Persistence (P2)
- SQLite Storage Backend - Local persistence
- Unified Configuration System -
pyworkflow.configure()API
Phase 4: AWS Integration (P3)
- AWS Lambda Runtime - Transient serverless tasks
- AWS Durable Lambda Runtime - Long-running durable workflows
- DynamoDB Storage Backend - AWS-native storage
- Lambda Deployment CLI -
pyworkflow deploycommand
Phase 5: Future (P4)
- AWS Step Functions (if demand warrants)
- Azure Functions
Files to Create/Modify
pyworkflow/
├── runtime/ # NEW: Runtime abstraction
│ ├── __init__.py
│ ├── base.py # Runtime ABC
│ ├── local.py # Local runtime
│ ├── celery.py # Celery runtime (moved)
│ └── aws/
│ ├── lambda_runtime.py
│ ├── durable.py
│ └── handler.py
├── storage/
│ ├── memory.py # NEW: In-memory backend
│ ├── sqlite.py # NEW: SQLite backend
│ └── dynamodb.py # NEW: DynamoDB backend
├── config.py # NEW: Unified configuration
├── cli/
│ └── deploy.py # NEW: Deployment CLI
└── core/
├── context.py # MODIFY: Optional storage
├── workflow.py # MODIFY: durable flag support
└── step.py # MODIFY: Configurable retries
Key Design Decisions
1. Terminology
- Durable: Event-sourced, persistent, resumable workflows
- Transient: Simple task execution without persistence overhead
2. API Philosophy
- Boolean
durableflag - clear and unambiguous - Smart defaults - minimal boilerplate for common cases
- Runtime validation - catch invalid combinations early with helpful errors
3. Why Durable Lambda over Step Functions?
- Same mental model as PyWorkflow (checkpoint/replay)
- Code-first (no YAML state machines)
- Lower cost (pay for compute, not state transitions)
- Simpler migration path
References
This is a tracking issue. Individual features will be broken into separate issues as implementation begins.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
P0Critical: Blocking issue requiring immediate attentionCritical: Blocking issue requiring immediate attentionepicLarge initiative containing multiple related issuesLarge initiative containing multiple related issues