Roadmap: Multi-Runtime Architecture & Ephemeral Mode #1

@yasha-dev1

PyWorkflow Architecture Evolution

This issue outlines the roadmap to transform PyWorkflow into a flexible framework supporting multiple execution environments and durability modes.

Terminology

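| Term      | Definition                                                                      |
|-----------|---------------------------------------------------------------------------------|
| Durable   | Fault-tolerant workflow with event sourcing, persistence, and replay capability |
| Transient | Simple task execution without persistence overhead                              |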

Vision

┌─────────────────────────────────────────────────────────────┐
│                      User Code                              │
│              @workflow / @step / sleep() / hooks            │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Runtime Layer                            │
├─────────────┬─────────────┬─────────────┬───────────────────┤
│   Local     │   Celery    │   AWS       │   AWS Durable     │
│  (in-proc)  │  (workers)  │   Lambda    │   Lambda          │
└─────────────┴─────────────┴─────────────┴───────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Durability                               │
├─────────────────────────────┬───────────────────────────────┤
│   Transient (durable=False) │   Durable (durable=True)      │
│   No storage, no replay     │   Event-sourced, resumable    │
└─────────────────────────────┴───────────────────────────────┘
                                              │
                              ┌───────────────┼───────────────┐
                              ▼               ▼               ▼
                          ┌────────┐    ┌─────────┐    ┌──────────┐
                          │In-Mem  │    │ SQLite  │    │ DynamoDB │
                          └────────┘    └─────────┘    └──────────┘

API Design

Configuration

import pyworkflow
# Import path assumed from the proposed file layout (pyworkflow/storage/sqlite.py)
from pyworkflow.storage.sqlite import SQLiteStorage

pyworkflow.configure(
    # Infrastructure (app-level only)
    storage=SQLiteStorage("./workflows.db"),
    celery_broker="redis://localhost:6379",
    aws_region="us-east-1",

    # Defaults (can be overridden per-workflow)
    default_runtime="local",
    default_durable=False,
    default_retries=3,
)

Workflow Definition

# Uses configured defaults
@workflow
async def simple_task():
    ...

# Explicitly durable
@workflow(durable=True)
async def payment_flow(order_id: str):
    payment = await charge_card(order_id)
    await sleep("1h")  # Suspends, resumes later
    await send_receipt(payment)

# Explicitly transient (no persistence)
@workflow(durable=False)
async def quick_task():
    ...

# Specify runtime
@workflow(runtime="celery")
async def distributed_task():
    ...

# Celery workers but transient (no persistence)
@workflow(runtime="celery", durable=False)
async def distributed_transient_task():
    ...

# AWS Lambda (transient only)
@workflow(runtime="lambda")
async def serverless_task():
    ...

# AWS Durable Lambda (durable only)
@workflow(runtime="durable-lambda")
async def long_running_serverless():
    await sleep("2h")
    ...
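
Supporting both the bare `@workflow` form and the parameterised `@workflow(...)` form usually comes down to one optional positional argument. A minimal sketch, assuming a hypothetical `options` attribute for carrying the settings; the real decorator may store them differently:

```python
import asyncio
import functools
from typing import Any, Callable, Optional


def workflow(func: Optional[Callable] = None, *,
             runtime: Optional[str] = None,
             durable: Optional[bool] = None) -> Any:
    """Hypothetical decorator usable as @workflow or @workflow(...)."""
    def decorate(fn: Callable) -> Callable:
        @functools.wraps(fn)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            # A real implementation would hand off to the selected runtime here.
            return await fn(*args, **kwargs)
        # Record the options so a runtime could inspect them later.
        wrapper.options = {"runtime": runtime, "durable": durable}
        return wrapper

    # Bare use: Python passes the decorated function directly as `func`.
    if func is not None:
        return decorate(func)
    # Parameterised use: return the actual decorator.
    return decorate


@workflow
async def simple_task() -> str:
    return "done"


@workflow(runtime="celery", durable=False)
async def distributed_transient_task() -> str:
    return "done"


result = asyncio.run(simple_task())
print(result, distributed_transient_task.options)
```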

Steps

# Steps inherit durability from workflow
@step
async def process_data():
    ...

# Configurable retries
@step(retries=3, retry_delay="5s")
async def call_external_api():
    ...

# No retries - fail immediately
@step(retries=0)
async def validate_input(data):
    ...
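
The retry semantics above (`retries=0` meaning a single attempt, `retry_delay` given as a short duration string) can be illustrated with a small wrapper. This sketch covers only the parameterised form; `parse_duration` and the accepted `s`/`m`/`h` suffixes are assumptions, since the issue does not specify the duration grammar.

```python
import asyncio
import functools
import re
from typing import Any, Callable

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> float:
    """Convert strings such as "5s" or "1h" to seconds (assumed format)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smh])", text)
    if match is None:
        raise ValueError(f"Unsupported duration: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def step(retries: int = 3, retry_delay: str = "0s") -> Callable:
    """Hypothetical @step(...) that retries the wrapped coroutine on failure."""
    delay = parse_duration(retry_delay)

    def decorate(fn: Callable) -> Callable:
        @functools.wraps(fn)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            # retries=0 means one attempt in total, so failures surface at once.
            for attempt in range(retries + 1):
                try:
                    return await fn(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise
                    await asyncio.sleep(delay)
        return wrapper
    return decorate


calls = {"count": 0}


@step(retries=2, retry_delay="0s")
async def flaky() -> str:
    # Fails twice, then succeeds on the third attempt.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


outcome = asyncio.run(flaky())
print(outcome, calls["count"])
```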

Runtime + Durability Validation

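| Runtime        | durable=True           | durable=False          |
|----------------|------------------------|------------------------|
| local          | ✅ Valid               | ✅ Valid (default)     |
| celery         | ✅ Valid (default)     | ✅ Valid               |
| lambda         | ❌ Error               | ✅ Valid (only option) |
| durable-lambda | ✅ Valid (only option) | ❌ Error               |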

Error messages:

  • @workflow(runtime="lambda", durable=True): "Lambda runtime cannot be durable. Use 'durable-lambda' for durable serverless workflows."
  • @workflow(runtime="durable-lambda", durable=False): "Durable Lambda runtime is always durable. Use 'lambda' for transient serverless tasks."
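
The validation matrix and its error messages translate fairly directly into a lookup table. In this sketch, the `validate` function, the `RULES` structure, and the choice of `ValueError` are assumptions made for illustration; only the runtime names, defaults, and message texts come from this issue.

```python
from typing import Optional

# Allowed durability values per runtime, and each runtime's default.
RULES = {
    "local": {"allowed": {True, False}, "default": False},
    "celery": {"allowed": {True, False}, "default": True},
    "lambda": {"allowed": {False}, "default": False},
    "durable-lambda": {"allowed": {True}, "default": True},
}

MESSAGES = {
    ("lambda", True): (
        "Lambda runtime cannot be durable. "
        "Use 'durable-lambda' for durable serverless workflows."
    ),
    ("durable-lambda", False): (
        "Durable Lambda runtime is always durable. "
        "Use 'lambda' for transient serverless tasks."
    ),
}


def validate(runtime: str, durable: Optional[bool] = None) -> bool:
    """Return the effective durability, or raise on an invalid combination."""
    if runtime not in RULES:
        raise ValueError(f"Unknown runtime: {runtime!r}")
    rule = RULES[runtime]
    if durable is None:
        # Nothing requested explicitly: use the runtime's default.
        return rule["default"]
    if durable not in rule["allowed"]:
        raise ValueError(MESSAGES[(runtime, durable)])
    return durable


print(validate("celery"))          # celery defaults to durable
print(validate("lambda", False))   # the only valid option for lambda
```

Running the check at decoration time rather than at first execution would surface invalid combinations on import, which matches the "catch invalid combinations early" goal.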

Implementation Phases

Phase 1: Core Abstractions (P0)

  • Runtime Abstraction Layer - Pluggable Runtime interface
  • Durable/Transient Mode - durable boolean flag, conditional event recording
  • Configurable Retry Behavior - Per-step retry configuration
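
A pluggable runtime interface might look roughly like the following. The method name `run`, the `name` attribute, and the registry are hypothetical; the actual ABC in `runtime/base.py` is not defined in this issue.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable, Dict


class Runtime(ABC):
    """Hypothetical pluggable runtime interface."""

    name: str

    @abstractmethod
    async def run(self, fn: Callable[..., Awaitable[Any]],
                  *args: Any, **kwargs: Any) -> Any:
        """Execute a workflow function and return its result."""


class LocalRuntime(Runtime):
    """In-process execution: simply awaits the function."""

    name = "local"

    async def run(self, fn: Callable[..., Awaitable[Any]],
                  *args: Any, **kwargs: Any) -> Any:
        return await fn(*args, **kwargs)


REGISTRY: Dict[str, Runtime] = {}


def register(runtime: Runtime) -> None:
    """Make a runtime selectable by name, e.g. runtime="local"."""
    REGISTRY[runtime.name] = runtime


async def double(x: int) -> int:
    return x * 2


register(LocalRuntime())
answer = asyncio.run(REGISTRY["local"].run(double, 21))
print(answer)
```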

Phase 2: Runtime Implementations (P1)

  • Local Runtime - In-process execution for CI/dev
  • Refactor Celery as Runtime - Make Celery pluggable, not hardcoded
  • In-Memory Storage - Lightweight storage for tests
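
An in-memory backend combined with conditional event recording could be as small as the sketch below. The `append`/`load` method names, the event dictionaries, and the `record_event` helper are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class InMemoryStorage:
    """Hypothetical event log held in a dict; contents vanish with the process."""

    def __init__(self) -> None:
        self._events: Dict[str, List[dict]] = defaultdict(list)

    def append(self, run_id: str, event: dict) -> None:
        self._events[run_id].append(event)

    def load(self, run_id: str) -> List[dict]:
        # Return a copy so callers cannot mutate the stored log.
        return list(self._events.get(run_id, []))


def record_event(storage: Optional[InMemoryStorage], durable: bool,
                 run_id: str, event: dict) -> bool:
    """Write the event only for durable runs; return whether it was stored."""
    if not durable or storage is None:
        return False
    storage.append(run_id, event)
    return True


store = InMemoryStorage()
stored = record_event(store, True, "run-1", {"type": "step_completed"})
skipped = record_event(store, False, "run-2", {"type": "step_completed"})
print(stored, skipped, store.load("run-1"), store.load("run-2"))
```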

Phase 3: Persistence (P2)

  • SQLite Storage Backend - Local persistence
  • Unified Configuration System - pyworkflow.configure() API
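
For local persistence, the standard-library `sqlite3` module is enough to store an append-only event log. This is a sketch under stated assumptions: the table schema, the JSON payload encoding, and the `append`/`load` methods are invented for illustration, and only the class name `SQLiteStorage` comes from the configuration example in this issue.

```python
import json
import sqlite3
from typing import List


class SQLiteStorage:
    """Hypothetical append-only event log backed by SQLite."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "run_id TEXT NOT NULL, "
            "payload TEXT NOT NULL)"
        )
        self._conn.commit()

    def append(self, run_id: str, event: dict) -> None:
        # Using the connection as a context manager commits on success.
        with self._conn:
            self._conn.execute(
                "INSERT INTO events (run_id, payload) VALUES (?, ?)",
                (run_id, json.dumps(event)),
            )

    def load(self, run_id: str) -> List[dict]:
        # Ordering by the autoincrement id preserves insertion order.
        rows = self._conn.execute(
            "SELECT payload FROM events WHERE run_id = ? ORDER BY id",
            (run_id,),
        ).fetchall()
        return [json.loads(payload) for (payload,) in rows]


storage = SQLiteStorage(":memory:")  # use a file path for real persistence
storage.append("run-1", {"type": "step_completed", "step": "charge_card"})
storage.append("run-1", {"type": "sleep_started", "duration": "1h"})
print(storage.load("run-1"))
```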

Phase 4: AWS Integration (P3)

  • AWS Lambda Runtime - Transient serverless tasks
  • AWS Durable Lambda Runtime - Long-running durable workflows
  • DynamoDB Storage Backend - AWS-native storage
  • Lambda Deployment CLI - pyworkflow deploy command

Phase 5: Future (P4)

  • AWS Step Functions (if demand warrants)
  • Azure Functions

Files to Create/Modify

pyworkflow/
├── runtime/                 # NEW: Runtime abstraction
│   ├── __init__.py
│   ├── base.py              # Runtime ABC
│   ├── local.py             # Local runtime
│   ├── celery.py            # Celery runtime (moved)
│   └── aws/
│       ├── lambda_runtime.py
│       ├── durable.py
│       └── handler.py
├── storage/
│   ├── memory.py            # NEW: In-memory backend
│   ├── sqlite.py            # NEW: SQLite backend
│   └── dynamodb.py          # NEW: DynamoDB backend
├── config.py                # NEW: Unified configuration
├── cli/
│   └── deploy.py            # NEW: Deployment CLI
└── core/
    ├── context.py           # MODIFY: Optional storage
    ├── workflow.py          # MODIFY: durable flag support
    └── step.py              # MODIFY: Configurable retries

Key Design Decisions

1. Terminology

  • Durable: Event-sourced, persistent, resumable workflows
  • Transient: Simple task execution without persistence overhead

2. API Philosophy

  • Boolean durable flag - clear and unambiguous
  • Smart defaults - minimal boilerplate for common cases
  • Runtime validation - catch invalid combinations early with helpful errors

3. Why Durable Lambda over Step Functions?

  • Same mental model as PyWorkflow (checkpoint/replay)
  • Code-first (no YAML state machines)
  • Lower cost (pay for compute, not state transitions)
  • Simpler migration path
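
The checkpoint/replay mental model mentioned above can be shown in a few lines: completed step results are recorded, and on resume the workflow function runs again from the top while recorded steps return their stored results instead of re-executing. `ReplayContext` and `run_step` are hypothetical names for this sketch and do not describe PyWorkflow's or AWS's actual mechanism.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


class ReplayContext:
    """Hypothetical context: completed step results are cached by step name."""

    def __init__(self, history: Dict[str, Any]) -> None:
        self.history = history
        self.executed: List[str] = []

    async def run_step(self, name: str,
                       fn: Callable[[], Awaitable[Any]]) -> Any:
        if name in self.history:
            # Replay: the step already completed, so reuse its result.
            return self.history[name]
        result = await fn()
        # Checkpoint: remember the result for future replays.
        self.history[name] = result
        self.executed.append(name)
        return result


async def charge_card() -> str:
    return "payment-123"


async def send_receipt() -> str:
    return "sent"


async def payment_flow(ctx: ReplayContext) -> tuple:
    payment = await ctx.run_step("charge_card", charge_card)
    receipt = await ctx.run_step("send_receipt", send_receipt)
    return payment, receipt


# Simulate resuming a run in which charge_card had already completed.
resumed = ReplayContext({"charge_card": "payment-123"})
result = asyncio.run(payment_flow(resumed))
print(result, resumed.executed)
```

Only `send_receipt` actually executes on the resumed run, which is what keeps a replayed workflow from charging the card twice.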

This is a tracking issue. Individual features will be broken into separate issues as implementation begins.

Labels

  • P0 - Critical: Blocking issue requiring immediate attention
  • epic - Large initiative containing multiple related issues