[RFC]: Pluggable Secret Management Architecture & Context Binding #39

@rosspeili

Summary

Propose an architecture to deprecate the global os.environ dependency for skill secrets by introducing a "Secret Provider" interface and an execution Context. This allows the Skillware core to support Enterprise-grade Key Management Systems (KMS), Cloud Vaults, and dynamic credentials injected at runtime, while maintaining .env support for solo developers via a default local provider.

Motivation

Currently, Skillware uses skillware/core/env.py to load .env variables directly into os.environ. This works for solo developers running local environments, but it is a poor fit for enterprise deployments and production AI agents.

Primary Issues Solved:

  1. Global Leakage: Pushing highly sensitive API keys into os.environ exposes them to every library, telemetry script, and execution thread running in the Python process.
  2. Multi-Tenancy: os.environ is shared by the whole process, so an autonomous agent processing tasks for multiple users cannot swap one user's credentials for another's without concurrent tasks overwriting each other's values.
  3. Enterprise Non-Starter: Production systems use AWS KMS, HashiCorp Vault, Azure Key Vault, or Kubernetes Secrets. By hardcoding to .env/os.environ, we prevent easy adoption by security-conscious organizations.

Detailed Design

We introduce a structured "Dependency Injection" pattern for secrets. The design has three parts (A to C); part D lists the files that need to change.

A. Pluggable Secret Providers

Introduce an Abstract Base Class for Secret Providers that handles key retrieval securely.

from abc import ABC, abstractmethod
from typing import Optional
import os

class BaseSecretProvider(ABC):
    @abstractmethod
    def get_secret(self, key: str) -> Optional[str]:
        """Return the secret stored under `key`, or None if it is not set."""

class LocalSecretProvider(BaseSecretProvider):
    # Standard behavior today (reads from .env / os.environ)
    def get_secret(self, key: str) -> Optional[str]:
        # os.environ.get returns None for a missing key, hence Optional[str]
        return os.environ.get(key)

class HashiCorpVaultProvider(BaseSecretProvider):
    # Enterprise behavior (reads dynamically from a cloud vault at execution time)
    def get_secret(self, key: str) -> Optional[str]:
        # fetch from KMS or Vault (left unimplemented in this RFC)
        raise NotImplementedError

The framework consumer will instantiate the loader with their provider of choice:
SkillLoader(secret_manager=HashiCorpVaultProvider())
(Defaults to LocalSecretProvider if omitted).
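
The defaulting behavior described above could look roughly like the following. This is a minimal sketch, not the actual loader: only the `secret_manager` keyword and the provider class names come from this RFC, and the constructor body is an assumption.

```python
import os
from abc import ABC, abstractmethod
from typing import Optional


class BaseSecretProvider(ABC):
    @abstractmethod
    def get_secret(self, key: str) -> Optional[str]:
        """Return the secret stored under `key`, or None if it is not set."""


class LocalSecretProvider(BaseSecretProvider):
    def get_secret(self, key: str) -> Optional[str]:
        return os.environ.get(key)


class SkillLoader:
    # Hypothetical constructor: the real loader takes more arguments.
    def __init__(self, secret_manager: Optional[BaseSecretProvider] = None) -> None:
        # Fall back to the .env / os.environ provider when none is supplied.
        if secret_manager is None:
            secret_manager = LocalSecretProvider()
        self.secret_manager = secret_manager
```

Checking `is None` rather than truthiness avoids silently replacing a provider object that happens to evaluate as falsy.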

B. Manifest Declarations

Skills must declare exactly which secrets they require to function, rather than failing silently or insecurely deep inside execute():

# In manifest.yaml
secrets:
  - STRIPE_API_KEY
  - TWILIO_AUTH_TOKEN
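
With the declaration in place, the loader can check for missing secrets before `execute()` is ever called. The sketch below assumes the manifest has already been parsed into a dict; `resolve_declared_secrets` and `MissingSecretError` are hypothetical names, not part of Skillware.

```python
from typing import Any, Callable, Dict, List, Mapping, Optional


class MissingSecretError(RuntimeError):
    """Raised when a secret declared in the manifest cannot be resolved."""


def resolve_declared_secrets(
    manifest: Mapping[str, Any],
    get_secret: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Fetch only the secrets listed under `secrets:` and fail fast if any is missing."""
    declared: List[str] = list(manifest.get("secrets") or [])
    resolved: Dict[str, str] = {}
    missing: List[str] = []
    for key in declared:
        value = get_secret(key)
        if value is None:
            missing.append(key)
        else:
            resolved[key] = value
    if missing:
        # Report every missing key at once instead of stopping at the first one.
        raise MissingSecretError("Missing required secrets: " + ", ".join(missing))
    return resolved
```

Passing the provider's `get_secret` method as the callable keeps this helper independent of any particular provider class.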

C. Execution Context

We modify the BaseSkill execution signature. When the core SkillLoader initializes or runs a skill, it fetches only the keys declared in the skill's manifest via the active SecretProvider and passes them to the skill in a context object, so the skill itself never reads the global os.environ.

# new signature for skill.py
from typing import Any, Dict

# ExecutionContext is the context object introduced by this RFC
def execute(self, params: Dict[str, Any], context: "ExecutionContext") -> Dict[str, Any]:
    api_key = context.secrets.get("STRIPE_API_KEY")
    # ... executes securely without leaking state
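
The RFC does not yet define `ExecutionContext` itself. One possible shape, offered only as a sketch, is a frozen dataclass holding a read-only mapping of secrets. The `repr=False` choice and the `StripeSkill` class are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class ExecutionContext:
    # repr=False keeps secret values out of logs and tracebacks that print the context.
    secrets: Mapping[str, str] = field(default_factory=dict, repr=False)

    def __post_init__(self) -> None:
        # Copy into a read-only view so a skill cannot mutate the secrets it was given.
        object.__setattr__(self, "secrets", MappingProxyType(dict(self.secrets)))


class StripeSkill:
    """Hypothetical skill using the proposed signature."""

    def execute(self, params: Dict[str, Any], context: ExecutionContext) -> Dict[str, Any]:
        api_key = context.secrets.get("STRIPE_API_KEY")
        if api_key is None:
            return {"ok": False, "error": "STRIPE_API_KEY was not provided"}
        return {"ok": True, "amount": params.get("amount")}
```

Because each call receives its own context, two concurrent tasks for different users can carry different credentials without touching shared process state.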

D. Scope of Required Updates

Transitioning to this security model will require touching several elements across the framework:

  1. Core Overhaul:

    • skillware/core/base_skill.py: Update the execute method signature.
    • skillware/core/loader.py: Add the initialization of SecretProvider and context binding during load.
    • skillware/core/env.py: Refactor to serve primarily as the backend for LocalSecretProvider.
  2. Template Updates:

    • templates/python_skill/skill.py: Update the boilerplate signature.
    • templates/python_skill/manifest.yaml: Add the empty secrets: [] array.
    • templates/python_skill/test_skill.py: Update the mock tests to pass a dummy context object instead of just params.
  3. Documentation Updates:

    • README.md: Update the Configuration and Quick Start sections to reflect how to define or swap Secret Providers.
    • CONTRIBUTING.md: Remove rules specifically pointing to os.environ and replace them with the context.secrets standard.
    • docs/TESTING.md: Add a section explaining how to mock the ExecutionContext safely for unit tests.
  4. Skill Refactoring:

    • Every existing skill under skills/ (e.g., wallet_screening, pii_masker, mica_module) must have its manifest.yaml, skill.py, and test_skill.py updated to adhere to the new standards.
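
For the test updates listed above, a dummy context can be very small. The sketch below uses `types.SimpleNamespace` as a stand-in for the real context object; `NotifySkill` and `make_test_context` are hypothetical names chosen for illustration, and the tests are plain functions that any runner such as pytest could collect.

```python
from types import SimpleNamespace
from typing import Any, Dict


class NotifySkill:
    """Hypothetical skill that needs TWILIO_AUTH_TOKEN from the context."""

    def execute(self, params: Dict[str, Any], context: Any) -> Dict[str, Any]:
        token = context.secrets.get("TWILIO_AUTH_TOKEN")
        if token is None:
            return {"ok": False, "error": "TWILIO_AUTH_TOKEN was not provided"}
        return {"ok": True, "to": params["to"]}


def make_test_context(**secrets: str) -> SimpleNamespace:
    # Stand-in exposing only the `.secrets` mapping that skills read from.
    return SimpleNamespace(secrets=dict(secrets))


def test_execute_with_dummy_secret() -> None:
    context = make_test_context(TWILIO_AUTH_TOKEN="dummy-token")
    result = NotifySkill().execute({"to": "+10000000000"}, context)
    assert result == {"ok": True, "to": "+10000000000"}


def test_execute_without_secret() -> None:
    result = NotifySkill().execute({"to": "+10000000000"}, make_test_context())
    assert result["ok"] is False
```

Neither test touches `os.environ`, so they cannot leak real credentials or interfere with each other when run in parallel.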

Drawbacks

- **Breaking Change**: Changing the signature of `execute(self, params)` to `execute(self, params, context)` breaks backward compatibility for existing custom and community skills, which implement the old signature and read secrets from `os.environ`.
- **Migration Effort**: All existing templates and core skills (e.g., `wallet_screening`, `pii_masker`) will need to be refactored to read from `context.secrets` instead of `os.environ`.
- *Mitigation strategy*: We could temporarily copy the secrets declared in `manifest.yaml` into `os.environ` for the duration of the `execute` call and remove them immediately afterwards to maintain backward compatibility, though this is less secure than native context injection.
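
The mitigation above could be sketched as a context manager wrapped around legacy `execute` calls. The name `temporary_environ` is hypothetical, and this is an illustration of the idea rather than a proposed implementation.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Mapping

_MISSING = object()  # sentinel for "variable was not set before"


@contextmanager
def temporary_environ(secrets: Mapping[str, str]) -> Iterator[None]:
    """Expose `secrets` through os.environ only while the block runs."""
    previous = {key: os.environ.get(key, _MISSING) for key in secrets}
    os.environ.update(secrets)
    try:
        yield
    finally:
        # Restore the prior state even if the wrapped call raised.
        for key, old_value in previous.items():
            if old_value is _MISSING:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value
```

A loader could then run a legacy skill as `with temporary_environ(resolved): skill.execute(params)`. Note that `os.environ` remains process-wide, so other threads can still see the values while the block is active, which is why this is weaker than passing a context object.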
