Skip to content

Add initial thoughts on deployment#8

Closed
jackgerrits wants to merge 6 commits intomainfrom
deployment
Closed

Add initial thoughts on deployment#8
jackgerrits wants to merge 6 commits intomainfrom
deployment

Conversation

@jackgerrits
Copy link
Copy Markdown
Contributor

No description provided.

Copilot AI review requested due to automatic review settings May 14, 2025 20:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds an initial deployment guide outlining how to register and run agents, manage their state, test locally, and containerize the app.

  • Introduces code examples for setting up an agent app and registering agents
  • Describes stateful REST routes and local testing via LocalAgentRuntime
  • Provides guidance on containerizing the app with Docker

Comment thread docs/design/deployment.md Outdated
Comment thread docs/design/deployment.md Outdated
jackgerrits and others added 3 commits May 14, 2025 16:08
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread docs/design/deployment.md
runtime = LocalAgentRuntime()

agent = MyAgent("my_id", runtime)
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine the agent is then invoked using something like agent.run(task="what is the weather in new york").

Comment thread docs/design/deployment.md

## Containerization

Since the app's interface essentially the `AgentRuntime` and the entrypoint, you can containerize the app using a Dockerfile. The entrypoint would be the command to run the agentrunner with the appropriate arguments, or a custom entrypoint can be injected at runtime which uses a different `AgentRuntime` implementation.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once an agent is deployed via the runtime, does it expose a REST API that application can call?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we want to support other protocols than REST, e.g. gRPC?

Comment thread docs/design/deployment.md
agentrunner --entrypoint myapp:app --state-dir /tmp/agents
```

This would bring up a process and expose some interface that would make this agent accessible. Depending on how `agentrunner` is invoked, perhaps different interfaces are exposed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we expect A2A to be one of the interfaces we can run through agentrunner? And other one would be something like a ChatCompletion or Responses API?

Comment thread docs/design/deployment.md

## Testing locally

If you wish to use the agent without hosting it within agentrunner, you can use a concrete AgentRuntime implementation to run the agent locally. For example, you can use the `LocalAgentRuntime` class to run an agent in a local process.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the LocalAgentRuntime intended to be a dev/test-only thing, or is will there be a way to specify the runtime when invoking agentunner?

Comment thread docs/design/deployment.md

## State

State is associated with an agent `id`. The agent id is based on how the agentrunner is invoked. For example, there may be a REST server exposed with the following route:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still using AgentId ~= type + key?

Especially if we tie state to it, we should see if it is possible to ensure that we don't have conflicting agents registered under the same type: In distributed case we might simultaneously desire to have multiple nodes host agents of the same type, but we don't want a case wherein depending on which node a message gets sent the resulting observable behaviour would be different

Comment thread docs/design/deployment.md
You might then be able to execute this app like:

```sh
agentrunner --entrypoint myapp:app --state-dir /tmp/agents
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor] Would we support the "main" entry point for the app too, e.g. --entrypoint myapp

@eavanvalkenburg eavanvalkenburg added the documentation Improvements or additions to documentation label Jul 30, 2025
@crickman crickman closed this Aug 11, 2025
@crickman crickman deleted the deployment branch August 19, 2025 17:06
onetoomanybi referenced this pull request in onetoomanybi/agent-framework Oct 18, 2025
…sing)

- Task 1.1: Add import guard for notebookutils (dual-environment support)
- Task 1.2: Implement DefaultAzureCredential pattern (modern auth)
- Task 1.3: Replace naive CSV parsing with pandas (robust data handling)
- Task 1.4: Add pagination parameters to list_lakehouse_files (scalability)

Gap Coverage:
- Gap #1: Product Confusion (Azure AI Projects SDK verified)
- Gap #2: Deprecated Patterns (DefaultAzureCredential + endpoint)
- Gap #3: No Hardcoded Secrets (FabricMLCredential pattern)
- Gap #6: Token Flow (import guard + mock credentials)
- Gap #8: Resilience (pagination + pandas parsing)

Validation: 7/7 Phase 1 checks passing, 11/15 overall (73%)
onetoomanybi referenced this pull request in onetoomanybi/agent-framework Oct 18, 2025
PHASE 4 - Resilience & Monitoring (3 checks):
- Task 4.1: Add retry logic with exponential backoff and timeout (Gap #8)
- Task 4.2: Implement comprehensive logging with request IDs (Gap microsoft#10)
- Task 4.3: Confirm GUID-based resource identification (Gap microsoft#12)

PHASE 5 - Pre-Deployment (2 checks):
- Task 5.1: Add OpenTelemetry/Azure Monitor packages (Gap microsoft#10)
- Task 5.2: Verify Azure AI Projects SDK is in requirements (Gap #1)

Implementation Details:
- Added retry_with_timeout decorator with exponential backoff
- Added logging.basicConfig() and logger for observability
- Added request_id (UUID) to all operations for tracing
- Added logging.info() calls for major operations
- Added logging.debug() calls for detailed troubleshooting
- Added logging.error() with exc_info for exception tracking
- Added OpenTelemetry and azure-monitor-opentelemetry to requirements.txt
- Updated docstrings with GUID examples

Validation Results: 15/15 checks passing (100%)
- Phase 1: 7/7 ✓
- Phase 3: 3/3 ✓
- Phase 4: 3/3 ✓
- Phase 5: 2/2 ✓
github-merge-queue Bot pushed a commit that referenced this pull request Apr 10, 2026
* Harden Python checkpoint persistence defaults

Add RestrictedUnpickler to _checkpoint_encoding.py that limits which
types may be instantiated during pickle deserialization.  By default
FileCheckpointStorage now uses the restricted unpickler, allowing only:

- Built-in Python value types (primitives, datetime, uuid, decimal,
  collections, etc.)
- All agent_framework.* internal types
- Additional types specified via the new allowed_checkpoint_types
  parameter on FileCheckpointStorage

This narrows the default type surface area for persisted checkpoints
while keeping framework-owned scenarios working without extra
configuration.  Developers can extend the allowed set by passing
"module:qualname" strings to allowed_checkpoint_types.

The decode_checkpoint_value function retains backward-compatible
unrestricted behavior when called without the new allowed_types kwarg.

Fixes #4894

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: resolve mypy no-any-return error in checkpoint encoding

Add explicit type annotation for super().find_class() return value
to satisfy mypy's no-any-return check.

Fixes #4894

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Simplify find_class return in _RestrictedUnpickler (#4894)

Remove unnecessary intermediate variable and apply # noqa: S301 # nosec
directly on the super().find_class() call, matching the established
pattern used on the pickle.loads() call in the same file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #4894: Python: Harden Python checkpoint persistence defaults

* Restore # noqa: S301 on line 102 of _checkpoint_encoding.py (#4894)

The review feedback correctly identified that removing the # noqa: S301
suppression from the find_class return statement would cause a ruff S301
lint failure, since the project enables bandit ("S") rules. This
restores consistency with lines 82 and 246 in the same file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #4894: Python: Harden Python checkpoint persistence defaults

* Address PR review comments on checkpoint encoding (#4894)

- Move module docstring to proper position after __future__ import
- Fix find_class return type annotation to type[Any]
- Add missing # noqa: S301 pragma on find_class return
- Improve error message to reference both allowed_types param and
  FileCheckpointStorage.allowed_checkpoint_types
- Add -> None return annotation to FileCheckpointStorage.__init__
- Replace tempfile.mktemp with TemporaryDirectory in test
- Replace contextlib.suppress with pytest.raises for precise assertion
- Remove unused contextlib import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR #4941 review comments: fix docstring position and return type

- Move module docstring before 'from __future__' import so it populates
  __doc__ (comment #4)
- Change find_class return annotation from type[Any] to type to avoid
  misleading callers about non-type returns like copyreg._reconstructor
  (comment #2)

Comments #1, #3, #5, #6, #7, #8 were already addressed in the current code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #4894: review comment fixes

* fix: use pickle.UnpicklingError in RestrictedUnpickler and improve docstring (#4894)

- Change _RestrictedUnpickler.find_class to raise pickle.UnpicklingError
  instead of WorkflowCheckpointException, since it is pickle-level concern
  that gets wrapped by the caller in _base64_to_unpickle.
- Remove now-unnecessary WorkflowCheckpointException re-raise in
  _base64_to_unpickle (pickle.UnpicklingError is caught by the generic
  except Exception handler and wrapped).
- Expand decode_checkpoint_value docstring to show a concrete example of
  the module:qualname format with a user-defined class.
- Add regression test verifying find_class raises pickle.UnpicklingError.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR #4941 review comments for checkpoint encoding

- Comment 1 (line 103): Already resolved in prior commit — _RestrictedUnpickler
  now raises pickle.UnpicklingError instead of WorkflowCheckpointException.

- Comment 2 (line 140): Add concrete usage examples to decode_checkpoint_value
  docstring showing both direct allowed_types usage and FileCheckpointStorage
  allowed_checkpoint_types usage. Rename 'SafeState' to 'MyState' across all
  docstrings for consistency, making it clear this is a user-defined class name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: replace deprecated 'builtin' repo with pre-commit-hooks in pre-commit config

pre-commit 4.x no longer supports 'repo: builtin'. Merge those hooks into
the existing pre-commit-hooks repo entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* style: apply pyupgrade formatting to docstring example

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: resolve pre-commit hook paths for monorepo git root

The poe-check and bandit hooks referenced paths relative to python/
but pre-commit runs hooks from the git root (monorepo root). Fix
poe-check entry to cd into python/ first, and update bandit config
path to python/pyproject.toml.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix pre-commit config paths for prek --cd python execution

Revert bandit config path from 'python/pyproject.toml' to 'pyproject.toml'
and poe-check entry from explicit 'cd python' wrapper to direct invocation,
since prek --cd python already sets the working directory to python/.

Also apply ruff formatting fixes to cosmos checkpoint storage files.

Fixes #4894

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: add builtins:getattr to checkpoint deserialization allowlist

Pickle uses builtins:getattr to reconstruct enum members (e.g.,
WorkflowMessage.type which is a MessageType enum). Without it in the
allowlist, checkpoint roundtrip tests fail with
WorkflowCheckpointException.

Fixes #4894

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #4894: review comment fixes

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants