Skip to content

Conversation

@nickjalbert
Copy link
Contributor

@nickjalbert nickjalbert commented Mar 15, 2022

This PR makes it so that when we're building and running a Component DAG with one or more requirements_path's defined, we only activate the virtual env built out of those requirements once and never deactivate.

I think this is the most reasonable behavior given that:

  • It is impossible to isolate dependencies for different components in CPython (see write up here)
  • Some libraries make changes to the environment when they are imported that they expect to persist (e.g. manipulation of the sys.path)
  • We support the get_object() flow and that object expects to run in its environment.

I added a local check to Component.add_dependency() to make sure you weren't doing obviously bad things (adding a Component with requirements to a DAG after a venv was created and activated), but I didn't add any global checks. Specifically, you could create two Components with different venvs and activate them sequentially and potentially get unexpected behavior. This is a failure mode we might want to address in the future.

Fixes #324

@nickjalbert nickjalbert requested a review from andyk March 15, 2022 16:45
@nickjalbert nickjalbert marked this pull request as ready for review March 15, 2022 16:45
@nickjalbert nickjalbert changed the title [WIP] Only activate a Component's venv once Only activate a Component's venv once Mar 15, 2022
@andyk
Copy link
Contributor

andyk commented Mar 17, 2022

LGTM. Feel free to merge!

One observation: since this mostly changes how Component uses VirtualEnv, it is still possible for a user to directly create and activate two VirtualEnvs from within the same python process, and those could try to load c modules that conflict (e.g., two different versions of the same c module), right?

In the future, we may want to change the architecture of VirtualEnv to only allow a single instance of VirtualEnv within its Python process, not just those created by a Component.

@andyk
Copy link
Contributor

andyk commented Mar 17, 2022

Also, looks like maybe some conflicts w/ master after I merged #313

@nickjalbert
Copy link
Contributor Author

it is still possible for a user to directly create and activate two VirtualEnvs from within the same python process, and those could try to load c modules that conflict (e.g., two different versions of the same c module), right?

Right. If the user only used pure Python modules (that is, the module and all its dependencies are written in pure Python), then I believe you could clear the sys.modules cache, change the environment, and then import a different version of the module in a different venv. I'm pretty sure I had an example of this working (maybe for numpy?) when we started looking at trying to isolate multiple virtual envs.

But, because C/C++ modules often used shared state, you can't reload them and so you can get conflicts or unexpected behavior if you try to load modules that require different versions of the same C/C++ module.

In the future, we may want to change the architecture of VirtualEnv to only allow a single instance of VirtualEnv within its Python process, not just those created by a Component.

Yeah, I thought about this. I then wondered if there was a use case where you'd want to activate multiple venvs that had non-conflicting requirements. Not sure if there would be a reason to do that, though. So, basically, I'm open to doing this but want to think a little more about the implications.

@nickjalbert
Copy link
Contributor Author

@andyk thanks for the review! Fixed the conflicts; will merge when green.

@nickjalbert nickjalbert merged commit 07bc713 into agentos-project:master Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VirtualEnvs should preserve sys.path changes

2 participants