Conversation
mehalyna
left a comment
There was a problem hiding this comment.
Architectural Suggestions
- A
manifest.jsonalongside thetar.gz(with version, hash, files) can give you:
- upgrade metadata
- better flexibility for future formats
- checkpoint resuming (future)
- Agent should:
- load current version from a
version.txt - expose it in
/status - fail gracefully if missing or unreadable
-
Prevent concurrent upgrades or overlapping retries (e.g., via a
.upgradinglockfile or atomic flags). -
Allow agent CLI fallback:
python agent.py --upgrade 1.3.0 --url=https://...
There was a problem hiding this comment.
The base.py agent class currently combines:
- Core agent lifecycle logic
- Status reporting
- Communication
- And now: a full upgrade system (download, validate, extract, rollback…)
This violates SRP because the agent class is now responsible for:
- Its main business logic and
- System upgrade orchestration (a completely separate concern)
Create a new file: upgrade_manager.py
# agents_infra/upgrade_manager.py
class AgentUpgradeManager:
def __init__(self, logger, agent_dir):
self.logger = logger
self.agent_dir = agent_dir
def upgrade(self, target_version: str, url: str, sha256: str):
# move all upgrade logic here
# optionally return a result object (success, message, version)
Usage in base.py:
from agents_infra.upgrade_manager import AgentUpgradeManager
...
def upgrade(self, target_version, url, sha256):
manager = AgentUpgradeManager(self.logger, os.getcwd())
result = manager.upgrade(target_version, url, sha256)
if result.success:
self.version = target_version
Optional: Later we may add Strategy Pattern, if expecting multiple upgrade approaches (tarball, git pull, .deb)
class UpgradeStrategy(ABC):
@abstractmethod
def upgrade(self): ...
class TarballUpgrade(UpgradeStrategy): ...
class GitPullUpgrade(UpgradeStrategy): ...
| self.logger.error('Upgrade failed: {}. ' | ||
| 'Restoring from backup.'.format(e)) |
There was a problem hiding this comment.
Use logger.exception() here to include traceback automatically.
| with open(package_path, 'rb') as f: | ||
| while True: | ||
| chunk = f.read(4096) | ||
| if not chunk: | ||
| break | ||
| sha256_actual.update(chunk) |
There was a problem hiding this comment.
Cleaner alternative:
for chunk in iter(lambda: f.read(4096), b''):
sha256_actual.update(chunk)
| 'completed successfully'.format(target_version)) | ||
|
|
||
| # Updating the version | ||
| self.version = target_version |
There was a problem hiding this comment.
This won't persist across agent restarts. Consider storing this version in a persistent file (e.g., version.txt) and loading it on agent startup.
| tar.extractall(path=tmp_dir) | ||
|
|
||
| # Assuming the extracted dir has same name as current agent dir | ||
| extracted_dir = os.path.join(tmp_dir, 'agent') |
There was a problem hiding this comment.
Not all packages may extract into an agent/ subfolder. Either:
- Validate it exists before assuming
- Extract and dynamically find the top-level directory
- Use a
manifest.jsonor similar
pydata_center/monitoring/helpers.py
Outdated
| logger.warning(f'Agent {hostname} did not report version.') | ||
| return | ||
|
|
||
| if agent_version < latest_version: |
There was a problem hiding this comment.
String comparison can fail, e.g.:
"1.12" < "1.3" # evaluates True, which is wrong
Suggestion:
from packaging import version
if version.parse(agent_version) < version.parse(latest_version):
There was a problem hiding this comment.
Future-proofing:
Since upgrade URLs are remote:
- Add signature verification (GPG, HMAC, etc.)
- Or ensure HTTPS + signed manifest
| AgentUpgradeHistory.objects.filter( | ||
| hostname=command.hostname, | ||
| to_version=command.params.get('target'), | ||
| status='pending' | ||
| ).update( | ||
| status='success' if final_status == 'done' else 'failed', | ||
| finished_at=now(), | ||
| message=final_result | ||
| ) | ||
| logger.info( | ||
| f'Updated AgentUpgradeHistory for {command.hostname} ' | ||
| f'to {final_status}' | ||
| ) |
There was a problem hiding this comment.
- Add audit logs for upgrade initiation in CommandHistory
- Consider marking
CommandHistory.status = done/failedas well
| AGENT_PACKAGE_URL = 'https://storage.example.com/agent-latest.tar.gz' | ||
| AGENT_PACKAGE_SHA256 = '123abc456def789...' |
There was a problem hiding this comment.
These should be environment-specific. Consider .env or config management tool (Vault, AWS SSM, etc.)
|
@CatSonbenim коли буде час переглянь будь ласка мій PR. Буду дуже вдячний! |
CatSonbenim
left a comment
There was a problem hiding this comment.
Overall great job on large feature)
| self.message = message | ||
|
|
||
|
|
||
| class AgentUpgradeManager: |
There was a problem hiding this comment.
It would be better to make this class abstract with
def upgrade(self, *args, **kwargs): raise NotImplementedError
and incapsulate upgrade from tar logic in separate child class. Otherwise we will break OCP once we want to add other source of download.
| return command_history | ||
|
|
||
| def upgrade(self, target_version, url, sha256): | ||
| manager = AgentUpgradeManager(self.logger, os.getcwd()) |
There was a problem hiding this comment.
please pass the manager as the argument, that should be the class with .upgrade interface.
|
|
||
| return command_history | ||
|
|
||
| def upgrade(self, target_version, url, sha256): |
There was a problem hiding this comment.
add *args, **kwargs here, and pass them to .upgrade method (we don't know what could be needed for the upgrade).
| self.logger = logger | ||
| self.agent_dir = agent_dir | ||
|
|
||
| def upgrade(self, target_version, url, sha256): |
There was a problem hiding this comment.
lets make this method suitable to pass arguments in *args, **kwargs format.
|
|
||
| class UpgradeResult: | ||
| def __init__(self, success, message): | ||
| self.success = success |
There was a problem hiding this comment.
Let's make this not boolean value, but status codes. For example i saw the case where upgrade is skipped. In that case we can not say this was unseccessfull or seccessfull - it was skipped. In future that can help us filter what to alert user about (not sure if we have allerts feature, but it can be introduced further). In such case we can say that user should get alerts on failed upgrades (or even add more specific instructions with 2/3 digit status codes.
| if os.path.exists(lockfile_path): | ||
| self.logger.warning('Upgrade already in progress. ' | ||
| 'Lockfile exists.') | ||
| return UpgradeResult(False, 'Upgrade skipped: ' |
There was a problem hiding this comment.
as a follow-up to the comment above - pass here a skipped status code
| shutil.copytree(self.agent_dir, backup_dir) | ||
| self.logger.info('Backup created at {}'.format(backup_dir)) |
There was a problem hiding this comment.
let's make a separate file with upgrade utils and split the code of this method into separate defs. We are saying that in the future we can have more UpgradeManagers and all of them will need backup creation etc
There was a problem hiding this comment.
Also that will be good for incupsulation - now it's quite hard to read the code and follow the flow. Also it will allow to do more specific, maybe custom error handeling so we can distinguish causes of failed upgrade and inform the user about it
There was a problem hiding this comment.
Do separate functions for the blocks you have under comments
| final_result = serializer.validated_data.get('result', '') | ||
|
|
||
| alert_if_command_failed(command.hostname, final_result) | ||
| if final_status in ['done', 'failed']: |
There was a problem hiding this comment.
I suppose status code in UpgradeResult can help you here
This PR is not ready yet and was created to see if I am on the right track with this task.
I would be very grateful for your review and, if possible, I would like to get some advice on what should be added and what can be changed
On the agent side
A new upgrade method has been added to the agent base class (base.py), which:
On the controller side
Added the maybe_dispatch_upgrade function to helpers.py:
A new model has been added to models.py:
AgentUpgradeHistory - stores the history of agent upgrades (from version to version, status, messages, start/end times).
**Added to views.py: **
reading the transferred version of the agent and calling maybe_dispatch_upgrade immediately after saving the status.
Added to the command result handler (submit_command_result):
update the AgentUpgradeHistory record when the upgrade command is completed (done or failed).