@shwstppr shwstppr commented Jul 21, 2025

Description

Increases timeout for agent/host arch retrieval

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Without change (getting errors like the following):

2025-07-21 11:51:01,233 DEBUG [cloud.agent.Agent] (Agent-Handler-1:[]) (logid:) Successfully executed process [1282816] for command [/usr/bin/arch ].
2025-07-21 11:51:01,234 DEBUG [cloud.agent.Agent] (Agent-Handler-1:[]) (logid:) Executing command [/usr/bin/arch ].
2025-07-21 11:51:01,236 DEBUG [cloud.agent.Agent] (Agent-Handler-1:[]) (logid:) Successfully executed process [1282817] for command [/usr/bin/arch ].
2025-07-21 11:51:03,113 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-2:[]) (logid:1a7aaa2d) Request:Seq 2-6273232805950390281:  { Cmd , MgmtId: 32988452618506, via: 2, Ver: v1, Flags: 100111, [{"com.cloud.agent.api.ReadyCommand":{"dcId":"1","hostId":"2","hostUuid":"cd51c3cc-ea1c-4396-9fcf-37beec4bd032","hostName":"pr9752-t13627-kvm-ol8-kvm2","enableHumanReadableSizes":"true","arch":"x86_64","wait":"0","bypassHostMaintenance":"false"}}] }
2025-07-21 11:51:03,117 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-2:[]) (logid:1a7aaa2d) Executing command [/usr/bin/arch ].
2025-07-21 11:51:03,120 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-2:[]) (logid:1a7aaa2d) Successfully executed process [1282865] for command [/usr/bin/arch ].
2025-07-21 11:51:03,121 ERROR [cloud.agent.Agent] (AgentRequest-Handler-2:[]) (logid:1a7aaa2d) Unexpected arch null, expected x86_64
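
The last line of this log is what a simple expected-versus-detected comparison would print when the detected value is null. The following is only a minimal sketch of such a check, not the agent's actual code: the class `ArchCheck` and the method `describeMismatch` are hypothetical names, and only the message format is taken from the log above.

```java
public class ArchCheck {
    /**
     * Returns an error message when the detected arch differs from the expected
     * one, or null when they match. A null detected value (for example, because
     * the lookup produced no output) is always reported as a mismatch.
     */
    static String describeMismatch(String expectedArch, String detectedArch) {
        if (expectedArch.equals(detectedArch)) {
            return null; // arch matches, nothing to report
        }
        // String.format renders a null argument as the text "null".
        return String.format("Unexpected arch %s, expected %s", detectedArch, expectedArch);
    }

    public static void main(String[] args) {
        // Expected value comes from the ReadyCommand payload in the log ("arch":"x86_64").
        System.out.println(describeMismatch("x86_64", null));
        System.out.println(describeMismatch("x86_64", "x86_64"));
    }
}
```

With a null detected value the first call reproduces the message shape seen in the ERROR line; the second call returns null because the values match.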

With change (no errors):

2025-07-21 12:02:53,353 DEBUG [cloud.agent.Agent] (Agent-Handler-1:[]) (logid:) Arch for agent: pr9752-t13627-kvm-ol8-kvm2 found: x86_64
2025-07-21 12:02:53,353 DEBUG [utils.script.Script] (Agent-Handler-1:[]) (logid:) Executing command [/bin/bash -c /usr/bin/arch ].
2025-07-21 12:02:53,361 DEBUG [utils.script.Script] (Agent-Handler-1:[]) (logid:) Successfully executed process [1284377] for command [/bin/bash -c /usr/bin/arch ].
2025-07-21 12:02:53,361 DEBUG [cloud.agent.Agent] (Agent-Handler-1:[]) (logid:) Arch for agent: pr9752-t13627-kvm-ol8-kvm2 found: x86_64
2025-07-21 12:02:53,400 DEBUG [cloud.agent.Agent] (Agent-Handler-1:[]) (logid:) Sending Startup: Seq 2-0:  { Cmd , MgmtId: -1, via: 2, Ver: v1, Flags: 1, [{"com.cloud.agent.api.StartupRoutingCommand":{"cpuSockets":"3","cpus":"3","speed":"2100","cpuArch":"x86_64","memory":"7259144192","dom0MinMemory":"1073741824","poolSync":"false","supportsClonedVolumes":"false","caps":"hvm,snapshot","pool":"/root","hypervisorType":"KVM","hostDetails":{"Host.OS.Kernel.Version":"5.4.17-2136.309.5.1.el8uek.x86_64","com.cloud.network.Networks.RouterPrivateIpStrategy":"HostLocal","Host.OS.Version":"8.6","host.volume.encryption":"true","host.instance.conversion":"false","secured":"true","Host.OS":"Red Hat Enterprise Linux"},"hostTags":[],"groupDetails":{},"type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"40bf71dd-b5c2-344b-8258-09361013f3a4-LibvirtComputingResource","name":"pr9752-t13627-kvm-ol8-kvm2","id":"2","version":"4.21.0.0-SNAPSHOT","iqn":"iqn.1988-12.com.oracle:67eb595b8924","publicIpAddress":"192.168.255.254","publicNetmask":"255.255.255.252","publicMacAddress":"02:00:58:0c:ae:71","privateIpAddress":"10.0.33.41","privateMacAddress":"1e:00:38:00:01:bc","privateNetmask":"255.255.240.0","storageIpAddress":"10.0.33.41","storageNetmask":"255.255.240.0","storageMacAddress":"1e:00:38:00:01:bc","resourceName":"LibvirtComputingResource","gatewayIpAddress":"10.0.32.1","msHostList":"10.0.32.119@static","connectionTransferred":"false","arch":"x86_64","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.StartupStorageCommand":{"totalSize":"(0 bytes) 0","poolInfo":{"uuid":"4a2cdb94-e70f-4cbc-936a-03e5a36f2884","host":"10.0.33.41","localPath":"/var/lib/libvirt/images","hostPath":"/var/lib/libvirt/images","poolType":"Filesystem","capacityBytes":"(18.99 GB) 20386414592","availableBytes":"(14.11 GB) 15153868800"},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","pod":"1","guid":"40bf71dd-b5c2-344b-8258-09361013f3a4-LibvirtComputingResource","name":"pr9752-t13627-kvm-ol8-kvm2","id":"2","version":"4.21.0.0-SNAPSHOT","resourceName":"LibvirtComputingResource","msHostList":"10.0.32.119@static","connectionTransferred":"false","arch":"x86_64","wait":"0","bypassHostMaintenance":"false"}}] }
2025-07-21 12:02:55,249 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-2:[]) (logid:f370cdcb) Request:Seq 2-3945434748553265161:  { Cmd , MgmtId: 32988452618506, via: 2, Ver: v1, Flags: 100111, [{"com.cloud.agent.api.ReadyCommand":{"dcId":"1","hostId":"2","hostUuid":"cd51c3cc-ea1c-4396-9fcf-37beec4bd032","hostName":"pr9752-t13627-kvm-ol8-kvm2","enableHumanReadableSizes":"true","arch":"x86_64","wait":"0","bypassHostMaintenance":"false"}}] }
2025-07-21 12:02:55,253 DEBUG [utils.script.Script] (AgentRequest-Handler-2:[]) (logid:f370cdcb) Executing command [/bin/bash -c /usr/bin/arch ].
2025-07-21 12:02:55,262 DEBUG [utils.script.Script] (AgentRequest-Handler-2:[]) (logid:f370cdcb) Successfully executed process [1284425] for command [/bin/bash -c /usr/bin/arch ].
2025-07-21 12:02:55,262 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-2:[]) (logid:f370cdcb) Arch for agent: pr9752-t13627-kvm-ol8-kvm2 found: x86_64
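
In the log above the command is run as `/bin/bash -c /usr/bin/arch`, so the executable name has been resolved to an absolute path before execution. The sketch below shows one way such a lookup could work by scanning the directories of a `PATH`-style string. It is an illustration under assumptions, not the implementation of CloudStack's `Script.getExecutableAbsolutePath`, which is not shown in this thread; `ExecutableLocator` and `find` are hypothetical names.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class ExecutableLocator {
    /**
     * Looks for an executable file called name in each directory of pathValue
     * (directories separated by the platform path separator, ':' on Linux).
     * Returns the first match as an absolute path, or empty when none is found.
     */
    static Optional<Path> find(String name, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate.toAbsolutePath());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Resolve "arch" against the PATH of the current process.
        Optional<Path> arch = find("arch", System.getenv("PATH"));
        System.out.println(arch.map(Path::toString).orElse("arch not found on PATH"));
    }
}
```

On a host where `arch` lives in `/usr/bin`, the resolved path would match the one in the log, but the result depends on the host's `PATH`.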

How did you try to break this feature and the system with this change?

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

codecov bot commented Jul 21, 2025

Codecov Report

❌ Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.57%. Comparing base (1cbf1cd) to head (d62032f).
⚠️ Report is 288 commits behind head on main.

Files with missing lines | Patch % | Lines
agent/src/main/java/com/cloud/agent/Agent.java | 0.00% | 3 Missing ⚠️
...org/apache/cloudstack/utils/linux/KVMHostInfo.java | 50.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11254      +/-   ##
============================================
+ Coverage     16.15%   16.57%   +0.42%     
- Complexity    13268    14059     +791     
============================================
  Files          5657     5772     +115     
  Lines        497772   512939   +15167     
  Branches      60364    62305    +1941     
============================================
+ Hits          80406    85013    +4607     
- Misses       408415   418450   +10035     
- Partials       8951     9476     +525     
Flag | Coverage Δ
uitests | 3.89% <ø> (-0.11%) ⬇️
unittests | 17.47% <20.00%> (+0.46%) ⬆️
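
The percentages in this report can be checked from the counts it lists. The sketch below redoes that arithmetic. The project figures come straight from the Hits and Lines rows of the coverage diff. The per-file line counts for the patch (3 changed lines in Agent.java, 2 in KVMHostInfo.java) are inferred from the reported percentages and missing-line counts, and are an assumption that ignores partially covered lines. `CoverageCheck` is a hypothetical helper name.

```java
public class CoverageCheck {
    /** Coverage as a percentage of covered lines over total lines. */
    static double percent(long covered, long total) {
        return 100.0 * covered / total;
    }

    /** Rounds to two decimals, the precision used in the report. */
    static double round2(double value) {
        return Math.round(value * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // Patch (assumed counts): Agent.java 3 lines, 0 covered;
        // KVMHostInfo.java 2 lines, 1 covered (50.00% with 1 missing).
        long patchTotal = 3 + 2;
        long patchCovered = 0 + 1;
        System.out.println("patch:   " + round2(percent(patchCovered, patchTotal)));

        // Project coverage: Hits / Lines from the coverage diff.
        System.out.println("base:    " + round2(percent(80406, 497772)));
        System.out.println("head:    " + round2(percent(85013, 512939)));
    }
}
```

Under these assumptions the three values agree with the report's 20.00% patch coverage and its 16.15% (base) and 16.57% (head) project coverage.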


@sureshanaparti

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 14319

@shwstppr shwstppr marked this pull request as ready for review July 25, 2025 13:58
@sureshanaparti sureshanaparti requested a review from Copilot July 26, 2025 07:26

Copilot AI left a comment

Pull Request Overview

This PR increases the timeout for agent/host architecture retrieval from 500ms to 1000ms and standardizes the approach used across different components. The changes address timeout errors that were occurring when retrieving CPU architecture information.

  • Increased timeout for architecture retrieval in Agent.java from 500ms to 1000ms
  • Standardized arch command execution using Script.getExecutableAbsolutePath("arch") instead of hardcoded paths
  • Added debug logging for better troubleshooting
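
To make the timeout behaviour described in this overview concrete, here is a minimal sketch of running a command with a bounded wait using only the Java standard library. It is not the CloudStack `Script` class or the code changed in this PR; `ArchProbe` and `runWithTimeout` are hypothetical names, and the 1000 ms value is taken from the summary above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ArchProbe {
    /**
     * Runs a command and returns its trimmed output, or null when the command
     * cannot be started, exits non-zero, prints nothing, or does not finish
     * within timeoutMs. Suitable only for commands with small output, because
     * the output is read after the process has exited.
     */
    static String runWithTimeout(List<String> command, long timeoutMs) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            if (!process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
                process.destroyForcibly(); // timed out: give up and report no result
                return null;
            }
            if (process.exitValue() != 0) {
                return null;
            }
            String output = new String(process.getInputStream().readAllBytes(),
                    StandardCharsets.UTF_8).trim();
            return output.isEmpty() ? null : output;
        } catch (IOException e) {
            return null; // e.g. executable not found
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("arch = " + runWithTimeout(List.of("arch"), 1000));
    }
}
```

In this sketch a timed-out lookup is indistinguishable from an empty result (both yield null), which is why a timeout that is too short for a busy host can surface as a missing architecture rather than as an explicit timeout error.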

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
agent/src/main/java/com/cloud/agent/Agent.java | Updated getAgentArch() method to use standardized script execution with increased timeout and added debug logging
plugins/hypervisors/kvm/src/main/java/org/apache/cloudstack/utils/linux/KVMHostInfo.java | Renamed variable and updated to use Script.getExecutableAbsolutePath() for consistency
Comments suppressed due to low confidence (1)

plugins/hypervisors/kvm/src/main/java/org/apache/cloudstack/utils/linux/KVMHostInfo.java:61

  • [nitpick] The variable name 'cpuArchRetrieveExecutable' is inconsistent with the naming pattern used elsewhere. Consider renaming to 'cpuArchCommand' or 'archExecutable' for better consistency with the codebase.
    private static String cpuArchRetrieveExecutable = "arch";

@sureshanaparti

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

 private String getCPUArchFromCommand() {
     LOGGER.info("Fetching host CPU arch");
-    return Script.runSimpleBashScript(cpuArchCommand);
+    return Script.runSimpleBashScript(Script.getExecutableAbsolutePath(cpuArchRetrieveExecutable));

Suggested change
- return Script.runSimpleBashScript(Script.getExecutableAbsolutePath(cpuArchRetrieveExecutable));
+ return Script.runSimpleBashScript(Script.getExecutableAbsolutePath(cpuArchRetrieveExecutable), 1000);

Is an increased timeout required here as well?

@sureshanaparti, as this was already running without a timeout and I didn't face an issue here, I have not changed it.

@blueorangutan

[SF] Trillian test result (tid-13906)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 56168 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11254-t13906-kvm-ol8.zip
Smoke tests completed. 142 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@DaanHoogland DaanHoogland left a comment

clgtm

@borisstoyanov borisstoyanov left a comment

LGTM, manually checked it.

@sureshanaparti sureshanaparti moved this to In Progress in Apache CloudStack 4.21.0 Aug 1, 2025
@sureshanaparti sureshanaparti merged commit 44f8064 into apache:main Aug 1, 2025
25 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Apache CloudStack 4.21.0 Aug 1, 2025
@sureshanaparti sureshanaparti deleted the agent-arch-timeout branch August 1, 2025 12:12
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Aug 6, 2025
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
yadvr pushed a commit to shapeblue/cloudstack that referenced this pull request Oct 10, 2025
Cherry-picked from 44f8064

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
weizhouapache pushed a commit that referenced this pull request Oct 14, 2025
Cherry-picked from 44f8064

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Oct 17, 2025
…e#11822)

Cherry-picked from 44f8064

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
