25 changes: 21 additions & 4 deletions packaging/rpm/microshift.spec
@@ -16,6 +16,13 @@
 # modifying the Go binaries breaks the DWARF debugging
 %global __os_install_post %{_rpmconfigdir}/brp-compress
 
+# TuneD profile directory changed in RHEL 10
+%if 0%{?rhel} >= 10
+%global tuned_profile_dir %{_prefix}/lib/tuned/profiles
+%else
+%global tuned_profile_dir %{_prefix}/lib/tuned
+%endif
+
 # SELinux specifics
 %global selinuxtype targeted
 %define selinux_policyver 3.14.3-67
@@ -438,9 +445,16 @@ install -p -m644 assets/components/multus/release-multus-{x86_64,aarch64}.json %
 install -p -m644 packaging/tuned/microshift-cleanup-kubelet.service %{buildroot}%{_unitdir}/microshift-cleanup-kubelet.service
 
 # low-latency
-install -d -m755 %{buildroot}/%{_prefix}/lib/tuned/microshift-baseline
-install -p -m644 packaging/tuned/profile/tuned.conf %{buildroot}/%{_prefix}/lib/tuned/microshift-baseline/tuned.conf
-install -p -m755 packaging/tuned/profile/script.sh %{buildroot}/%{_prefix}/lib/tuned/microshift-baseline/script.sh
+install -d -m755 %{buildroot}/%{tuned_profile_dir}/microshift-baseline
+install -p -m644 packaging/tuned/profile/tuned.conf %{buildroot}/%{tuned_profile_dir}/microshift-baseline/tuned.conf
+install -p -m755 packaging/tuned/profile/script.sh %{buildroot}/%{tuned_profile_dir}/microshift-baseline/script.sh
+%if 0%{?rhel} < 10
+# TuneD on RHEL 10 looks for profiles in /usr/lib/tuned/profiles/
+# Until we have RHEL10 builders, the profile needs to be symlinked
+# so the package is usable on both RHEL 9 and RHEL 10.
+install -d -m755 %{buildroot}/%{_prefix}/lib/tuned/profiles
+ln -s ../microshift-baseline %{buildroot}/%{_prefix}/lib/tuned/profiles/microshift-baseline
+%endif
 install -d -m755 %{buildroot}%{_sysconfdir}/tuned
 install -p -m644 packaging/tuned/profile/variables.conf %{buildroot}%{_sysconfdir}/tuned/microshift-baseline-variables.conf
 
@@ -744,7 +758,10 @@ fi
 %{_datadir}/microshift/release/release-multus-{x86_64,aarch64}.json
 
 %files low-latency
-%{_prefix}/lib/tuned/microshift-baseline
+%{tuned_profile_dir}/microshift-baseline
+%if 0%{?rhel} < 10
+%{_prefix}/lib/tuned/profiles/microshift-baseline
+%endif
 %config(noreplace) %{_sysconfdir}/tuned/microshift-baseline-variables.conf
 %{_sysconfdir}/crio/crio.conf.d/05-high-performance-runtime.conf
 %{_prefix}/lib/microshift/manifests.d/002-microshift-low-latency/
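
To make the spec changes above concrete, here is a minimal Python sketch of the on-disk layout they produce. It is an illustration only, not part of the PR: `tuned_profile_dir` and `build_layout` are hypothetical names, and the sketch mimics the `install -d` and `ln -s` steps inside a temporary directory rather than a real RPM buildroot.

```python
import os
import tempfile


def tuned_profile_dir(rhel_major: int, prefix: str = "/usr") -> str:
    """Mirror the spec's %if: RHEL 10+ keeps profiles under lib/tuned/profiles."""
    if rhel_major >= 10:
        return f"{prefix}/lib/tuned/profiles"
    return f"{prefix}/lib/tuned"


def build_layout(buildroot: str, rhel_major: int,
                 profile: str = "microshift-baseline") -> None:
    """Create the profile directory and, on builds older than RHEL 10, the compat symlink."""
    profile_dir = buildroot + tuned_profile_dir(rhel_major) + "/" + profile
    os.makedirs(profile_dir, exist_ok=True)
    if rhel_major < 10:
        compat_dir = buildroot + "/usr/lib/tuned/profiles"
        os.makedirs(compat_dir, exist_ok=True)
        # Relative target, like `ln -s ../microshift-baseline ...` in the spec.
        os.symlink(f"../{profile}", os.path.join(compat_dir, profile))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_layout(root, rhel_major=9)
        link = os.path.join(root, "usr/lib/tuned/profiles/microshift-baseline")
        # The symlink target is relative, so it keeps working after installation.
        print(os.readlink(link))
```

Because the link target is relative, a package built on RHEL 9 exposes the same profile at both `/usr/lib/tuned/microshift-baseline` and `/usr/lib/tuned/profiles/microshift-baseline`, which is the behaviour the spec comment describes.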
7 changes: 6 additions & 1 deletion packaging/tuned/microshift-tuned.py
@@ -115,7 +115,12 @@ def run_command(cmd: list[str], failure_fatal=False) -> tuple[str, bool]:
 
 
 def get_profile_path(profile: str) -> str:
-    paths = [f"/etc/tuned/{profile}", f"/usr/lib/tuned/{profile}"]
+    paths = [
+        f"/etc/tuned/{profile}",
+        f"/etc/tuned/profiles/{profile}",
+        f"/usr/lib/tuned/{profile}",
+        f"/usr/lib/tuned/profiles/{profile}"
+    ]
     for path in paths:
         if os.path.exists(path):
             logging.debug(f"Found profile '{profile}' in '{path}'")
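
The lookup order in the hunk above can be exercised without touching the real filesystem. The following is a minimal sketch, not the code from `microshift-tuned.py`: `find_profile_path`, the `root` parameter, and the `None` return value for a missing profile are assumptions added for illustration (the diff does not show what the real function does when nothing is found).

```python
import os
import tempfile
from typing import Optional

# Search order taken from the diff: /etc before /usr/lib, and within each
# prefix the flat layout before the profiles/ subdirectory.
SEARCH_DIRS = [
    "etc/tuned",
    "etc/tuned/profiles",
    "usr/lib/tuned",
    "usr/lib/tuned/profiles",
]


def find_profile_path(profile: str, root: str = "/") -> Optional[str]:
    """Return the first existing profile path, or None (an assumption of this sketch)."""
    for directory in SEARCH_DIRS:
        candidate = os.path.join(root, directory, profile)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Simulate a profile present under both /usr/lib/tuned/profiles
        # and /etc/tuned/profiles.
        os.makedirs(os.path.join(root, "usr/lib/tuned/profiles/microshift-baseline"))
        os.makedirs(os.path.join(root, "etc/tuned/profiles/microshift-baseline"))
        found = find_profile_path("microshift-baseline", root)
        print(os.path.relpath(found, root))
```

With both copies present, the `/etc` entry is returned because it appears earlier in the list.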
@@ -0,0 +1,49 @@
# hadolint global ignore=DL3059
FROM localhost/rhel102-bootc-source-optionals:latest

# Enable microshift-tuned service
RUN systemctl enable microshift-tuned

# Configure MicroShift for low-latency workloads
RUN cat > /etc/microshift/config.yaml <<'EOF'
kubelet:
  cpuManagerPolicy: static
  cpuManagerPolicyOptions:
    full-pcpus-only: "true"
  cpuManagerReconcilePeriod: 5s
  memoryManagerPolicy: Static
  topologyManagerPolicy: single-numa-node
  reservedSystemCPUs: 0-1
  reservedMemory:
  - limits:
      memory: 1100Mi
    numaNode: 0
  kubeReserved:
    memory: 500Mi
  systemReserved:
    memory: 500Mi
  evictionHard:
    imagefs.available: 15%
    memory.available: 100Mi
    nodefs.available: 10%
    nodefs.inodesFree: 5%
  evictionPressureTransitionPeriod: 0s
EOF

# Configure tuned baseline variables
RUN cat > /etc/tuned/microshift-baseline-variables.conf <<'EOF'
# Isolated cores should be complementary to kubelet's reserved CPUs.
# Isolated and reserved CPUs should contain all online CPUs.
# Core #3 is for testing offlining hence skipped.
isolated_cores=2,4-5
hugepages_size=2M
hugepages=10
additional_args=test1=on test2=true dummy
offline_cpu_set=3
EOF

# Configure MicroShift tuned profile
RUN cat > /etc/microshift/tuned.yaml <<'EOF'
profile: microshift-baseline
reboot_after_apply: True
EOF
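
The comments in the variables file above state a constraint: reserved and isolated CPUs must not overlap and, together with any offlined CPU, should account for every CPU on the node. A minimal Python sketch of that check follows. It is illustrative only: `parse_cpu_list` and `check_partition` are hypothetical helpers, not part of MicroShift or TuneD, and the CPU total of 6 is taken from `--vm_vcpus 6` in the scenario scripts of this PR.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Parse a CPU list such as "2,4-5" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_partition(reserved: str, isolated: str, offline: str,
                    total_cpus: int) -> list[str]:
    """Return a list of problems; an empty list means the sets partition all CPUs."""
    groups = {
        "reserved": parse_cpu_list(reserved),
        "isolated": parse_cpu_list(isolated),
        "offline": parse_cpu_list(offline),
    }
    problems = []
    names = list(groups)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            overlap = groups[a] & groups[b]
            if overlap:
                problems.append(f"{a} and {b} overlap on {sorted(overlap)}")
    covered = set().union(*groups.values())
    missing = set(range(total_cpus)) - covered
    if missing:
        problems.append(f"unassigned CPUs: {sorted(missing)}")
    return problems


if __name__ == "__main__":
    # reservedSystemCPUs, isolated_cores and offline_cpu_set from the files above.
    print(check_partition("0-1", "2,4-5", "3", total_cpus=6))
```

For the values used here, the three sets are disjoint and cover CPUs 0 through 5, so the list of problems is empty.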
@@ -0,0 +1,49 @@
# hadolint global ignore=DL3059
FROM localhost/rhel98-bootc-source-optionals:latest

# Enable microshift-tuned service
RUN systemctl enable microshift-tuned

# Configure MicroShift for low-latency workloads
RUN cat > /etc/microshift/config.yaml <<'EOF'
kubelet:
  cpuManagerPolicy: static
  cpuManagerPolicyOptions:
    full-pcpus-only: "true"
  cpuManagerReconcilePeriod: 5s
  memoryManagerPolicy: Static
  topologyManagerPolicy: single-numa-node
  reservedSystemCPUs: 0-1
  reservedMemory:
  - limits:
      memory: 1100Mi
    numaNode: 0
  kubeReserved:
    memory: 500Mi
  systemReserved:
    memory: 500Mi
  evictionHard:
    imagefs.available: 15%
    memory.available: 100Mi
    nodefs.available: 10%
    nodefs.inodesFree: 5%
  evictionPressureTransitionPeriod: 0s
EOF

# Configure tuned baseline variables
RUN cat > /etc/tuned/microshift-baseline-variables.conf <<'EOF'
# Isolated cores should be complementary to kubelet's reserved CPUs.
# Isolated and reserved CPUs should contain all online CPUs.
# Core #3 is for testing offlining hence skipped.
isolated_cores=2,4-5
hugepages_size=2M
hugepages=10
additional_args=test1=on test2=true dummy
offline_cpu_set=3
EOF

# Configure MicroShift tuned profile
RUN cat > /etc/microshift/tuned.yaml <<'EOF'
profile: microshift-baseline
reboot_after_apply: True
EOF
50 changes: 50 additions & 0 deletions test/scenarios-bootc/el10/periodics/el102-src@low-latency.sh
@@ -0,0 +1,50 @@
#!/bin/bash

export SKIP_GREENBOOT=true
export TEST_RANDOMIZATION=none

# Sourced from scenario.sh and uses functions defined there.

start_image="rhel102-bootc-source-tuned"

scenario_create_vms() {
    exit_if_image_not_found "${start_image}"

    prepare_kickstart host1 kickstart-bootc.ks.template "${start_image}" true
    launch_vm rhel102-bootc --vm_vcpus 6
}

scenario_remove_vms() {
    exit_if_image_not_found "${start_image}"

    remove_vm host1
}

scenario_run_tests() {
    exit_if_image_not_found "${start_image}"

    # Should not be run immediately after creating VM because of
    # microshift-tuned rebooting the node to activate the profile.
    local -r start_time=$(date +%s)
    while true; do
        boot_num=$(run_command_on_vm host1 "sudo journalctl --list-boots --quiet | wc -l" || true)
        boot_num="${boot_num%$'\r'*}"
        if [[ "${boot_num}" -ge 2 ]]; then
            break
        fi
        if [ $(( $(date +%s) - start_time )) -gt 60 ]; then
            echo "Timed out waiting for VM having 2 boots"
            exit 1
        fi
        sleep 5
Comment on lines +28 to +39

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Read the specific file and check lines around 28-39
cat -n "test/scenarios-bootc/el10/periodics/el102-src@low-latency.sh" | sed -n '20,45p'

Repository: openshift/microshift

Length of output: 1062


🏁 Script executed:

# Search for similar boot-polling patterns to understand the "hardening" mentioned
rg "boot_num|list-boots" --type sh -B 3 -A 3

Repository: openshift/microshift

Length of output: 10872


🏁 Script executed:

# Search for TUNED_REBOOT_WAIT_TIMEOUT to see if it's used elsewhere
rg "TUNED_REBOOT_WAIT_TIMEOUT" --type sh

Repository: openshift/microshift

Length of output: 46


🏁 Script executed:

# Search for boot_num with regex validation (the hardening pattern)
rg 'boot_num.*=~|boot_num.*\[0-9\]' --type sh -B 2 -A 2

Repository: openshift/microshift

Length of output: 46


🏁 Script executed:

# Search for longer timeout values in similar context
rg 'boot_wait_timeout|TUNED_REBOOT_WAIT_TIMEOUT|boot.*timeout.*18[0-9]|boot.*timeout.*2[0-9]{2}' --type sh -B 3 -A 3

Repository: openshift/microshift

Length of output: 46


🏁 Script executed:

# Look for helper functions or common boot polling patterns
rg 'wait.*boot|boot.*poll' --type sh -i

Repository: openshift/microshift

Length of output: 3821


🏁 Script executed:

# Check if there are any helper functions related to boot polling
rg 'run_command_on_vm' --type sh -A 5 | head -50

Repository: openshift/microshift

Length of output: 4035


🏁 Script executed:

# Look at the test/bin/scenario.sh to see if there's a better pattern
cat test/bin/scenario.sh | head -100

Repository: openshift/microshift

Length of output: 3645


🏁 Script executed:

# Check if there are any recent commits or patterns with numeric validation
rg '\[0-9\]\+|\^\[0-9\]' --type sh -B 2 -A 2 | head -40

Repository: openshift/microshift

Length of output: 3239


Harden boot-polling to prevent CI flakes.

The current code lacks numeric validation for boot_num (can fail with non-numeric output during reboots) and uses a 60-second timeout that's too tight for periodic environments. Add regex validation (^[0-9]+$), increase timeout to 180 seconds, and declare boot_num as local.

Proposed fix
 scenario_run_tests() {
     exit_if_image_not_found "${start_image}"

     # Should not be run immediately after creating VM because of
     # microshift-tuned rebooting the node to activate the profile.
     local -r start_time=$(date +%s)
+    local -r boot_wait_timeout="${TUNED_REBOOT_WAIT_TIMEOUT:-180}"
     while true; do
-        boot_num=$(run_command_on_vm host1 "sudo journalctl --list-boots --quiet | wc -l" || true)
+        local boot_num
+        boot_num=$(run_command_on_vm host1 "sudo journalctl --list-boots --quiet | wc -l" 2>/dev/null || true)
         boot_num="${boot_num%$'\r'*}"
-        if [[ "${boot_num}" -ge 2 ]]; then
+        if [[ "${boot_num}" =~ ^[0-9]+$ ]] && [[ "${boot_num}" -ge 2 ]]; then
             break
         fi
-        if [ $(( $(date +%s) - start_time )) -gt 60 ]; then
+        if [ $(( $(date +%s) - start_time )) -gt "${boot_wait_timeout}" ]; then
             echo "Timed out waiting for VM having 2 boots"
             exit 1
         fi
         sleep 5
     done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `test/scenarios-bootc/el10/periodics/el102-src@low-latency.sh` around lines 28-39,
the loop that polls boots lacks numeric validation and uses a too-short
timeout; make boot_num a local variable (local boot_num), validate its value
with a regex like [[ "${boot_num}" =~ ^[0-9]+$ ]] before numeric comparison,
only treat it as >=2 when it passes validation, and extend the timeout check
from 60 to 180 seconds (use start_time and date +%s as in the existing logic
with the updated timeout value). Use run_command_on_vm, boot_num and start_time
to locate the code to change.

    done

    # --exitonfailure because tests within suites are meant to be ordered,
    # so don't advance to next test if current failed.

    run_tests host1 \
        --exitonfailure \
        suites/tuned/microshift-tuned.robot \
        suites/tuned/workload-partitioning.robot \
        suites/tuned/uncore-cache.robot
}
50 changes: 50 additions & 0 deletions test/scenarios-bootc/el9/periodics/el98-src@low-latency.sh
@@ -0,0 +1,50 @@
#!/bin/bash

export SKIP_GREENBOOT=true
export TEST_RANDOMIZATION=none

# Sourced from scenario.sh and uses functions defined there.

start_image="rhel98-bootc-source-tuned"

scenario_create_vms() {
    exit_if_image_not_found "${start_image}"

    prepare_kickstart host1 kickstart-bootc.ks.template "${start_image}" true
    launch_vm rhel98-bootc --vm_vcpus 6
}

scenario_remove_vms() {
    exit_if_image_not_found "${start_image}"

    remove_vm host1
}

scenario_run_tests() {
    exit_if_image_not_found "${start_image}"

    # Should not be run immediately after creating VM because of
    # microshift-tuned rebooting the node to activate the profile.
    local -r start_time=$(date +%s)
    while true; do
        boot_num=$(run_command_on_vm host1 "sudo journalctl --list-boots --quiet | wc -l" || true)
        boot_num="${boot_num%$'\r'*}"
        if [[ "${boot_num}" -ge 2 ]]; then
            break
        fi
        if [ $(( $(date +%s) - start_time )) -gt 60 ]; then
            echo "Timed out waiting for VM having 2 boots"
            exit 1
        fi
        sleep 5
Comment on lines +28 to +39

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify the current code pattern in both low-latency scenario files.
for f in \
  test/scenarios-bootc/el9/periodics/el98-src@low-latency.sh \
  test/scenarios-bootc/el10/periodics/el102-src@low-latency.sh
do
  echo "== ${f} =="
  rg -n 'boot_num=.*journalctl|"\$\{boot_num\}" -ge 2| -gt 60' "${f}"
done

# Reproduce bash integer-compare behavior with empty value.
bash -lc 'boot_num=""; if [[ "${boot_num}" -ge 2 ]]; then echo "ok"; fi' || true

Repository: openshift/microshift

Length of output: 735


Harden boot-count polling to prevent flaky failures.

The [[ "${boot_num}" -ge 2 ]] comparison fails when boot_num is empty or non-numeric during reboots. The hardcoded 60s timeout is also brittle for CI environments. Add regex validation before numeric comparison and use a configurable timeout via TUNED_REBOOT_WAIT_TIMEOUT environment variable (default 180s).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `test/scenarios-bootc/el9/periodics/el98-src@low-latency.sh` around lines 28-39,
the boot-count polling loop using boot_num from run_command_on_vm can fail
if boot_num is empty or non-numeric; change the loop in which start_time and
boot_num are used to first validate boot_num with a regex (e.g. ^[0-9]+$) before
doing the numeric comparison, treating non-matches as not-yet-ready and
continuing the loop; replace the hardcoded 60s timeout with a configurable
TUNED_REBOOT_WAIT_TIMEOUT (default to 180 if unset) and use that variable in the
elapsed-time check; update references to boot_num, start_time and the loop that
contains run_command_on_vm accordingly.
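
The hardening requested in this comment (validate the value before comparing, and make the timeout configurable) can be sketched independently of the scenario script. The Python below is an illustration under stated assumptions, not a replacement for the bash loop: `parse_boot_count` and `wait_for_boots` are hypothetical names, the command output is supplied by a caller-provided function, and the 180 second default comes from the review's suggested `TUNED_REBOOT_WAIT_TIMEOUT` default.

```python
import re
import time
from typing import Callable, Optional


def parse_boot_count(raw: str) -> Optional[int]:
    """Return the boot count, or None when the output is not a plain number."""
    # Drop everything from the last carriage return, like ${boot_num%$'\r'*}.
    text = raw.rsplit("\r", 1)[0].strip()
    if re.fullmatch(r"[0-9]+", text):
        return int(text)
    return None


def wait_for_boots(
    read_boot_count: Callable[[], str],
    min_boots: int = 2,
    timeout: float = 180.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until at least min_boots are reported; return False on timeout."""
    start = clock()
    while True:
        count = parse_boot_count(read_boot_count())
        # Non-numeric or empty output is treated as "not ready yet".
        if count is not None and count >= min_boots:
            return True
        if clock() - start > timeout:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Fake command output: unreachable VM, garbage, first boot, then second boot.
    outputs = iter(["", "ssh: connection refused", "1\r", "2\r"])
    print(wait_for_boots(lambda: next(outputs), sleep=lambda _s: None))
```

Injecting `clock` and `sleep` keeps the loop testable without real delays; in the bash script the equivalent pieces are `date +%s` and `sleep 5`.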

    done

    # --exitonfailure because tests within suites are meant to be ordered,
    # so don't advance to next test if current failed.

    run_tests host1 \
        --exitonfailure \
        suites/tuned/microshift-tuned.robot \
        suites/tuned/workload-partitioning.robot \
        suites/tuned/uncore-cache.robot
}
2 changes: 2 additions & 0 deletions test/suites/tuned/workload-partitioning.robot
@@ -64,6 +64,7 @@ Teardown For Workload Partitioning
     Remove Drop In MicroShift Config    10-kubelet
     Systemctl    restart    crio.service
     Restart MicroShift
+    Wait For MicroShift Healthcheck Success
 
 Configure Kubelet For Workload Partitioning
     [Documentation]    configure microshift with kubelet CPU configuration
@@ -252,6 +253,7 @@ Cleanup And Create NS
     Cleanup MicroShift    --all    --keep-images
     Remove Files    ${KUBELET_CPU_STATE_FILE}
     Restart MicroShift
+    Wait For MicroShift Healthcheck Success
     ${ns}=    Create Unique Namespace
     VAR    ${NAMESPACE}=    ${ns}    scope=SUITE
 