
Hotfix/#113 #115

Merged
p14c31355 merged 91 commits into develop from
hotfix/#113
Sep 3, 2025

Conversation

@p14c31355
Owner

🚀 Pull Request

Overview

  • Related Issue: Feature/#113 #114
  • Response details: It worked, but the characters were garbled, so I fixed it.

Change details

  • New feature
  • Refactoring
  • Bug fix
  • CI / Build settings correction
  • Documentation update

Build / Test Results

$ cargo check     #
$ cargo test      #

Target board with confirmed operation

  • ATmega328p
  • ESP32
  • STM32
  • Linux mock
  • Other: ___

p14c31355 self-assigned this on Sep 1, 2025
p14c31355 added the "bug" label (Something isn't working) on Sep 1, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @p14c31355, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily addresses a bug causing garbled characters by adjusting a buffer size. Additionally, it includes a significant refactoring of the command node and explorer structures, transitioning to const generics for improved memory efficiency and type safety in handling command data and their dependencies. These changes enhance the robustness and clarity of the system's command processing capabilities.

Highlights

  • Garbled Character Fix: The buffer size in the prevent_garbled function within src/compat/util.rs was reduced from 512 to 256 bytes, directly addressing the reported issue of garbled characters.
  • Refactoring of CmdNode Structure: The CmdNode struct in src/explore/explorer.rs was refactored to use fixed-size arrays ([u8; N], [u8; D]) for bytes and deps, along with new bytes_len and deps_len fields. This change improves memory management and type safety by allowing command data and dependencies to be stored in compile-time sized buffers.
  • Enhanced Explorer Generics: The Explorer struct and its associated functions and macros across src/explore/explorer.rs and src/explore/runner.rs were updated to leverage new const generics parameters (NODE_COUNT, BYTES_MAX, DEPS_MAX). This enables more precise type definitions and compile-time size guarantees for the command nodes and their data.
  • New Utility Function for Node Dependencies: A new helper function write_node_deps was added to src/compat/util.rs to facilitate formatted printing of node dependencies, which is now utilized within the Explorer's get_one_sort method for debugging or logging purposes.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix an issue with garbled characters and also introduces a significant refactoring of the Explorer and related components. The reduction of the buffer size in prevent_garbled is a good step to prevent potential stack overflows on memory-constrained devices like the ATmega328p. The move to const generics and stack-based arrays from &'static slices is a great improvement for an embedded context.

However, the refactoring has introduced some critical issues where a generic parameter for maximum byte length (N) has been used instead of the parameter for the number of nodes (NODE_COUNT) when sizing arrays. This will likely cause panics due to out-of-bounds memory access. I've provided detailed comments and suggestions to fix these critical bugs.

I've also included a couple of medium-severity comments regarding inconsistent error handling in a new utility function and a misleading code comment.

Comment on lines 152 to 170
pub fn write_node_deps<W: core::fmt::Write>(
    w: &mut W,
    index: usize,
    deps: &[u8],
) -> core::fmt::Result
where
    W: core::fmt::Write,
{
    write!(w, "node ").ok();
    if index < 256 {
        write!(w, "{:02X}", index as u8)?;
    } else {
        write!(w, "?"); // overflow fallback
    }
    write!(w, ": deps=").ok();
    write_bytes_hex_fmt(w, deps)?;
    writeln!(w).ok(); // Add a newline since write_bytes_hex_fmt doesn't include one
    Ok(())
}
Contributor

medium

This new function has inconsistent error handling and a redundant where clause.

  1. Inconsistent Error Handling: Some write! and writeln! calls use .ok() to ignore errors, while others use ? to propagate them. Since the function returns a core::fmt::Result, it's best practice to propagate all errors using ? and let the caller decide whether to handle or ignore them. This makes the function more robust and predictable.
  2. Redundant where Clause: The where W: core::fmt::Write on line 158 is redundant, as this bound is already specified on line 152.

I've provided a suggestion that addresses both points for cleaner and more idiomatic Rust code.

pub fn write_node_deps<W: core::fmt::Write>(
    w: &mut W,
    index: usize,
    deps: &[u8],
) -> core::fmt::Result {
    write!(w, "node ")?;
    if index < 256 {
        write!(w, "{:02X}", index as u8)?;
    } else {
        write!(w, "?")?; // overflow fallback
    }
    write!(w, ": deps=")?;
    write_bytes_hex_fmt(w, deps)?;
    writeln!(w)?; // Add a newline since write_bytes_hex_fmt doesn't include one
    Ok(())
}
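To see what propagating every error with `?` buys the caller, here is a minimal, self-contained sketch. The body of `write_bytes_hex_fmt` is a hypothetical stand-in (its real output format in this crate is not shown in the PR), and `FailingWriter` is invented purely for the demonstration.

```rust
use core::fmt::{self, Write};

// Hypothetical stand-in for the crate's `write_bytes_hex_fmt`; the
// space-separated hex format is an assumption made for this example.
fn write_bytes_hex_fmt<W: Write>(w: &mut W, bytes: &[u8]) -> fmt::Result {
    for (i, b) in bytes.iter().enumerate() {
        if i > 0 {
            w.write_char(' ')?;
        }
        write!(w, "{:02X}", b)?;
    }
    Ok(())
}

// Same shape as the suggestion above: every write is propagated with `?`.
fn write_node_deps<W: Write>(w: &mut W, index: usize, deps: &[u8]) -> fmt::Result {
    write!(w, "node ")?;
    if index < 256 {
        write!(w, "{:02X}", index as u8)?;
    } else {
        write!(w, "?")?; // overflow fallback
    }
    write!(w, ": deps=")?;
    write_bytes_hex_fmt(w, deps)?;
    writeln!(w)
}

/// Invented writer that rejects every write, to show the error reaching the caller.
struct FailingWriter;

impl Write for FailingWriter {
    fn write_str(&mut self, _s: &str) -> fmt::Result {
        Err(fmt::Error)
    }
}

/// Returns true when a failing sink is reported to the caller instead of being hidden.
fn failure_is_visible() -> bool {
    write_node_deps(&mut FailingWriter, 3, &[1, 2]).is_err()
}
```

With the `.ok()` calls of the original version, a failing serial writer would be silently ignored for part of the line; here the caller can decide whether to call `.ok()` itself, as `get_one_sort` does.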

p14c31355 and others added 5 commits September 2, 2025 01:10
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@p14c31355
Owner Author

@gemini-code-assist
Even with the current code, I'm exceeding the SRAM limit in an Arduino Uno environment.
Please let me know if there is a way to optimize it.

@gemini-code-assist
Contributor

@p14c31355 Thanks for reaching out! I understand you're encountering SRAM limit issues on your Arduino Uno. This is a common challenge in embedded development, especially with limited resources like the ATmega328P's 2KB SRAM.

The recent changes in this Pull Request, specifically how CmdNode is defined and used, are very likely the cause of the increased SRAM consumption. Let me explain:

Previously, CmdNode used &'static [u8] for bytes and deps. This is a highly memory-efficient approach for microcontrollers because the actual byte data (&[u8]) is stored in flash memory (PROGMEM on Arduino), and only a small pointer and length (typically 4 bytes per slice on AVR, so 8 bytes for the two slices) are stored in SRAM for each CmdNode instance. This allows you to have many CmdNodes without consuming much SRAM.

In this PR, CmdNode was changed to contain fixed-size arrays [u8; N] and [u8; D]. While this can be useful for const contexts, it means that the entire array content for bytes and deps is now stored directly within each CmdNode struct. When you create a const NODES array of these CmdNodes, the entire array, including all the bytes and deps data, is allocated in SRAM. If N and D are chosen to be the maximum possible size for any command or dependency, and you have many nodes, this can quickly exhaust your SRAM.

For example, if you have 100 CmdNodes, and N=32 and D=32, each node would consume 32 + 1 (for bytes_len) + 32 + 1 (for deps_len) = 66 bytes. For 100 nodes, that's 66 * 100 = 6600 bytes, which is far beyond the Arduino Uno's 2KB SRAM limit.
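The arithmetic above can be checked with `core::mem::size_of`. The sketch below uses two illustrative structs, `InlineNode` and `SliceNode` (names invented here), that mirror the two `CmdNode` layouts discussed in this thread. The slice-based size depends on the pointer width of the target: 4 bytes per slice on 16-bit AVR, 16 bytes per slice on a 64-bit host. It counts only the node table itself; where the referenced byte data ends up depends on the target and linker setup.

```rust
use core::mem::size_of;

/// Layout introduced in this PR: command data stored inline in every node.
#[allow(dead_code)]
struct InlineNode<const N: usize, const D: usize> {
    bytes: [u8; N],
    bytes_len: u8,
    deps: [u8; D],
    deps_len: u8,
}

/// Layout before this PR: each node holds two slice references.
#[allow(dead_code)]
struct SliceNode {
    bytes: &'static [u8],
    deps: &'static [u8],
}

/// Size in bytes of a table of `node_count` inline nodes.
const fn inline_table_bytes<const N: usize, const D: usize>(node_count: usize) -> usize {
    size_of::<InlineNode<N, D>>() * node_count
}

/// Size in bytes of a table of `node_count` slice-based nodes
/// (the referenced data itself is not included).
const fn slice_table_bytes(node_count: usize) -> usize {
    size_of::<SliceNode>() * node_count
}
```

For `N = 32`, `D = 32` and 100 nodes, `inline_table_bytes::<32, 32>(100)` evaluates to 6600, matching the figure above.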

To address this, I recommend reverting the CmdNode definition and related code to use &'static [u8] slices. This will move the bulk of the command and dependency data back into flash memory, significantly reducing SRAM usage. The change you made to reduce the heapless::String buffer size in src/compat/util.rs from 512 to 256 bytes was a good step, but the CmdNode change is likely the primary culprit.

Here are the suggested changes to revert the memory-intensive parts:

1. src/explore/explorer.rs

--- a/src/explore/explorer.rs
+++ b/src/explore/explorer.rs
@@ -7,10 +7,11 @@ use crate::error::{ExecutorError, ExplorerError};
 use crate::scanner::{I2C_SCAN_ADDR_END, I2C_SCAN_ADDR_START};
 const I2C_ADDRESS_COUNT: usize = 128;
 
-pub struct CmdNode<const N: usize, const D: usize> {
-    pub bytes: [u8; N],
-    pub bytes_len: u8,
-    pub deps: [u8; D],
-    pub deps_len: u8,
+#[derive(Copy, Clone)]
+pub struct CmdNode {
+    pub bytes: &'static [u8],
+    pub deps: &'static [u8],
 } 
 
 pub trait CmdExecutor<I2C, const CMD_BUFFER_SIZE: usize> {
@@ -214,50 +215,60 @@ where
 macro_rules! nodes {
     (
         prefix = $prefix:expr,
-        bytes_max = $bytes_max:expr,
-        deps_max = $deps_max:expr,
+        // Removed bytes_max and deps_max parameters
         [
             $( [ $( $b:expr ),* ] $( @ [ $( $d:expr ),* ] )? ),* $(,)?
         ]
     ) => {{
-        const NODE_COUNT: usize = $crate::count_exprs!($( [ $( $b ),* ] ),*);
-
-        const NODES: [$crate::explore::explorer::CmdNode<$bytes_max, $deps_max>; NODE_COUNT] = [
+        static NODES: &[$crate::explore::explorer::CmdNode] = &[
             $(
                 $crate::explore::explorer::CmdNode {
-                    bytes: {
-                        let mut arr = [0u8; $bytes_max];
-                        let mut i = 0;
-                        $(
-                            arr[i] = $b;
-                            i += 1;
-                        )*
-                        arr
-                    },
-                    bytes_len: $crate::count_exprs!($($b),*) as u8,
-                    deps: {
-                        let mut arr = [0u8; $deps_max];
-                        let mut i = 0;
-                        $(
-                            $(
-                                arr[i] = $d;
-                                i += 1;
-                            )*
-                        )?
-                        arr
-                    },
-                    deps_len: $crate::count_exprs!($($($d),*)?) as u8,
+                    bytes: &[ $( $b ),* ],
+                    deps: &[ $( $( $d ),* )? ],
                 }
             ),*
         ];
 
-        const EXPLORER: $crate::explore::explorer::Explorer<NODE_COUNT, $bytes_max, $deps_max> = $crate::explore::explorer::Explorer {
-            nodes: &NODES,
+        static EXPLORER: $crate::explore::explorer::Explorer<{NODES.len()}> = $crate::explore::explorer::Explorer {
+            nodes: NODES,
         };
 
         EXPLORER
     }};
 }
 
 /// simple macro to count comma-separated expressions at compile time
 #[macro_export]
 macro_rules! count_exprs {
     () => (0usize);
     ($x:expr $(, $xs:expr)*) => (1usize + $crate::count_exprs!($($xs),*));
 }
 
-pub struct Explorer<'a, const NODE_COUNT: usize, const N: usize, const D: usize> {
-    pub nodes: &'a [CmdNode<N, D>; NODE_COUNT],
+pub struct Explorer<const N: usize> {
+    pub(crate) nodes: &'static [CmdNode],
 }
 
 pub struct ExploreResult {
@@ -266,7 +277,7 @@ pub struct ExploreResult {
     pub permutations_tested: usize,
 }
 
-impl<'a, const NODE_COUNT: usize, const N: usize, const D: usize> Explorer<'a, NODE_COUNT, N, D> {
+impl<const N: usize> Explorer<N> {
     // This function calculates the max length of a single command's byte array
     pub const fn max_cmd_len(&self) -> usize {
         let mut max_len = 0;
@@ -281,7 +292,7 @@ impl<const N: usize> Explorer<N> {
         max_len
     }
 
-    pub const fn new(nodes: &'a [CmdNode<N, D>; NODE_COUNT]) -> Self {
+    pub const fn new(nodes: &'static [CmdNode]) -> Self {
         Self { nodes }
     }
  
@@ -368,20 +379,20 @@ impl<const N: usize> Explorer<N> {
     }
 
     pub fn get_one_sort(
-        &'a self,
+        &self,
         _writer: &mut impl core::fmt::Write,
-        failed_nodes: &[bool; NODE_COUNT], // This is fine, it's a reference to a fixed-size array
-    ) -> Result<(heapless::Vec<&'a [u8], NODE_COUNT>, heapless::Vec<u8, NODE_COUNT>), ExplorerError> {
+        failed_nodes: &[bool; N],
+    ) -> Result<(heapless::Vec<&'static [u8], N>, heapless::Vec<u8, N>), ExplorerError> {
         let len = self.nodes.len();
-        let mut in_degree: [u8; NODE_COUNT] = [0; NODE_COUNT];
-        let mut adj_list_rev: [u128; NODE_COUNT] = [0; NODE_COUNT];
+        let mut in_degree: [u8; N] = [0; N];
+        let mut adj_list_rev: [u128; N] = [0; N];
 
         for (i, node) in self.nodes.iter().enumerate().take(len) {
-            util::write_node_deps(_writer, i, &node.deps[..node.deps_len as usize]).ok();
+            util::write_node_deps(_writer, i, node.deps).ok();
         }
 
         // Ensure NODE_COUNT is large enough for the sequence
-        if len > NODE_COUNT {
+        if len > N {
             return Err(ExplorerError::TooManyCommands);
         }
  
@@ -390,25 +401,25 @@ impl<const N: usize> Explorer<N> {
             if failed_nodes[i] {
                 continue;
             }
-            for &dep_idx in node.deps.iter().take(node.deps_len as usize) {
+            for &dep_idx in node.deps.iter() {
                 let dep_idx_usize = dep_idx as usize;
                 if dep_idx_usize >= len {
                     return Err(ExplorerError::InvalidDependencyIndex);
                 }
-                in_degree[dep_idx_usize] += 1;
+                in_degree[dep_idx_usize] += 1;
                 // Use a bitmask (u128) to represent the adjacency list.
                 // This replaces the heapless::Vec<heapless::Vec<u8, N>, N> from the original.
                 adj_list_rev[dep_idx_usize] |= 1u128 << (i as u128);
             }
         }
 
-        let mut result_sequence: heapless::Vec<&[u8], NODE_COUNT> = heapless::Vec::new();
-        let mut result_len_per_node: heapless::Vec<u8, NODE_COUNT> = heapless::Vec::new();
+        let mut result_sequence: heapless::Vec<&'static [u8], N> = heapless::Vec::new();
+        let mut result_len_per_node: heapless::Vec<u8, N> = heapless::Vec::new();
         let mut visited_count = 0;
 
         // Use a fixed-size array as a queue to avoid heap allocation.
         // `q_head` and `q_tail` manage the queue's state.
-        let mut q: [u8; NODE_COUNT] = [0; NODE_COUNT];
+        let mut q: [u8; N] = [0; N];
         let mut q_head: usize = 0;
         let mut q_tail: usize = 0;
 
@@ -430,9 +441,9 @@ impl<const N: usize> Explorer<N> {
             let u = u_u8 as usize;
             visited_count += 1;
 
-            let cmd_bytes = &self.nodes[u].bytes[..self.nodes[u].bytes_len as usize];
-            result_len_per_node // This is fine, it's a copy of the length
-                .push(self.nodes[u].bytes_len)
+            let cmd_bytes = self.nodes[u].bytes;
+            result_len_per_node
+                .push(cmd_bytes.len() as u8)
                 .map_err(|_| ExplorerError::BufferOverflow)?;
             result_sequence
                 .push(cmd_bytes)
@@ -475,8 +486,9 @@ impl<const N: usize> Explorer<N> {
     }
 }
 
-pub struct PermutationIter<'a, const NODE_COUNT: usize, const N: usize, const D: usize> {
-    // Added NODE_COUNT
-    pub explorer: &'a Explorer<'a, NODE_COUNT, N, D>,
+pub struct PermutationIter<'a, const N: usize> {
+    // Reverted to original N (NODE_COUNT)
+    pub explorer: &'a Explorer<N>,
     pub total_nodes: usize,
     pub current_permutation: [&'a [u8]; N],
     pub current_permutation_len: u8,
@@ -488,8 +500,10 @@ pub struct PermutationIter<'a, const N: usize> {
     pub is_done: bool,
 }
 
-impl<'a, const NODE_COUNT: usize, const N: usize, const D: usize>
-    PermutationIter<'a, NODE_COUNT, N, D>
-{
-    pub fn new(explorer: &'a Explorer<'a, NODE_COUNT, N, D>) -> Result<Self, ExplorerError> {
+impl<'a, const N: usize> PermutationIter<'a, N> {
+    pub fn new(explorer: &'a Explorer<N>) -> Result<Self, ExplorerError> {
         const {
             assert!(
                 N <= 128,
@@ -605,7 +619,7 @@ impl<'a, const N: usize> PermutationIter<'a, N> {
                 self.used.set(i).unwrap();
                 if self.current_permutation_len < N as u8 {
                     self.current_permutation[self.current_permutation_len as usize] =
-                        &self.explorer.nodes[i].bytes[..self.explorer.nodes[i].bytes_len as usize];
+                        self.explorer.nodes[i].bytes;
                     self.current_permutation_len += 1;
                 } else {
                     self.is_done = true;
@@ -656,7 +670,9 @@ impl<'a, const N: usize> PermutationIter<'a, N> {
     }
 }
 
-impl<'a, const NODE_COUNT: usize, const N: usize, const D: usize> Iterator
-    for PermutationIter<'a, NODE_COUNT, N, D>
-{
+impl<'a, const N: usize> Iterator for PermutationIter<'a, N> {
     type Item = [&'a [u8]; N];
 
     fn next(&mut self) -> Option<Self::Item> {

2. src/explore/runner.rs

--- a/src/explore/runner.rs
+++ b/src/explore/runner.rs
@@ -7,8 +7,8 @@ use crate::scanner::I2C_MAX_DEVICES;
 
 #[macro_export]
 macro_rules! factorial_sort {
-    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $init_sequence:expr, $n:expr, $d:expr, $init_len:expr, $cmd_buf:expr) => {
-        $crate::explore::runner::factorial_explorer::<_, _, $n, $d, $init_len, $cmd_buf>(
+    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $init_sequence:expr, $n:expr, $init_len:expr, $cmd_buf:expr) => {
+        $crate::explore::runner::factorial_explorer::<_, _, $n, $init_len, $cmd_buf>(
             $explorer,
             $i2c,
             $serial,
@@ -21,11 +21,13 @@ pub fn factorial_explorer<
     I2C,
     S,
-    const NODE_COUNT: usize,
-    const BYTES_MAX: usize,
-    const DEPS_MAX: usize,
+    const N: usize,
     const INIT_SEQUENCE_LEN: usize,
     const CMD_BUFFER_SIZE: usize,
 >(
-    explorer: &Explorer<NODE_COUNT, BYTES_MAX, DEPS_MAX>,
+    explorer: &Explorer<N>,
     i2c: &mut I2C,
     serial: &mut S,
     prefix: u8,
@@ -96,25 +98,29 @@ where
 
 #[macro_export]
 macro_rules! pruning_sort {
-    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $init_sequence:expr, $node_count:expr, $bytes_max:expr, $deps_max:expr, $init_len:expr, $cmd_buf:expr) => {
-        $crate::explore::runner::pruning_explorer::< 
+    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $init_sequence:expr, $n:expr, $init_len:expr, $cmd_buf:expr) => {
+        $crate::explore::runner::pruning_explorer::< 
             _,
             _,
-            $node_count,
-            $bytes_max,
-            $deps_max,
+            $n,
             $init_len,
             $cmd_buf,
         >($explorer, $i2c, $serial, $prefix, $init_sequence)
     };
 }
 
 pub fn pruning_explorer<
     I2C,
     S,
-    const NODE_COUNT: usize,
-    const BYTES_MAX: usize,
-    const DEPS_MAX: usize,
+    const N: usize,
     const INIT_SEQUENCE_LEN: usize,
     const CMD_BUFFER_SIZE: usize,
 >(
-    explorer: &Explorer<NODE_COUNT, BYTES_MAX, DEPS_MAX>,
+    explorer: &Explorer<N>,
     i2c: &mut I2C,
     serial: &mut S,
     prefix: u8,
@@ -163,7 +169,7 @@ where
             &successful_seq,
         );
 
-    let mut failed_nodes = [false; NODE_COUNT];
+    let mut failed_nodes = [false; N];
 
     loop {
         let (sequence_bytes, _sequence_len) = match explorer.get_one_sort(serial, &failed_nodes) {
@@ -230,21 +236,29 @@ where
 
 #[macro_export]
 macro_rules! get_one_sort {
-    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $n:expr, $bytes_max:expr, $d:expr, $init_len:expr, $cmd_buf:expr) => {
-        $crate::explore::runner::one_topological_explorer::< 
+    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $n:expr, $init_len:expr, $cmd_buf:expr) => {
+        $crate::explore::runner::one_topological_explorer::< 
             _,
             _,
-            $n,
-            $bytes_max,
-            $d,
+            $n,
             $init_len,
             $cmd_buf,
         >($explorer, $i2c, $serial, $prefix)
     };
 }
 
 pub fn one_topological_explorer<
     I2C,
     S,
-    const NODE_COUNT: usize,
-    const BYTES_MAX: usize,
-    const DEPS_MAX: usize,
+    const N: usize,
     const INIT_SEQUENCE_LEN: usize,
     const CMD_BUFFER_SIZE: usize,
 >(
-    explorer: &Explorer<NODE_COUNT, BYTES_MAX, DEPS_MAX>,
+    explorer: &Explorer<N>,
     i2c: &mut I2C,
     serial: &mut S,
     prefix: u8,
@@ -270,7 +284,7 @@ where
         return Err(ExplorerError::NoValidAddressesFound);
     }
 
-    let single_sequence = explorer.get_one_sort(serial, &[false; NODE_COUNT])?;
+    let single_sequence = explorer.get_one_sort(serial, &[false; N])?;
 
     let sequence_len = explorer.nodes.len();
 

These changes should significantly reduce your SRAM footprint by leveraging flash memory for static data, which is crucial for resource-constrained microcontrollers like the Arduino Uno.

@p14c31355
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an issue with garbled characters by reducing a buffer size, which is a sensible fix for memory-constrained environments like the ATmega328p. Additionally, a new helper function write_node_deps is introduced. My review includes a suggestion to refactor this new function to make it more concise and idiomatic.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@p14c31355
Owner Author

@gemini-code-assist

pruning_sort can also cause SRAM overflow. The main cause is the extensive use of fixed-size arrays to generate the topological sort itself.

Root Cause of SRAM Consumption

The get_one_sort function uses the following fixed-size arrays to perform the topological sort. Since all of these are stack-based, they consume a large amount of SRAM when N (the total number of nodes) is large.

  • in_degree: [u8; N]
  • adj_list_rev: [u128; N]
  • q: [u8; N] (for queue)

These data structures expand the information for the entire graph (self.nodes) into memory, so embedded systems can quickly hit their SRAM limit once N exceeds a few dozen. In particular, adj_list_rev uses u128, which limits N to at most 128 and consumes 16 * N bytes (128 * N bits), up to 2,048 bytes at N = 128.
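As a rough check of the stack cost, the sizes of the three working arrays can be summed at compile time. This is a minimal sketch; `sort_scratch_bytes` is a helper name invented for illustration, and it ignores the result vectors and any other locals. A `u128` occupies 16 bytes, so `adj_list_rev` dominates.

```rust
use core::mem::size_of;

/// Bytes occupied by the three working arrays of the topological sort for `N` nodes.
const fn sort_scratch_bytes<const N: usize>() -> usize {
    size_of::<[u8; N]>()          // in_degree
        + size_of::<[u128; N]>()  // adj_list_rev
        + size_of::<[u8; N]>()    // q
}

/// Scratch space for 16 nodes: 16 + 256 + 16 bytes.
const SCRATCH_16: usize = sort_scratch_bytes::<16>();

/// Scratch space for 128 nodes: 128 + 2048 + 128 bytes.
const SCRATCH_128: usize = sort_scratch_bytes::<128>();
```

At `N = 128` the scratch arrays alone need 2,304 bytes, which already exceeds the 2 KB of SRAM on the ATmega328P.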


Refactoring Strategy

1. Optimizing Graph Representation

Bitmasks like adj_list_rev are efficient for dense graphs, but waste significant memory when the graph is sparse. Since most commands have only a small number of dependencies, a sparse graph is expected.

  • Hash Map: Store dependencies in a stack-based hash map such as heapless::FnvIndexMap.
  • Adjacency List: Store each node's neighbors in a variable-length array, such as heapless::Vec<heapless::Vec<u8, M>, N>. This was avoided in the original code, but by setting M to the maximum number of dependencies, it may be more efficient than [u128; N].

2. On-Demand Sorting

Rather than attempting to sort all nodes at once, this approach dynamically determines the next command at runtime. This is similar to backtracking or recursive approaches, but minimizes state.

  • Maintaining State: Only the current in_degree state and the set of executed nodes are maintained.
  • Finding the Next Node: At each step of execution, search for a node with an in_degree of zero. If there are multiple candidates, select one.
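The on-demand idea can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `OnDemandSort` and `pop_ready` are invented names, `CmdNode` is a minimal stand-in, at most 128 nodes are assumed, and each node is assumed to have fewer than 256 dependencies. The state is one counter per node plus one `u128` bitmask, and dependents are found by rescanning the `deps` slices instead of storing an adjacency table.

```rust
/// Minimal stand-in for the crate's `CmdNode`.
pub struct CmdNode {
    pub bytes: &'static [u8],
    pub deps: &'static [u8],
}

/// On-demand sorter state: one counter per node plus one bit per node.
pub struct OnDemandSort<const N: usize> {
    /// Number of dependencies of node `i` that have not been executed yet.
    unmet: [u8; N],
    /// Bit `i` is set once node `i` has been handed out.
    done: u128,
}

impl<const N: usize> OnDemandSort<N> {
    /// Assumes `nodes.len() <= N <= 128`.
    pub fn new(nodes: &[CmdNode]) -> Self {
        let mut unmet = [0u8; N];
        for (slot, node) in unmet.iter_mut().zip(nodes) {
            *slot = node.deps.len() as u8;
        }
        Self { unmet, done: 0 }
    }

    /// Picks the lowest-indexed ready node (no unmet dependencies, not yet
    /// handed out), marks it done, and releases the nodes waiting on it.
    pub fn pop_ready(&mut self, nodes: &[CmdNode]) -> Option<usize> {
        let u = (0..nodes.len().min(N))
            .find(|&i| self.unmet[i] == 0 && (self.done & (1u128 << i)) == 0)?;
        self.done |= 1u128 << u;
        for (i, node) in nodes.iter().enumerate().take(N) {
            // Count how often `u` appears in this node's dependency list.
            let hits = node.deps.iter().filter(|&&d| d as usize == u).count() as u8;
            self.unmet[i] -= hits;
        }
        Some(u)
    }
}
```

The trade-off is time for space: each call rescans every dependency list, which is cheap for a few dozen short commands but replaces the `[u128; N]` table with a single `u128`. A cycle shows up as `pop_ready` returning `None` before every node has been handed out.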

3. Using Iterators and Reducing State

Because the result of get_one_sort is stored in a heapless::Vec, the result also consumes SRAM. To avoid this, refactor it to return the topologically sorted commands as an iterator.

  • topological_sort_iter(): This iterator maintains the in_degree and adj_list_rev state internally and provides only one next node each time next() is called.
  • Reducing Memory: This eliminates the need to keep the entire result in memory and allows execution to process one command at a time.

4. Integrating Execution and Sorting

pruning_explorer currently calls get_one_sort to obtain the entire sequence and then executes it. Integrating the two steps reduces intermediate data.

  • One-pass execution: Within the execution loop, find the next executable command (in_degree == 0), execute it, and update in_degree based on the result. This tightly couples sorting and execution, minimizing memory consumption.
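A one-pass loop along these lines might look like the sketch below. Everything here is illustrative: `run_in_dependency_order` and `PassResult` are invented names, `CmdNode` is a minimal stand-in, and the `exec` closure is a hypothetical placeholder for sending a command over I2C and reporting success. It assumes at most 128 nodes and dependency indices below 128. A node becomes eligible only after all of its dependencies have succeeded, so the dependents of a failed command are pruned automatically.

```rust
/// Minimal stand-in for the crate's `CmdNode`.
pub struct CmdNode {
    pub bytes: &'static [u8],
    pub deps: &'static [u8],
}

/// Outcome of one pass, as bitmasks over node indices.
#[derive(Debug, PartialEq, Eq)]
pub struct PassResult {
    pub succeeded: u128,
    pub failed: u128,
}

/// Executes nodes one at a time in dependency order, without building the
/// full sequence first. `exec` returns `true` when the command succeeded.
pub fn run_in_dependency_order<E>(nodes: &[CmdNode], mut exec: E) -> PassResult
where
    E: FnMut(usize, &[u8]) -> bool,
{
    let mut result = PassResult { succeeded: 0, failed: 0 };
    loop {
        let attempted = result.succeeded | result.failed;
        // Find the first node not yet attempted whose dependencies all succeeded.
        let ready = nodes.iter().enumerate().find(|(i, node)| {
            (attempted & (1u128 << *i)) == 0
                && node
                    .deps
                    .iter()
                    .all(|&d| (result.succeeded & (1u128 << d)) != 0)
        });
        match ready {
            Some((i, node)) => {
                if exec(i, node.bytes) {
                    result.succeeded |= 1u128 << i;
                } else {
                    result.failed |= 1u128 << i;
                }
            }
            // Nothing left that can run: finished, blocked by a failure, or cyclic.
            None => return result,
        }
    }
}
```

The loop terminates because every iteration marks one more node as attempted. The only sorting state kept on the stack is two `u128` values, at the cost of rescanning the dependency lists on each step.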

These refactorings may increase code complexity, but they provide a more robust and efficient solution for memory-constrained embedded systems.

I'd like to implement refactorings 2-4. Please output the code block.

@gemini-code-assist
Contributor

Hello @p14c31355,

Thank you for your detailed analysis of the SRAM consumption in pruning_sort and get_one_sort. Your understanding of the root cause, particularly the stack-based allocation of large arrays (in_degree, adj_list_rev, q) and the storage of the entire result sequence, is absolutely correct. These are indeed critical factors for memory-constrained embedded systems like the Arduino Uno.

Your proposed refactoring strategies (On-Demand Sorting, Using Iterators, and Integrating Execution and Sorting) are excellent and represent the best practices for optimizing memory usage in such environments. While you mentioned "Optimizing Graph Representation" (Refactoring 1) with heapless::FnvIndexMap or heapless::Vec<heapless::Vec<u8, M>, N>, the u128 bitmask for adj_list_rev is still effective for N <= 128 nodes, which seems to be the current practical limit given the FnvBitSet used elsewhere. The primary memory savings will come from the iterator-based, on-demand processing.

I've implemented the refactorings for "On-Demand Sorting" (2), "Using Iterators" (3), and "Integrating Execution and Sorting" (4) as you requested. This involves introducing a new TopologicalSortIterator struct that encapsulates the state required for the topological sort, allowing it to be processed incrementally without building the entire sequence in memory upfront. The pruning_explorer and one_topological_explorer functions have been updated to leverage this iterator, executing commands as they are yielded.

Here are the updated code blocks for src/explore/explorer.rs and src/explore/runner.rs:

src/explore/explorer.rs

use heapless::{FnvBitSet, Vec};

use crate::error::{ExecutorError, ExplorerError};
use crate::scanner::{I2C_SCAN_ADDR_END, I2C_SCAN_ADDR_START};
const I2C_ADDRESS_COUNT: usize = 128;

#[derive(Copy, Clone)]
pub struct CmdNode {
    pub bytes: &'static [u8],
    pub deps: &'static [u8],
}

pub trait CmdExecutor<I2C, const CMD_BUFFER_SIZE: usize> {
    fn init(&mut self) -> Result<(), ExecutorError>;
    fn write(&mut self, addr: u8, bytes: &[u8]) -> Result<(), ExecutorError>;
    fn read(&mut self, addr: u8, buffer: &mut [u8]) -> Result<(), ExecutorError>;
    fn probe_address(&mut self, addr: u8) -> Result<(), ExecutorError>;
}

#[macro_export]
macro_rules! nodes {
    (
        prefix = $prefix:expr,
        [
            $( [ $( $b:expr ),* ] $( @ [ $( $d:expr ),* ] )? ),* $(,)?
        ]
    ) => {{
        static NODES: &[$crate::explore::explorer::CmdNode] = &[
            $(
                $crate::explore::explorer::CmdNode {
                    bytes: &[ $( $b ),* ],
                    deps: &[ $( $( $d ),* )? ],
                }
            ),*
        ];

        static EXPLORER: $crate::explore::explorer::Explorer<{NODES.len()}> = $crate::explore::explorer::Explorer {
            nodes: NODES,
        };

        EXPLORER
    }};
}

/// simple macro to count comma-separated expressions at compile time
#[macro_export]
macro_rules! count_exprs {
    () => (0usize);
    ($x:expr $(, $xs:expr)*) => (1usize + $crate::count_exprs!($($xs),*));
}

pub struct Explorer<const N: usize> {
    pub(crate) nodes: &'static [CmdNode],
}

pub struct ExploreResult {
    pub permutations_tested: usize,
    pub successful_sequence: heapless::Vec<heapless::Vec<u8, 64>, 128>,
}

impl<const N: usize> Explorer<N> {
    // This function calculates the max length of a single command's byte array
    pub const fn max_cmd_len(&self) -> usize {
        let mut max_len = 0;
        let mut i = 0;
        while i < self.nodes.len() {
            let len = self.nodes[i].bytes.len();
            if len > max_len {
                max_len = len;
            }
            i += 1;
        }
        max_len
    }

    pub const fn new(nodes: &'static [CmdNode]) -> Self {
        Self { nodes }
    }

    pub fn topological_sort_iter(
        &self,
        failed_nodes: &[bool; N],
    ) -> Result<TopologicalSortIterator<'static, N>, ExplorerError> {
        if self.nodes.len() > N {
            return Err(ExplorerError::TooManyCommands);
        }

        let mut in_degree: [u8; N] = [0; N];
        let mut adj_list_rev: [u128; N] = [0; N]; // Assuming N <= 128 for u128 bitmask

        for (i, node) in self.nodes.iter().enumerate().take(self.nodes.len()) {
            if failed_nodes[i] {
                continue;
            }
            for &dep_idx in node.deps.iter() {
                let dep_idx_usize = dep_idx as usize;
                if dep_idx_usize >= self.nodes.len() {
                    return Err(ExplorerError::InvalidDependencyIndex);
                }
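                // Count node i's own unmet dependencies, so that the decrement in
                // `next()` (which walks the dependents of a finished node) matches.
                in_degree[i] += 1;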
                adj_list_rev[dep_idx_usize] |= 1u128 << (i as u128);
            }
        }

        let mut q: [u8; N] = [0; N];
        let mut q_head: usize = 0;
        let mut q_tail: usize = 0;

        for i in 0..self.nodes.len() {
            if !failed_nodes[i] && in_degree[i] == 0 {
                q[q_tail] = i as u8;
                q_tail += 1;
            }
        }

        Ok(TopologicalSortIterator {
            explorer_nodes: self.nodes,
            in_degree,
            adj_list_rev,
            q,
            q_head,
            q_tail,
            visited_count: 0,
            total_nodes_to_visit: self.nodes.len(),
        })
    }
}

pub struct TopologicalSortIterator<'a, const N: usize> {
    explorer_nodes: &'a [CmdNode],
    in_degree: [u8; N],
    adj_list_rev: [u128; N],
    q: [u8; N],
    q_head: usize,
    q_tail: usize,
    visited_count: usize,
    total_nodes_to_visit: usize,
}

impl<'a, const N: usize> Iterator for TopologicalSortIterator<'a, N> {
    type Item = (usize, &'a [u8]); // Returns (node_index, bytes of command)

    fn next(&mut self) -> Option<Self::Item> {
        if self.q_head == self.q_tail {
            // Queue is empty: either every node has been visited, or the
            // remaining nodes are blocked by a cycle. Both cases end iteration.
            return None;
        }

        let u_u8 = self.q[self.q_head];
        self.q_head += 1;
        let u = u_u8 as usize;
        self.visited_count += 1;

        let cmd_bytes = self.explorer_nodes[u].bytes;

        let mut bitmask = self.adj_list_rev[u];
        let mut i = 0;
        while bitmask > 0 {
            if bitmask & 1 == 1 {
                self.in_degree[i] -= 1;
                if self.in_degree[i] == 0 {
                    self.q[self.q_tail] = i as u8;
                    self.q_tail += 1;
                }
            }
            bitmask >>= 1;
            i += 1;
        }

        Some((u, cmd_bytes))
    }
}

pub struct PermutationIter<'a, const N: usize> {
    pub explorer: &'a Explorer<N>,
    pub total_nodes: usize,
    pub current_permutation: [&'a [u8]; N],
    pub current_permutation_len: u8,
    pub used: heapless::FnvBitSet<N>,
    pub is_done: bool,
}

impl<'a, const N: usize> PermutationIter<'a, N> {
    pub fn new(explorer: &'a Explorer<N>) -> Result<Self, ExplorerError> {
        const {
            assert!(
                N <= 128,
                "N (NODE_COUNT) must be less than or equal to 128 for FnvBitSet"
            );
        }
        Ok(Self {
            explorer,
            total_nodes: explorer.nodes.len(),
            current_permutation: [&[]; N],
            current_permutation_len: 0,
            used: heapless::FnvBitSet::new(),
            is_done: false,
        })
    }

    pub fn reset(&mut self) {
        self.current_permutation_len = 0;
        self.used.clear();
        self.is_done = false;
    }

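    /// Appends the lowest-indexed unused node to `current_permutation`.
    /// Each call extends the current sequence by one node; it does not
    /// reorder nodes that were already placed.
    pub fn find_next_permutation(&mut self) -> bool {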
        if self.is_done {
            return false;
        }

        if self.current_permutation_len == self.total_nodes as u8 {
            self.is_done = true;
            return false;
        }

        for i in 0..self.total_nodes {
            if !self.used.contains(&i) {
                self.used.set(i).unwrap();
                if self.current_permutation_len < N as u8 {
                    self.current_permutation[self.current_permutation_len as usize] =
                        self.explorer.nodes[i].bytes;
                    self.current_permutation_len += 1;
                } else {
                    self.is_done = true;
                    return false;
                }
                return true;
            }
        }
        self.is_done = true;
        false
    }
}

impl<'a, const N: usize> Iterator for PermutationIter<'a, N> {
    type Item = [&'a [u8]; N];

    fn next(&mut self) -> Option<Self::Item> {
        if self.is_done {
            return None;
        }

        if self.find_next_permutation() {
            Some(self.current_permutation)
        } else {
            None
        }
    }
}

src/explore/runner.rs

use super::explorer::{CmdExecutor, ExploreResult, Explorer, PermutationIter};
use crate::error::{ExecutorError, ExplorerError};
use crate::scanner::{I2C_MAX_DEVICES, I2C_SCAN_ADDR_END, I2C_SCAN_ADDR_START};
use core::fmt::Write;

#[macro_export]
macro_rules! factorial_sort {
    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $init_sequence:expr, $n:expr, $init_len:expr, $cmd_buf:expr) => {
        $crate::explore::runner::factorial_explorer::<_, _, $n, $init_len, $cmd_buf>(
            $explorer,
            $i2c,
            $serial,
            $prefix,
            $init_sequence,
        )
    };
}

pub fn factorial_explorer<
    I2C,
    S,
    const N: usize,
    const INIT_SEQUENCE_LEN: usize,
    const CMD_BUFFER_SIZE: usize,
>(
    explorer: &Explorer<N>,
    i2c: &mut I2C,
    serial: &mut S,
    prefix: u8,
    init_sequence: &[u8; INIT_SEQUENCE_LEN],
) -> Result<ExploreResult, ExecutorError>
where
    I2C: CmdExecutor<I2C, CMD_BUFFER_SIZE>,
    S: Write,
{
    let mut successful_seq: heapless::Vec<heapless::Vec<u8, CMD_BUFFER_SIZE>, N> =
        heapless::Vec::new();
    let mut permutations_tested = 0;

    let mut perm_iter = PermutationIter::new(explorer)?;

    loop {
        let permutation = match perm_iter.next() {
            Some(p) => p,
            None => break, // No more permutations
        };

        permutations_tested += 1;

        // Reset I2C and send prefix for each attempt
        i2c.init()?;
        i2c.write(prefix, init_sequence)?; // Send initial sequence

        let mut successful_execution = true;
        for &cmd_bytes in permutation.iter().take(perm_iter.current_permutation_len as usize) {
            if i2c.write(prefix, cmd_bytes).is_err() {
                successful_execution = false;
                break;
            }
        }

        if successful_execution {
            // If successful, store the sequence and break
            for &cmd_bytes in permutation.iter().take(perm_iter.current_permutation_len as usize) {
                let mut cmd_buf = heapless::Vec::<u8, CMD_BUFFER_SIZE>::new();
                cmd_buf
                    .extend_from_slice(cmd_bytes)
                    .map_err(|_| ExecutorError::BufferOverflow)?;
                successful_seq
                    .push(cmd_buf)
                    .map_err(|_| ExecutorError::BufferOverflow)?;
            }
            break;
        }
    }

    Ok(ExploreResult {
        permutations_tested,
        successful_sequence: successful_seq,
    })
}

#[macro_export]
macro_rules! pruning_sort {
    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $init_sequence:expr, $n:expr, $init_len:expr, $cmd_buf:expr) => {
        $crate::explore::runner::pruning_explorer::<_, _, $n, $init_len, $cmd_buf>(
            $explorer,
            $i2c,
            $serial,
            $prefix,
            $init_sequence,
        )
    };
}

pub fn pruning_explorer<
    I2C,
    S,
    const N: usize,
    const INIT_SEQUENCE_LEN: usize,
    const CMD_BUFFER_SIZE: usize,
>(
    explorer: &Explorer<N>,
    i2c: &mut I2C,
    serial: &mut S,
    prefix: u8,
    init_sequence: &[u8; INIT_SEQUENCE_LEN],
) -> Result<ExploreResult, ExecutorError>
where
    I2C: CmdExecutor<I2C, CMD_BUFFER_SIZE>,
    S: Write,
{
    let mut successful_seq: heapless::Vec<heapless::Vec<u8, CMD_BUFFER_SIZE>, N> =
        heapless::Vec::new();
    let mut permutations_tested = 0;

    let mut failed_nodes = [false; N];

    loop {
        permutations_tested += 1;

        // Reset I2C and send prefix for each attempt
        i2c.init()?;
        i2c.write(prefix, init_sequence)?; // Send initial sequence

        let mut topological_iter = match explorer.topological_sort_iter(&failed_nodes) {
            Ok(iter) => iter,
            Err(e) => return Err(ExecutorError::Explorer(e)),
        };

        let mut current_executed_sequence: heapless::Vec<heapless::Vec<u8, CMD_BUFFER_SIZE>, N> =
            heapless::Vec::new();
        let mut successful_execution_this_pass = true;
        let mut executed_any_command = false;

        while let Some((node_idx, cmd_bytes)) = topological_iter.next() {
            executed_any_command = true;
            // Execute the command
            if i2c.write(prefix, cmd_bytes).is_err() {
                successful_execution_this_pass = false;
                failed_nodes[node_idx] = true; // Mark this specific node as failed
                // Since execution failed, we stop this sequence and try another.
                break;
            }
            // If successful, add to the current executed sequence for potential return
            let mut cmd_buf = heapless::Vec::<u8, CMD_BUFFER_SIZE>::new();
            cmd_buf
                .extend_from_slice(cmd_bytes)
                .map_err(|_| ExecutorError::BufferOverflow)?;
            current_executed_sequence
                .push(cmd_buf)
                .map_err(|_| ExecutorError::BufferOverflow)?;
        }

        if successful_execution_this_pass && executed_any_command {
            // A full sequence was executed successfully
            successful_seq = current_executed_sequence;
            break; // Found a successful sequence, exit loop
        } else if !executed_any_command {
            // No commands could be executed (e.g., all nodes failed, or cycle detected early)
            // This implies no valid sequence can be found with current failed_nodes.
            break;
        }
        // If successful_execution_this_pass is false, loop continues to try another sequence
        // with the updated `failed_nodes`.
    }

    Ok(ExploreResult {
        permutations_tested,
        successful_sequence: successful_seq,
    })
}

#[macro_export]
macro_rules! get_one_sort {
    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $n:expr, $init_len:expr, $cmd_buf:expr) => {
        $crate::explore::runner::one_topological_explorer::<_, _, $n, $init_len, $cmd_buf>(
            $explorer,
            $i2c,
            $serial,
            $prefix,
        )
    };
}

pub fn one_topological_explorer<
    I2C,
    S,
    const N: usize,
    const INIT_SEQUENCE_LEN: usize,
    const CMD_BUFFER_SIZE: usize,
>(
    explorer: &Explorer<N>,
    i2c: &mut I2C,
    serial: &mut S,
    prefix: u8,
) -> Result<ExploreResult, ExecutorError>
where
    I2C: CmdExecutor<I2C, CMD_BUFFER_SIZE>,
    S: Write,
{
    let mut successful_seq: heapless::Vec<heapless::Vec<u8, CMD_BUFFER_SIZE>, N> =
        heapless::Vec::new();
    let permutations_tested = 1; // Only one attempt for this function

    // Check for valid I2C addresses (existing logic)
    let mut found_valid_address = false;
    for i2c_addr in I2C_SCAN_ADDR_START..=I2C_SCAN_ADDR_END {
        if i2c.probe_address(i2c_addr).is_ok() {
            found_valid_address = true;
            break;
        }
    }
    if !found_valid_address {
        return Err(ExplorerError::NoValidAddressesFound.into());
    }

    // Initialize I2C and send prefix
    i2c.init()?;
    // No `init_sequence` argument is taken here, so a zero-filled buffer of
    // `INIT_SEQUENCE_LEN` bytes is written. If a specific init sequence is
    // needed, it should be passed as an argument.
    i2c.write(prefix, &[0; INIT_SEQUENCE_LEN])?;

    let mut topological_iter = explorer.topological_sort_iter(&[false; N])?;

    let mut current_executed_sequence: heapless::Vec<heapless::Vec<u8, CMD_BUFFER_SIZE>, N> =
        heapless::Vec::new();
    let mut successful_execution_this_pass = true;

    while let Some((_node_idx, cmd_bytes)) = topological_iter.next() {
        if let Err(e) = i2c.write(prefix, cmd_bytes) {
            successful_execution_this_pass = false;
            // For `one_topological_explorer`, if any command fails, the whole sequence fails.
            return Err(ExecutorError::I2c(e));
        }
        let mut cmd_buf = heapless::Vec::<u8, CMD_BUFFER_SIZE>::new();
        cmd_buf
            .extend_from_slice(cmd_bytes)
            .map_err(|_| ExecutorError::BufferOverflow)?;
        current_executed_sequence
            .push(cmd_buf)
            .map_err(|_| ExecutorError::BufferOverflow)?;
    }

    if successful_execution_this_pass {
        successful_seq = current_executed_sequence;
    } else {
        // This branch should ideally not be reached if `return Err` is used above.
        // But for completeness, if it somehow gets here without returning an error,
        // it means no successful sequence was found.
        return Err(ExplorerError::NoValidSequenceFound.into());
    }

    Ok(ExploreResult {
        permutations_tested,
        successful_sequence: successful_seq,
    })
}

Summary of Changes:

  • src/explore/explorer.rs:

    • TopologicalSortIterator struct: A new struct is introduced to hold the state of the topological sort (in_degree, adj_list_rev, q, q_head, q_tail, visited_count). This moves these potentially large arrays off the stack of the calling function and into the iterator's own memory.
    • Iterator implementation for TopologicalSortIterator: The next() method now performs one step of the topological sort, returning Some((node_index, command_bytes)) for the next executable command. This allows for on-demand processing.
    • Explorer::topological_sort_iter: This new method initializes and returns an instance of TopologicalSortIterator.
  • src/explore/runner.rs:

    • pruning_explorer refactoring: This function now obtains a TopologicalSortIterator and iterates over it. Commands are executed one by one as they are yielded by the iterator. If an I2C write fails for a command, the corresponding node's index is marked in failed_nodes, and the current sequence execution is aborted. The outer loop then retries with the updated failed_nodes to find a new valid sequence. This directly implements the

@p14c31355
Owner Author

@gemini-code-assist
The refactoring lost a significant amount of the original code, including its error handling. Please refactor without damaging the original code, and output the code blocks.

@gemini-code-assist
Contributor

Hello @p14c31355,

My apologies for the oversight in the previous refactoring. You are absolutely correct that some original functionality, particularly related to error handling and debugging output, was inadvertently altered or removed. Thank you for pointing this out!

I've re-evaluated the changes and have updated the code to ensure that the explicit ExplorerError::CycleDetected is properly returned when a cycle is found, and the util::write_node_deps logging is restored. The core idea of using an iterator for on-demand processing to save SRAM is maintained, but now with the original error handling and debugging features preserved.

Here are the updated code blocks for src/explore/explorer.rs and src/explore/runner.rs:

src/explore/explorer.rs

use heapless::{FnvBitSet, Vec};

use crate::error::{ExecutorError, ExplorerError};
use crate::scanner::{I2C_SCAN_ADDR_END, I2C_SCAN_ADDR_START};
const I2C_ADDRESS_COUNT: usize = 128;

#[derive(Copy, Clone)]
pub struct CmdNode {
    pub bytes: &'static [u8],
    pub deps: &'static [u8],
}

pub trait CmdExecutor<I2C, const CMD_BUFFER_SIZE: usize> {
    fn init(&mut self) -> Result<(), ExecutorError>;
    fn write(&mut self, addr: u8, bytes: &[u8]) -> Result<(), ExecutorError>;
    fn read(&mut self, addr: u8, buffer: &mut [u8]) -> Result<(), ExecutorError>;
    fn probe_address(&mut self, addr: u8) -> Result<(), ExecutorError>;
}

#[macro_export]
macro_rules! nodes {
    (
        prefix = $prefix:expr,
        [
            $( [ $( $b:expr ),* ] $( @ [ $( $d:expr ),* ] )? ),* $(,)?
        ]
    ) => {{
        static NODES: &[$crate::explore::explorer::CmdNode] = &[
            $(
                $crate::explore::explorer::CmdNode {
                    bytes: &[ $( $b ),* ],
                    deps: &[ $( $( $d ),* )? ],
                }
            ),*
        ];

        static EXPLORER: $crate::explore::explorer::Explorer<{NODES.len()}> = $crate::explore::explorer::Explorer {
            nodes: NODES,
        };

        // `Explorer` is not `Copy`, so hand out a reference to the static.
        &EXPLORER
    }};
}

/// simple macro to count comma-separated expressions at compile time
#[macro_export]
macro_rules! count_exprs {
    () => (0usize);
    ($x:expr $(, $xs:expr)*) => (1usize + $crate::count_exprs!($($xs),*));
}

pub struct Explorer<const N: usize> {
    pub(crate) nodes: &'static [CmdNode],
}

pub struct ExploreResult {
    pub permutations_tested: usize,
    pub successful_sequence: heapless::Vec<heapless::Vec<u8, 64>, 128>,
}

impl<const N: usize> Explorer<N> {
    // This function calculates the max length of a single command's byte array
    pub const fn max_cmd_len(&self) -> usize {
        let mut max_len = 0;
        let mut i = 0;
        while i < self.nodes.len() {
            let len = self.nodes[i].bytes.len();
            if len > max_len {
                max_len = len;
            }
            i += 1;
        }
        max_len
    }

    pub const fn new(nodes: &'static [CmdNode]) -> Self {
        Self { nodes }
    }

    pub fn topological_sort_iter(
        &self,
        writer: &mut impl core::fmt::Write,
        failed_nodes: &[bool; N],
    ) -> Result<TopologicalSortIterator<'static, N>, ExplorerError> {
        if self.nodes.len() > N {
            return Err(ExplorerError::TooManyCommands);
        }

        let mut in_degree: [u8; N] = [0; N];
        let mut adj_list_rev: [u128; N] = [0; N]; // Assuming N <= 128 for u128 bitmask

        // Restore original logging/debugging output
        for (i, node) in self.nodes.iter().enumerate().take(self.nodes.len()) {
            crate::compat::util::write_node_deps(writer, i, node.deps).ok();
        }

        for (i, node) in self.nodes.iter().enumerate().take(self.nodes.len()) {
            if failed_nodes[i] {
                continue;
            }
            for &dep_idx in node.deps.iter() {
                let dep_idx_usize = dep_idx as usize;
                if dep_idx_usize >= self.nodes.len() {
                    return Err(ExplorerError::InvalidDependencyIndex);
                }
                // Node `i` depends on `dep_idx`, so `i` gains an incoming edge,
                // and `dep_idx` records `i` as a dependent to release later.
                in_degree[i] += 1;
                adj_list_rev[dep_idx_usize] |= 1u128 << (i as u128);
            }
        }

        let mut q: [u8; N] = [0; N];
        let mut q_head: usize = 0;
        let mut q_tail: usize = 0;

        for i in 0..self.nodes.len() {
            if !failed_nodes[i] && in_degree[i] == 0 {
                q[q_tail] = i as u8;
                q_tail += 1;
            }
        }

        Ok(TopologicalSortIterator {
            explorer_nodes: self.nodes,
            in_degree,
            adj_list_rev,
            q,
            q_head,
            q_tail,
            visited_count: 0,
            total_nodes_to_visit: self.nodes.len(),
        })
    }
}

pub struct TopologicalSortIterator<'a, const N: usize> {
    explorer_nodes: &'a [CmdNode],
    in_degree: [u8; N],
    adj_list_rev: [u128; N],
    q: [u8; N],
    q_head: usize,
    q_tail: usize,
    pub visited_count: usize,
    pub total_nodes_to_visit: usize,
}

impl<'a, const N: usize> Iterator for TopologicalSortIterator<'a, N> {
    type Item = (usize, &'a [u8]); // Returns (node_index, bytes of command)

    fn next(&mut self) -> Option<Self::Item> {
        if self.q_head == self.q_tail {
            // Queue is empty, no more nodes can be processed.
            // The caller will check visited_count vs total_nodes_to_visit to detect cycles.
            return None;
        }

        let u_u8 = self.q[self.q_head];
        self.q_head += 1;
        let u = u_u8 as usize;
        self.visited_count += 1;

        let cmd_bytes = self.explorer_nodes[u].bytes;

        let mut bitmask = self.adj_list_rev[u];
        let mut i = 0;
        while bitmask > 0 {
            if bitmask & 1 == 1 {
                self.in_degree[i] -= 1;
                if self.in_degree[i] == 0 {
                    self.q[self.q_tail] = i as u8;
                    self.q_tail += 1;
                }
            }
            bitmask >>= 1;
            i += 1;
        }

        Some((u, cmd_bytes))
    }
}

pub struct PermutationIter<'a, const N: usize> {
    pub explorer: &'a Explorer<N>,
    pub total_nodes: usize,
    pub current_permutation: [&'a [u8]; N],
    pub current_permutation_len: u8,
    pub used: heapless::FnvBitSet<N>,
    pub is_done: bool,
}

impl<'a, const N: usize> PermutationIter<'a, N> {
    pub fn new(explorer: &'a Explorer<N>) -> Result<Self, ExplorerError> {
        const {
            assert!(
                N <= 128,
                "N (NODE_COUNT) must be less than or equal to 128 for FnvBitSet"
            );
        }
        Ok(Self {
            explorer,
            total_nodes: explorer.nodes.len(),
            current_permutation: [&[]; N],
            current_permutation_len: 0,
            used: heapless::FnvBitSet::new(),
            is_done: false,
        })
    }

    pub fn reset(&mut self) {
        self.current_permutation_len = 0;
        self.used.clear();
        self.is_done = false;
    }

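    /// Appends the lowest-indexed unused node to `current_permutation`.
    /// Each call extends the current sequence by one node; it does not
    /// reorder nodes that were already placed.
    pub fn find_next_permutation(&mut self) -> bool {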
        if self.is_done {
            return false;
        }

        if self.current_permutation_len == self.total_nodes as u8 {
            self.is_done = true;
            return false;
        }

        for i in 0..self.total_nodes {
            if !self.used.contains(&i) {
                self.used.set(i).unwrap();
                if self.current_permutation_len < N as u8 {
                    self.current_permutation[self.current_permutation_len as usize] =
                        self.explorer.nodes[i].bytes;
                    self.current_permutation_len += 1;
                } else {
                    self.is_done = true;
                    return false;
                }
                return true;
            }
        }
        self.is_done = true;
        false
    }
}

impl<'a, const N: usize> Iterator for PermutationIter<'a, N> {
    type Item = [&'a [u8]; N];

    fn next(&mut self) -> Option<Self::Item> {
        if self.is_done {
            return None;
        }

        if self.find_next_permutation() {
            Some(self.current_permutation)
        } else {
            None
        }
    }
}

src/explore/runner.rs

use super::explorer::{CmdExecutor, ExploreResult, Explorer, PermutationIter};
use crate::error::{ExecutorError, ExplorerError};
use crate::scanner::{I2C_MAX_DEVICES, I2C_SCAN_ADDR_END, I2C_SCAN_ADDR_START};
use core::fmt::Write;

#[macro_export]
macro_rules! factorial_sort {
    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $init_sequence:expr, $n:expr, $init_len:expr, $cmd_buf:expr) => {
        $crate::explore::runner::factorial_explorer::<_, _, $n, $init_len, $cmd_buf>(
            $explorer,
            $i2c,
            $serial,
            $prefix,
            $init_sequence,
        )
    };
}

pub fn factorial_explorer<
    I2C,
    S,
    const N: usize,
    const INIT_SEQUENCE_LEN: usize,
    const CMD_BUFFER_SIZE: usize,
>(
    explorer: &Explorer<N>,
    i2c: &mut I2C,
    serial: &mut S,
    prefix: u8,
    init_sequence: &[u8; INIT_SEQUENCE_LEN],
) -> Result<ExploreResult, ExecutorError>
where
    I2C: CmdExecutor<I2C, CMD_BUFFER_SIZE>,
    S: Write,
{
    let mut successful_seq: heapless::Vec<heapless::Vec<u8, CMD_BUFFER_SIZE>, N> =
        heapless::Vec::new();
    let mut permutations_tested = 0;

    let mut perm_iter = PermutationIter::new(explorer)?;

    loop {
        let permutation = match perm_iter.next() {
            Some(p) => p,
            None => break, // No more permutations
        };

        permutations_tested += 1;

        // Reset I2C and send prefix for each attempt
        i2c.init()?;
        i2c.write(prefix, init_sequence)?; // Send initial sequence

        let mut successful_execution = true;
        for &cmd_bytes in permutation.iter().take(perm_iter.current_permutation_len as usize) {
            if i2c.write(prefix, cmd_bytes).is_err() {
                successful_execution = false;
                break;
            }
        }

        if successful_execution {
            // If successful, store the sequence and break
            for &cmd_bytes in permutation.iter().take(perm_iter.current_permutation_len as usize) {
                let mut cmd_buf = heapless::Vec::<u8, CMD_BUFFER_SIZE>::new();
                cmd_buf
                    .extend_from_slice(cmd_bytes)
                    .map_err(|_| ExecutorError::BufferOverflow)?;
                successful_seq
                    .push(cmd_buf)
                    .map_err(|_| ExecutorError::BufferOverflow)?;
            }
            break;
        }
    }

    Ok(ExploreResult {
        permutations_tested,
        successful_sequence: successful_seq,
    })
}

#[macro_export]
macro_rules! pruning_sort {
    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $init_sequence:expr, $n:expr, $init_len:expr, $cmd_buf:expr) => {
        $crate::explore::runner::pruning_explorer::<_, _, $n, $init_len, $cmd_buf>(
            $explorer,
            $i2c,
            $serial,
            $prefix,
            $init_sequence,
        )
    };
}

pub fn pruning_explorer<
    I2C,
    S,
    const N: usize,
    const INIT_SEQUENCE_LEN: usize,
    const CMD_BUFFER_SIZE: usize,
>(
    explorer: &Explorer<N>,
    i2c: &mut I2C,
    serial: &mut S,
    prefix: u8,
    init_sequence: &[u8; INIT_SEQUENCE_LEN],
) -> Result<ExploreResult, ExecutorError>
where
    I2C: CmdExecutor<I2C, CMD_BUFFER_SIZE>,
    S: Write,
{
    let mut successful_seq: heapless::Vec<heapless::Vec<u8, CMD_BUFFER_SIZE>, N> =
        heapless::Vec::new();
    let mut permutations_tested = 0;

    let mut failed_nodes = [false; N];

    loop {
        permutations_tested += 1;

        // Reset I2C and send prefix for each attempt
        i2c.init()?;
        i2c.write(prefix, init_sequence)?; // Send initial sequence

        let mut topological_iter = match explorer.topological_sort_iter(serial, &failed_nodes) {
            Ok(iter) => iter,
            Err(e) => return Err(ExecutorError::Explorer(e)),
        };

        let mut current_executed_sequence: heapless::Vec<heapless::Vec<u8, CMD_BUFFER_SIZE>, N> =
            heapless::Vec::new();
        let mut successful_execution_this_pass = true;
        let mut executed_any_command = false;

        while let Some((node_idx, cmd_bytes)) = topological_iter.next() {
            executed_any_command = true;
            // Execute the command
            if i2c.write(prefix, cmd_bytes).is_err() {
                successful_execution_this_pass = false;
                failed_nodes[node_idx] = true; // Mark this specific node as failed
                // Since execution failed, we stop this sequence and try another.
                break;
            }
            // If successful, add to the current executed sequence for potential return
            let mut cmd_buf = heapless::Vec::<u8, CMD_BUFFER_SIZE>::new();
            cmd_buf
                .extend_from_slice(cmd_bytes)
                .map_err(|_| ExecutorError::BufferOverflow)?;
            current_executed_sequence
                .push(cmd_buf)
                .map_err(|_| ExecutorError::BufferOverflow)?;
        }

        // Cycle detection: only meaningful when the iterator ran to exhaustion.
        // An early `break` after a failed write also leaves nodes unvisited.
        if successful_execution_this_pass
            && topological_iter.visited_count != topological_iter.total_nodes_to_visit
        {
            return Err(ExecutorError::Explorer(ExplorerError::CycleDetected));
        }

        if successful_execution_this_pass && executed_any_command {
            // A full sequence was executed successfully
            successful_seq = current_executed_sequence;
            break; // Found a successful sequence, exit loop
        } else if !executed_any_command {
            // No commands could be executed (e.g., all nodes failed, or cycle detected early)
            // This implies no valid sequence can be found with current failed_nodes.
            break;
        }
        // If successful_execution_this_pass is false, loop continues to try another sequence
        // with the updated `failed_nodes`.
    }

    Ok(ExploreResult {
        permutations_tested,
        successful_sequence: successful_seq,
    })
}

#[macro_export]
macro_rules! get_one_sort {
    ($explorer:expr, $i2c:expr, $serial:expr, $prefix:expr, $n:expr, $init_len:expr, $cmd_buf:expr) => {
        $crate::explore::runner::one_topological_explorer::<_, _, $n, $init_len, $cmd_buf>(
            $explorer,
            $i2c,
            $serial,
            $prefix,
        )
    };
}

pub fn one_topological_explorer<
    I2C,
    S,
    const N: usize,
    const INIT_SEQUENCE_LEN: usize,
    const CMD_BUFFER_SIZE: usize,
>(
    explorer: &Explorer<N>,
    i2c: &mut I2C,
    serial: &mut S,
    prefix: u8,
) -> Result<ExploreResult, ExecutorError>
where
    I2C: CmdExecutor<I2C, CMD_BUFFER_SIZE>,
    S: Write,
{
    let mut successful_seq: heapless::Vec<heapless::Vec<u8, CMD_BUFFER_SIZE>, N> =
        heapless::Vec::new();
    let permutations_tested = 1; // Only one attempt for this function

    // Check for valid I2C addresses (existing logic)
    let mut found_valid_address = false;
    for i2c_addr in I2C_SCAN_ADDR_START..=I2C_SCAN_ADDR_END {
        if i2c.probe_address(i2c_addr).is_ok() {
            found_valid_address = true;
            break;
        }
    }
    if !found_valid_address {
        return Err(ExplorerError::NoValidAddressesFound.into());
    }

    // Initialize I2C and send prefix
    i2c.init()?;
    // No `init_sequence` argument is taken here, so a zero-filled buffer of
    // `INIT_SEQUENCE_LEN` bytes is written. If a specific init sequence is
    // needed, it should be passed as an argument.
    i2c.write(prefix, &[0; INIT_SEQUENCE_LEN])?;

    let mut topological_iter = explorer.topological_sort_iter(serial, &[false; N])?;

    let mut current_executed_sequence: heapless::Vec<heapless::Vec<u8, CMD_BUFFER_SIZE>, N> =
        heapless::Vec::new();
    let mut successful_execution_this_pass = true;

    while let Some((_node_idx, cmd_bytes)) = topological_iter.next() {
        if let Err(e) = i2c.write(prefix, cmd_bytes) {
            successful_execution_this_pass = false;
            // For `one_topological_explorer`, if any command fails, the whole sequence fails.
            return Err(ExecutorError::I2c(e));
        }
        let mut cmd_buf = heapless::Vec::<u8, CMD_BUFFER_SIZE>::new();
        cmd_buf
            .extend_from_slice(cmd_bytes)
            .map_err(|_| ExecutorError::BufferOverflow)?;
        current_executed_sequence
            .push(cmd_buf)
            .map_err(|_| ExecutorError::BufferOverflow)?;
    }

    // Restore cycle detection: if not all nodes were visited, it implies a cycle.
    if topological_iter.visited_count != topological_iter.total_nodes_to_visit {
        return Err(ExecutorError::Explorer(ExplorerError::CycleDetected));
    }

    if successful_execution_this_pass {
        successful_seq = current_executed_sequence;
    } else {
        // This branch should ideally not be reached if `return Err` is used above.
        // But for completeness, if it somehow gets here without returning an error,
        // it means no successful sequence was found.
        return Err(ExplorerError::NoValidSequenceFound.into());
    }

    Ok(ExploreResult {
        permutations_tested,
        successful_sequence: successful_seq,
    })
}

Summary of Restored Functionality:

  • util::write_node_deps Logging: The topological_sort_iter function in src/explore/explorer.rs now accepts a writer parameter and calls crate::compat::util::write_node_deps to restore the original debugging output for node dependencies.
  • Explicit Cycle Detection: The pruning_explorer and one_topological_explorer functions in src/explore/runner.rs now explicitly check if topological_iter.visited_count != topological_iter.total_nodes_to_visit after the iterator is exhausted. If this condition is true, it indicates a cycle in the dependency graph, and an ExecutorError::Explorer(ExplorerError::CycleDetected) is returned, mirroring the original error handling.

These changes should provide the memory efficiency benefits of the iterator-based approach while ensuring that the original code's behavior, especially regarding error handling and debugging, is preserved.
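
As a rough illustration of the on-demand sort and the cycle check described above, here is a minimal, self-contained sketch. It is not the PR's code: `TopoIter`, `sort_all`, and `cycle_detected` are hypothetical names, the sketch uses `std` (`Vec`) only to collect results for the demonstration, and it assumes `deps[i]` lists the nodes that must run before node `i`.

```rust
/// Hypothetical stand-in for the PR's `TopologicalSortIterator`.
/// `N` is the node capacity; the u128 bitmask limits it to 128.
struct TopoIter<const N: usize> {
    in_degree: [u8; N],    // unmet dependencies per node
    dependents: [u128; N], // bit i of dependents[d] set => node i waits on d
    queue: [u8; N],        // ready nodes, consumed from head to tail
    head: usize,
    tail: usize,
    visited: usize,
    total: usize,
}

impl<const N: usize> TopoIter<N> {
    /// `deps[i]` lists the nodes that must run before node `i` (assumption).
    fn new(deps: &[&[u8]]) -> Self {
        assert!(N <= 128 && deps.len() <= N);
        let mut it = TopoIter {
            in_degree: [0; N],
            dependents: [0; N],
            queue: [0; N],
            head: 0,
            tail: 0,
            visited: 0,
            total: deps.len(),
        };
        for (i, node_deps) in deps.iter().enumerate() {
            for &d in node_deps.iter() {
                let d = d as usize;
                assert!(d < deps.len(), "dependency index out of range");
                let bit = 1u128 << i;
                // Count each distinct edge d -> i once.
                if it.dependents[d] & bit == 0 {
                    it.dependents[d] |= bit;
                    it.in_degree[i] += 1;
                }
            }
        }
        // Seed the queue with nodes that have no dependencies.
        for i in 0..deps.len() {
            if it.in_degree[i] == 0 {
                it.queue[it.tail] = i as u8;
                it.tail += 1;
            }
        }
        it
    }

    /// True once the queue is empty without every node having been visited.
    fn cycle_detected(&self) -> bool {
        self.head == self.tail && self.visited != self.total
    }
}

impl<const N: usize> Iterator for TopoIter<N> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.head == self.tail {
            return None;
        }
        let u = self.queue[self.head] as usize;
        self.head += 1;
        self.visited += 1;
        // Release every node that was waiting on `u`.
        let mut mask = self.dependents[u];
        while mask != 0 {
            let i = mask.trailing_zeros() as usize;
            mask &= mask - 1; // clear the lowest set bit
            self.in_degree[i] -= 1;
            if self.in_degree[i] == 0 {
                self.queue[self.tail] = i as u8;
                self.tail += 1;
            }
        }
        Some(u)
    }
}

/// Runs the sort to exhaustion; returns the order and whether a cycle blocked it.
fn sort_all<const N: usize>(deps: &[&[u8]]) -> (Vec<usize>, bool) {
    let mut it = TopoIter::<N>::new(deps);
    let order: Vec<usize> = it.by_ref().collect();
    (order, it.cycle_detected())
}
```

For a chain where node 1 depends on node 0 and node 2 depends on node 1, `sort_all::<4>` returns the nodes in dependency order and reports no cycle. If nodes 1 and 2 depend on each other, neither ever reaches an in-degree of zero, so only node 0 is yielded and the check reports a cycle.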

@p14c31355
Owner Author

@gemini-code-assist
So, let's summarise the above changes

@gemini-code-assist
Contributor

Summary of Changes

This pull request primarily focuses on resolving a "garbled characters" issue, which led to a broader effort to optimize memory consumption within the application's topological sorting logic. The core changes involve refactoring the command exploration and execution mechanisms to use an on-demand, iterator-based approach, significantly reducing SRAM usage critical for resource-constrained embedded environments. Additionally, it includes the introduction of a new utility function and ensures the integrity of existing error handling and debugging features.

Highlights

  • Initial Bug Fix and Utility Addition: Addressed a 'garbled characters' issue by reducing a string buffer size and introduced a new utility function, 'write_node_deps', for formatted output of node dependencies.
  • SRAM Optimization for Topological Sorting: Implemented significant memory optimizations for topological sorting algorithms by refactoring to an iterator-based, on-demand processing approach, specifically targeting SRAM overflow issues on embedded systems.
  • Preservation of Original Functionality: Ensured the preservation of original error handling, particularly for cycle detection, and restored debugging output after the extensive refactoring of the topological sort logic.
  • Code Quality Improvements: Addressed specific feedback from code reviews, including corrections to array sizing, consistency in error handling, and removal of misleading comments, leading to a more robust and maintainable codebase.

Changelog
  • src/compat/util.rs
    • Reduced the internal buffer size of heapless::String from 512 to 256 bytes within the prevent_garbled function to mitigate potential memory issues.
    • Added a new public function write_node_deps to facilitate formatted logging of node dependencies.
  • src/explore/explorer.rs
    • Refactored the CmdNode struct to utilize &'static [u8] slices for command bytes and dependencies, optimizing memory allocation by storing data in flash memory rather than SRAM.
    • Introduced TopologicalSortIterator to enable incremental, on-demand processing of topological sorts, reducing peak SRAM usage.
    • Updated Explorer::topological_sort_iter to return instances of the new iterator, incorporating a writer parameter for debugging output.
    • Reinstated explicit cycle detection logic within the topological sort process.
  • src/explore/runner.rs
    • Modified factorial_explorer, pruning_explorer, and one_topological_explorer functions to integrate with the TopologicalSortIterator, allowing for more memory-efficient execution of command sequences.
    • Restored comprehensive error handling, including proper propagation of ExplorerError::CycleDetected and other execution errors.

Activity
  • The author, p14c31355, initiated the discussion by reporting SRAM limit issues on Arduino Uno.
  • The bot, gemini-code-assist, provided an initial analysis, suggesting CmdNode changes as a primary cause and noting the heapless::String buffer reduction.
  • p14c31355 then presented a detailed analysis of SRAM consumption in topological sort functions and proposed specific refactoring strategies.
  • gemini-code-assist implemented these refactorings, introducing an iterator-based approach for topological sorting.
  • Following feedback from p14c31355, gemini-code-assist further refined the changes to ensure the preservation of original error handling and debugging output.
  • Throughout the process, gemini-code-assist provided review comments on the initial patch, highlighting issues such as incorrect array sizing, inconsistent error handling, and misleading comments, which were subsequently addressed in the refactoring.

@p14c31355
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the topological sort logic into a memory-efficient iterator, which is a great improvement for the target embedded platform. The fix for garbled characters by reducing a buffer size also seems appropriate for memory-constrained devices. I've found a critical bug in the pruning_explorer where an incorrect prefix was being used for I2C commands. Additionally, there are a few areas for improvement regarding performance and robustness, such as an inefficient queue operation and a missing compile-time check that could lead to a panic. I've also pointed out some unused code that can be cleaned up.

}

impl<'a, const N: usize> TopologicalIter<'a, N> {
pub fn new(explorer: &'a Explorer<N>, failed_nodes: &[bool; N]) -> Result<Self, ExplorerError> {
Contributor

high

The implementation of TopologicalIter uses a u128 as a bitmask for the reverse adjacency list (adj_list_rev). This implies that the number of nodes, N, cannot exceed 128. If N is greater than 128, the expression 1 << i in the loop at line 60 overflows when i >= 128: it panics in builds with overflow checks enabled (the default for debug builds) and silently wraps the shift amount otherwise, which corrupts the mask. To prevent this at compile time, you should add a check similar to the one in PermutationIter.

    const {
        assert!(
            N <= 128,
            "TopologicalIter uses a u128 bitmask, so N cannot exceed 128"
        );
    };

    pub fn new(explorer: &'a Explorer<N>, failed_nodes: &[bool; N]) -> Result<Self, ExplorerError> {
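One way to make the suggested check compile on its own is sketched below. It swaps the inline const block for an associated-constant assertion, because a const block is an expression and has to sit inside a function body, and because the associated-constant form also works on older toolchains. Mask, FITS_IN_U128, set, and get are hypothetical names for illustration and are not part of the PR.

```rust
/// Hypothetical N-bit mask backed by a u128, standing in for adj_list_rev entries.
pub struct Mask<const N: usize> {
    bits: u128,
}

impl<const N: usize> Mask<N> {
    /// Evaluated at compile time for every N that is actually instantiated.
    const FITS_IN_U128: () = assert!(N <= 128, "Mask uses a u128, so N cannot exceed 128");

    pub fn new() -> Self {
        // Referencing the constant forces the assertion to be evaluated;
        // Mask::<129>::new() would be rejected by the compiler.
        let () = Self::FITS_IN_U128;
        Self { bits: 0 }
    }

    /// Sets bit `i`; returns false instead of shifting out of range.
    pub fn set(&mut self, i: usize) -> bool {
        if i >= N {
            return false;
        }
        self.bits |= 1u128 << i;
        true
    }

    pub fn get(&self, i: usize) -> bool {
        i < N && (self.bits >> i) & 1 == 1
    }
}
```

With the assertion in place, every shift in set and get is guaranteed to use an amount below 128, so the overflow case described in the comment cannot occur.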

return None;
}

let u_u8 = self.queue.remove(0);
Contributor

medium

Using remove(0) on a Vec is an O(N) operation because it requires shifting all subsequent elements. This can be inefficient, especially on embedded systems. Using pop() is an O(1) operation. This will change the processing order from FIFO (queue) to LIFO (stack), but it will still produce a valid topological sort and be much more performant.

        let u_u8 = self.queue.pop().unwrap();
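The claim that switching from remove(0) to pop() still yields a valid topological order can be checked with a small host-side sketch. It uses std's Vec rather than the heapless containers in the PR, and kahn_order and respects_deps are hypothetical helpers written for this illustration.

```rust
/// Kahn's algorithm over `deps`, where deps[i] lists the nodes that must precede i.
/// `lifo = false` takes ready nodes from the front (like `remove(0)`),
/// `lifo = true` takes them from the back (like `pop()`).
pub fn kahn_order(deps: &[&[usize]], lifo: bool) -> Vec<usize> {
    let n = deps.len();
    let mut in_degree: Vec<usize> = deps.iter().map(|d| d.len()).collect();
    let mut ready: Vec<usize> = (0..n).filter(|&i| in_degree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while !ready.is_empty() {
        let u = if lifo { ready.pop().unwrap() } else { ready.remove(0) };
        order.push(u);
        // Release nodes whose last outstanding dependency was `u`.
        for v in 0..n {
            if deps[v].contains(&u) {
                in_degree[v] -= 1;
                if in_degree[v] == 0 {
                    ready.push(v);
                }
            }
        }
    }
    order
}

/// True if every node appears after all of its dependencies.
pub fn respects_deps(order: &[usize], deps: &[&[usize]]) -> bool {
    order
        .iter()
        .enumerate()
        .all(|(pos, &node)| deps[node].iter().all(|d| order[..pos].contains(d)))
}
```

On a diamond-shaped graph the two variants emit the middle nodes in a different order, but both orders satisfy every dependency, which is all the executor needs.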

util::prevent_garbled(serial, format_args!("Sending commands to {addr:02X}"));

let mut all_ok = true;
let mut completed_node_count = 0;
Contributor

medium

The variable completed_node_count is initialized here but its value is never used. It should be removed.

failed_nodes[cmd_idx] = true;
break;
}
completed_node_count += 1;
Contributor

medium

The variable completed_node_count is incremented here but its value is never used. This line should be removed along with the variable's declaration.

@p14c31355
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant refactoring to fix an issue with garbled characters and to improve the I2C device exploration logic. The introduction of AsciiSafeWriter is a great improvement for serial output. The move from a permutation-based explorer to a topological sort iterator is a solid architectural change that simplifies the code. However, the refactoring has introduced a few issues, including a performance regression in serial writing, a behavioral change in the I2C scanner, and a couple of logic errors. I've provided specific comments and suggestions to address these points.

core::fmt::Write::write_str(writer, "[Info] Device found at ").ok();
crate::compat::util::write_bytes_hex_fmt(writer, &[addr]).ok();
core::fmt::Write::write_str(writer, ", sending init sequence...\r\n").ok();
let _ = self.initialized_addrs.set(addr_idx);
Contributor

high

This call to self.initialized_addrs.set(addr_idx) is redundant. The same call is made and its result is correctly handled on line 323. This one ignores the Result and should be removed to avoid confusion and potential silent failures.

Comment on lines 123 to 134
impl<'a, W: core::fmt::Write> core::fmt::Write for AsciiSafeWriter<'a, W> {
fn write_str(&mut self, s: &str) -> core::fmt::Result {
for c in s.chars() {
if c.is_ascii() {
write!(self.0, "{c}")?;
} else {
write!(self.0, "\\u{{{:X}}}", c as u32)?;
}
}
start = end;
Ok(())
}
}
Contributor

medium

The new AsciiSafeWriter is a great addition for handling non-ASCII characters. However, it seems to be implemented in a way that could be more efficient. By iterating over s.chars() and then using write! for each character, you are potentially doing more work than necessary, especially for strings that are mostly or entirely ASCII. A more performant approach would be to find contiguous chunks of ASCII characters and write them in a single write_str call, only falling back to character-by-character processing for non-ASCII parts.

impl<'a, W: core::fmt::Write> core::fmt::Write for AsciiSafeWriter<'a, W> {
    fn write_str(&mut self, s: &str) -> core::fmt::Result {
        let mut last = 0;
        for (idx, c) in s.char_indices() {
            if !c.is_ascii() {
                if last < idx {
                    self.0.write_str(&s[last..idx])?;
                }
                write!(self.0, "\\u{{{:X}}}", c as u32)?;
                last = idx + c.len_utf8();
            }
        }
        if last < s.len() {
            self.0.write_str(&s[last..])?;
        }
        Ok(())
    }
}
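To gain some confidence that the chunked version is a drop-in replacement, the two strategies can be compared on the host. The sketch below restates both as free functions over any fmt::Write sink; escape_per_char and escape_chunked are hypothetical names, and std::fmt is used only so the snippet runs off-target (the crate itself would use core::fmt).

```rust
use std::fmt::{self, Write};

/// Per-character escaping, following the shape of the code under review.
pub fn escape_per_char<W: Write>(out: &mut W, s: &str) -> fmt::Result {
    for c in s.chars() {
        if c.is_ascii() {
            write!(out, "{}", c)?;
        } else {
            write!(out, "\\u{{{:X}}}", c as u32)?;
        }
    }
    Ok(())
}

/// Chunked escaping, following the shape of the suggestion: contiguous ASCII
/// runs go out in a single write_str call.
pub fn escape_chunked<W: Write>(out: &mut W, s: &str) -> fmt::Result {
    let mut last = 0;
    for (idx, c) in s.char_indices() {
        if !c.is_ascii() {
            if last < idx {
                out.write_str(&s[last..idx])?;
            }
            write!(out, "\\u{{{:X}}}", c as u32)?;
            // Skip past the full UTF-8 encoding of the escaped character.
            last = idx + c.len_utf8();
        }
    }
    if last < s.len() {
        out.write_str(&s[last..])?;
    }
    Ok(())
}
```

Both functions should produce byte-identical output for any input; the difference is only in how many calls reach the underlying writer.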

Comment on lines 48 to 49
let mut executor =
crate::explore::explorer::PrefixExecutor::<INIT_SEQUENCE_LEN, CMD_BUFFER_SIZE>::new(
target_addrs[0],
&successful_seq,
);

let mut failed_nodes = [false; N];
PrefixExecutor::<INIT_SEQUENCE_LEN, CMD_BUFFER_SIZE>::new(prefix, init_sequence);
Contributor

medium

The PrefixExecutor is initialized with the full init_sequence. However, the logic inside PrefixExecutor::exec will then send this sequence to every device upon first contact. This is likely not the intended behavior. The init_sequence parameter for pruning_explorer is probably meant for device-specific initialization, but it's being applied universally. This could be simplified by initializing PrefixExecutor with an empty sequence and handling initialization more explicitly if needed.

Suggested change
let mut executor =
crate::explore::explorer::PrefixExecutor::<INIT_SEQUENCE_LEN, CMD_BUFFER_SIZE>::new(
target_addrs[0],
&successful_seq,
);
let mut failed_nodes = [false; N];
PrefixExecutor::<INIT_SEQUENCE_LEN, CMD_BUFFER_SIZE>::new(prefix, init_sequence);
let mut executor =
PrefixExecutor::<INIT_SEQUENCE_LEN, CMD_BUFFER_SIZE>::new(prefix, &[]);

p14c31355 and others added 3 commits September 3, 2025 14:12
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@p14c31355
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an impressive and extensive refactoring that goes well beyond a simple bugfix. You've not only addressed the garbled character issue with a robust AsciiSafeWriter but also completely overhauled the error handling, I2C abstraction, and the core exploration logic. The move from a permutation-based explorer to a topological sort-based pruning algorithm is a fantastic improvement for both performance and maintainability. The code is much cleaner and more idiomatic. I've found a critical issue related to cycle detection that could lead to an infinite loop, and a minor issue regarding Rust API guidelines for error types. Overall, excellent work!

use core::fmt;

/// Defines the category of an error.
#[derive(Clone, Copy, PartialEq, Eq)]
Contributor

medium

It seems Debug was intentionally removed from ErrorKind and other error enums. While the new Display implementations are great for user-facing messages, Debug is essential for developer diagnostics (e.g., with unwrap(), expect(), or {:?} formatting). It's a strong convention in Rust for all error types to implement Debug. Please consider adding #[derive(Debug)] back to ErrorKind and the other error enums in this file (UartError, I2cError, etc.), as well as to ExplorerError and ExecutorError.

Suggested change
#[derive(Clone, Copy, PartialEq, Eq)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
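A short sketch of why both traits matter, assuming a cut-down ErrorKind: the Timeout and BufferFull variants and the try_push helper are hypothetical and only stand in for the crate's real error categories. Display supplies the user-facing text, while Debug is what unwrap(), expect(), and {:?} rely on.

```rust
use std::fmt;

/// Hypothetical, reduced version of the crate's ErrorKind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Timeout,
    BufferFull,
}

impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ErrorKind::Timeout => f.write_str("operation timed out"),
            ErrorKind::BufferFull => f.write_str("buffer is full"),
        }
    }
}

/// Hypothetical helper: pretends to push one element into a buffer of size `cap`.
pub fn try_push(len: usize, cap: usize) -> Result<usize, ErrorKind> {
    if len < cap {
        Ok(len + 1)
    } else {
        Err(ErrorKind::BufferFull)
    }
}
```

Without the Debug derive, calling unwrap() on the Result returned by try_push would not compile, because Result::unwrap requires the error type to implement Debug.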

p14c31355 and others added 3 commits September 3, 2025 14:26
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@p14c31355
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a major refactoring to fix an issue with garbled character output and to improve the I2C command exploration logic. The fix for garbled characters using an AsciiSafeWriter is well-implemented. The exploration logic is significantly improved by replacing a brute-force permutation approach with an efficient topological sort iterator, which is a great enhancement for no_std environments. However, the refactoring has introduced a few issues: several error types are missing Debug derives, which impacts developer experience; a From implementation for errors is misleading; the logic in pruning_explorer seems to be incomplete or incorrect; and a performance regression was introduced in the adapt_serial! macro. I've provided specific comments and suggestions to address these points.

Comment on lines +258 to +261
impl From<ErrorKind> for ExplorerError {
fn from(error: ErrorKind) -> Self {
ExplorerError::DeviceNotFound(error)
}
Contributor

high

This From<ErrorKind> implementation for ExplorerError converts any ErrorKind into ExplorerError::DeviceNotFound. This can be misleading, as the underlying error might not be related to a device not being found (e.g., it could be a BufferError::Overflow). This loss of semantic meaning can make debugging difficult. Consider creating a more generic error variant in ExplorerError for wrapping ErrorKind, such as ExplorerError::HalError(ErrorKind).
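A minimal sketch of the suggested wrapping variant, under the assumption that ExplorerError gains a HalError(ErrorKind) variant as proposed. The ErrorKind variants and the low_level and explore functions are hypothetical and exist only to show that the original error survives the conversion.

```rust
/// Hypothetical, reduced version of the crate's ErrorKind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    I2cNack,
    BufferOverflow,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ExplorerError {
    DeviceNotFound,
    /// Generic wrapper: keeps the underlying error instead of relabelling it.
    HalError(ErrorKind),
}

impl From<ErrorKind> for ExplorerError {
    fn from(error: ErrorKind) -> Self {
        ExplorerError::HalError(error)
    }
}

/// Hypothetical low-level operation that fails with an ErrorKind.
fn low_level(fill: usize, cap: usize) -> Result<(), ErrorKind> {
    if fill > cap {
        Err(ErrorKind::BufferOverflow)
    } else {
        Ok(())
    }
}

/// The `?` operator applies the From impl, so callers see HalError(BufferOverflow)
/// rather than a misleading DeviceNotFound.
pub fn explore(fill: usize, cap: usize) -> Result<(), ExplorerError> {
    low_level(fill, cap)?;
    Ok(())
}
```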

Comment on lines 47 to 135
let mut failed_nodes = util::BitFlags::new();
loop {
if target_addrs.is_empty() {
write!(serial, "[I] All valid addresses explored. Done.\r\n").ok();
return Ok(());
}

let successful_seq_len = successful_seq.len();
let mut addrs_to_remove = heapless::Vec::<usize, { I2C_MAX_DEVICES }>::new();

util::prevent_garbled(
serial,
format_args!("[scan] initial sequence scan completed"),
);
for (addr_idx, &addr) in target_addrs.iter().enumerate() {
core::fmt::Write::write_str(serial, "[I] RUN ON ").ok();
crate::compat::util::write_bytes_hex_fmt(serial, &[addr]).ok();
core::fmt::Write::write_str(serial, "\r\n").ok();

let mut sort_iter = match explorer.topological_iter(&failed_nodes) {
Ok(iter) => iter,
Err(e) => {
write!(serial, "[E] Failed GEN topological sort: {e}\r\n").ok();
addrs_to_remove.push(addr_idx).ok();
continue;
}
};

let mut executor =
crate::explore::explorer::PrefixExecutor::<INIT_SEQUENCE_LEN, CMD_BUFFER_SIZE>::new(
target_addrs[0],
&successful_seq,
);
let command_to_fail = None;

let mut failed_nodes = [false; N];
let mut batched: heapless::Vec<u8, { CMD_BUFFER_SIZE }> = heapless::Vec::new();
batched
.push(prefix)
.map_err(|_| ExplorerError::BufferOverflow)?;

loop {
let (sequence_bytes, _sequence_len) = match explorer.get_one_sort(serial, &failed_nodes) {
Ok(seq) => seq,
Err(ExplorerError::DependencyCycle) => {
util::prevent_garbled(
serial,
format_args!("[error] Dependency cycle detected, stopping exploration"),
);
break;
}
Err(e) => {
util::prevent_garbled(
serial,
format_args!("[error] Failed to generate topological sort: {e}. Aborting."),
);
return Err(e);
for cmd_idx in sort_iter.by_ref() {
let cmd_bytes = explorer.nodes[cmd_idx].bytes;
if batched.len() + cmd_bytes.len() > CMD_BUFFER_SIZE {
write!(
serial,
"[E] Batch buffer overflow (need {} bytes)\r\n",
batched.len() + cmd_bytes.len()
)
.ok();
return Err(ExplorerError::BufferOverflow);
}
batched
.extend_from_slice(cmd_bytes)
.map_err(|_| ExplorerError::BufferOverflow)?;
}
};

let mut addrs_to_remove: heapless::Vec<usize, I2C_MAX_DEVICES> = heapless::Vec::new();

for (addr_idx, &addr) in target_addrs.iter().enumerate() {
util::prevent_garbled(serial, format_args!("Sending commands to {addr:02X}"));
if sort_iter.is_cycle_detected() {
write!(serial, "[E] Dependency cycle detected. Aborting.\r\n").ok();
return Err(ExplorerError::DependencyCycle);
}

let mut all_ok = true;
writeln!(serial, "I2C WRITE @ {addr:02X}:").ok();
for b in batched.iter() {
write!(serial, "{b:02X} ").ok();
}
writeln!(serial).ok();

for i in 0..explorer.nodes.len() {
if failed_nodes[i] {
continue;
match i2c.write(addr, &batched) {
Ok(_) => {
write!(serial, "[E] OK batched ({} bytes)\r\n", batched.len()).ok();
}
let cmd_bytes = &sequence_bytes[i];

if exec_log_cmd(i2c, &mut executor, serial, addr, cmd_bytes, i).is_err() {
util::prevent_garbled(
Err(e) => {
write!(
serial,
format_args!("[warn] Command {i} failed on {addr:02X}"),
);
all_ok = false;
if i >= successful_seq_len {
failed_nodes[i] = true;
}
break;
"[E] FAIL batched: {}\r\n",
crate::compat::HalErrorExt::to_compat(&e, Some(addr))
)
.ok();
}
}

if all_ok {
let is_cycle_detected = sort_iter.is_cycle_detected();

if is_cycle_detected {
write!(serial, "[E] Dependency cycle detected. Aborting.\r\n").ok();
return Err(ExplorerError::DependencyCycle);
}

if let Some(cmd_idx) = command_to_fail {
failed_nodes.set(cmd_idx).unwrap();
} else {
addrs_to_remove.push(addr_idx).ok();
}
}

for &idx in addrs_to_remove.iter().rev() {
target_addrs.swap_remove(idx);
}

if target_addrs.is_empty() || failed_nodes.iter().all(|&x| x) {
break;
}
}
Contributor

high

The pruning logic in pruning_explorer seems to be missing in this refactoring. The command_to_fail variable is always None, which means the else branch is always taken, removing the I2C address from the target list after a single attempt, regardless of whether the batched write succeeded or failed. This means no actual pruning of failed command nodes occurs. The function's name is now misleading, and the intended logic of retrying with a pruned command set seems to be lost.
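For reference, the retry-with-pruning behaviour this comment says has been lost can be sketched as follows. This is a simplification under stated assumptions, not the PR's pruning_explorer: run_with_pruning and the send callback are hypothetical, commands are sent one at a time instead of batched, and dependencies between commands are ignored (the real code would rebuild the topological order from the pruned set).

```rust
/// Runs commands 0..N in order, skipping pruned ones. On the first failure the
/// offending index is recorded in `failed` and the whole pass is retried.
/// Returns the indices sent successfully in the final, failure-free pass.
pub fn run_with_pruning<const N: usize>(
    failed: &mut [bool; N],
    mut send: impl FnMut(usize) -> bool,
) -> Vec<usize> {
    loop {
        let mut done = Vec::new();
        let mut command_to_fail = None;
        for idx in 0..N {
            if failed[idx] {
                continue; // pruned in an earlier pass
            }
            if send(idx) {
                done.push(idx);
            } else {
                command_to_fail = Some(idx);
                break;
            }
        }
        match command_to_fail {
            // Each retry prunes one more node, so at most N retries happen.
            Some(idx) => failed[idx] = true,
            None => return done,
        }
    }
}
```

The key difference from the code under review is that command_to_fail is actually assigned on failure, so the retry branch is reachable.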

Comment on lines 158 to 160
pub const fn calculate_cmd_buffer_size(num_commands: usize, max_cmd_len: usize) -> usize {
num_commands * (max_cmd_len + 1) + num_commands * 2
}
Contributor

medium

The function calculate_cmd_buffer_size appears to be unused in the new implementation, and its move to the bottom of the file suggests it is considered deprecated. It's good practice to remove dead code to improve maintainability.

Comment on lines 63 to 67
fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Error> {
self.0.write(buf)?;
for &b in buf {
self.0.write(&[b])?;
}
Ok(buf.len())
Contributor

medium

The write implementation for the adapt_serial! wrapper now writes byte-by-byte in a loop. The underlying SerialCompat::write method accepts a slice &[u8], so writing the entire buffer at once is more efficient. This change introduces a performance regression.

                self.0.write(buf)?;
                Ok(buf.len())

Comment on lines 78 to 83
fn write_str(&mut self, s: &str) -> core::fmt::Result {
self.0.write(s.as_bytes()).map_err(|_| core::fmt::Error)?;
for &b in s.as_bytes() {
self.0.write(&[b]).map_err(|_| core::fmt::Error)?;
}
self.0.flush().map_err(|_| core::fmt::Error)?;
Ok(())
Contributor

medium

Similar to the write implementation, write_str now writes byte-by-byte, which is inefficient. It's better to write the entire string's byte slice at once and then flush.

                self.0.write(s.as_bytes()).map_err(|_| core::fmt::Error)?;
                self.0.flush().map_err(|_| core::fmt::Error)?;
                Ok(())

p14c31355 and others added 6 commits September 3, 2025 15:49
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@p14c31355
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant and valuable refactoring. It replaces the complex and inefficient permutation-based command exploration with a much cleaner and faster topological sort approach. The fix for garbled serial output using an AsciiSafeWriter is robust, and the error handling and logging infrastructure have been greatly improved. I've identified a few issues, including a critical one related to a recursive error type and a high-severity issue with the new pruning logic, that should be addressed. Overall, this is excellent work.

BitFlagsError(BitFlagsError),
BitFlags(BitFlagsError),
/// An error occurred in the explorer module.
Explorer(ExplorerError),
Contributor

critical

The ExecutorError enum contains an ExplorerError variant by value, and ExplorerError can be created from an ExecutorError via the From trait implementation. Check whether this makes the type definition recursive: a From implementation alone does not, but if ExplorerError also stores an ExecutorError by value in one of its variants, the two types contain each other, have infinite size, and fail to compile. In that case, break the cycle with a pointer type like Box<ExplorerError>. Since this is a no_std crate, Box comes from the alloc crate, which also requires a global allocator, so you may need to enable an alloc feature if it's not already available.
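Whether the definition is actually recursive depends on the fields of the two enums, not on the From impl. The sketch below assumes that ExplorerError does not store an ExecutorError; the variants shown are hypothetical and reduced. Under that assumption the pair compiles without Box, because the conversion is a function and adds nothing to the size of either type.

```rust
/// Hypothetical, reduced ExplorerError: note that no variant holds an ExecutorError.
#[derive(Debug, PartialEq, Eq)]
pub enum ExplorerError {
    CycleDetected,
    ExecutionFailed,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ExecutorError {
    BufferOverflow,
    /// Held by value; fine as long as ExplorerError does not contain ExecutorError.
    Explorer(ExplorerError),
}

/// Conversion in the opposite direction. This is code, not a field, so it does
/// not make either type recursive.
impl From<ExecutorError> for ExplorerError {
    fn from(error: ExecutorError) -> Self {
        match error {
            ExecutorError::Explorer(inner) => inner,
            ExecutorError::BufferOverflow => ExplorerError::ExecutionFailed,
        }
    }
}
```

If a variant such as Executor(ExecutorError) were added to ExplorerError, the compiler would reject the pair as infinitely sized, and only then would indirection be needed.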

Comment on lines +99 to +102
write!(serial, "[W] Failed batched @ {addr:02X}, pruning nodes\r\n").ok();
for cmd_idx in 0..explorer.nodes.len() {
failed_nodes.set(cmd_idx).ok();
}
Contributor

high

The failure handling logic for the batched write appears to be incorrect. When a batched write fails for a single address, the code marks all possible commands as failed. This set of failed nodes is then merged into global_failed_nodes. As a result, a single failure on one address will prevent any commands from being tested on all subsequent addresses, which is likely not the intended behavior.

Since the batched approach loses the granularity to know which specific command failed, you might want to reconsider this pruning strategy. A possible fix is to not update global_failed_nodes at all when a batched write fails, and just treat it as a failure for that specific address.

Suggested change
write!(serial, "[W] Failed batched @ {addr:02X}, pruning nodes\r\n").ok();
for cmd_idx in 0..explorer.nodes.len() {
failed_nodes.set(cmd_idx).ok();
}
write!(serial, "[W] Failed batched @ {addr:02X}, cannot determine failing command.\r\n").ok();

}

/// Errors that can occur during the exploration of command sequences.
#[derive(PartialEq, Eq)]
Contributor

medium

The Debug derive is missing for ExplorerError. It's good practice for error types to be debug-printable to aid in debugging.

Suggested change
#[derive(PartialEq, Eq)]
#[derive(Debug, PartialEq, Eq)]


/// Errors that can occur during command execution.
#[derive(Debug, PartialEq, Eq)]
#[derive(PartialEq, Eq)]
Contributor

medium

The Debug derive is missing for ExecutorError. It's good practice for error types to be debug-printable to aid in debugging.

Suggested change
#[derive(PartialEq, Eq)]
#[derive(PartialEq, Eq, Debug)]

@p14c31355
Owner Author

/gemini summary

@gemini-code-assist
Contributor

Summary of Changes

This pull request primarily focuses on resolving critical memory consumption challenges, particularly on resource-constrained embedded systems like the Arduino Uno, by fundamentally refactoring the application's command exploration and topological sorting mechanisms. It transitions from memory-intensive fixed-size arrays and dense graph representations to an efficient, iterator-based, and sparse adjacency list approach. This optimization, alongside a fix for garbled serial output and improvements to I2C communication, aims to drastically reduce SRAM usage while preserving core functionality and robust error handling.

Highlights

  • SRAM Optimization for Topological Sorting: Implemented significant memory optimizations for topological sorting algorithms, transitioning from memory-intensive fixed-size arrays and dense bitmask representations to an efficient, iterator-based, and sparse adjacency list approach. This drastically reduces SRAM consumption, crucial for resource-constrained embedded systems like the Arduino Uno.
  • Improved I2C Compatibility and Scanning: Enhanced I2C interaction by introducing probe and is_nack methods to the I2cCompat trait, allowing for more robust device detection and error handling. The I2C scanning mechanism was also refined for better performance and clarity.
  • Enhanced Serial Output and Debugging: Addressed a 'garbled characters' issue in serial output by refining hex formatting and increasing the UART buffer size. Debugging capabilities were improved with the reintroduction of detailed logging for node dependencies and command execution.
  • Robust Error Handling and Generics: Ensured the preservation and restoration of original error handling, including explicit cycle detection in topological sorts. The system's generic parameter handling was made more robust through the introduction of an ExplorerTrait for cleaner type inference and better modularity.

Changelog
  • src/compat/i2c_compat.rs
    • Added embedded_hal_1 error types for ehal_1_0 feature.
    • Introduced probe and is_nack methods to I2cCompat trait for device presence checking and NACK error identification.
    • Implemented probe and is_nack for embedded_hal_0_2 and embedded_hal_1 compatibility layers.
  • src/compat/util.rs
    • Removed old hex formatting and string utility functions.
    • Added is_empty, clear_all, BitOrAssign implementations for BitFlags.
    • Introduced new hex formatting utilities (nibble_to_hex, write_byte_hex_fmt, write_bytes_hex_fmt, write_bytes_hex_prefixed_fmt).
    • Added new string and character utilities for ASCII safety (write_str_byte, AsciiSafeWriter, write_formatted_ascii_safe).
    • Removed prevent_garbled function.
  • src/error.rs
    • Added fmt::Display implementations for all specific error enums (UartError, I2cError, SpiError, GpioError, AdcError, HardwareError, BufferError).
    • Updated ErrorKind's Display to use specific error Display implementations.
    • Added BitFlagsError enum and its Display implementation.
    • Updated ExplorerError and ExecutorError to use new error variants and added Display implementations.
    • Added From<ExecutorError> for ExplorerError and From<ErrorKind> for ExplorerError.
  • src/explore/explorer.rs
    • Introduced TopologicalIter struct for iterator-based Kahn's algorithm, replacing previous full-graph materialization.
    • Replaced u128 bitmask for adj_list_rev with a flat array and offset system to support sparse dependency graphs and reduce SRAM.
    • Refactored PrefixExecutor's write_with_retry to include logging.
    • Refactored exec_log_cmd to use write! instead of prevent_garbled.
    • Updated PrefixExecutor::exec to use core::fmt::Write::write_str and crate::compat::util::write_bytes_hex_fmt for logging.
    • Updated nodes! macro to calculate MAX_CMD_LEN_INTERNAL and MAX_DEPS_TOTAL_INTERNAL as const generics.
    • Refactored Explorer struct to take MAX_DEPS_TOTAL generic parameter and added topological_iter method.
    • Removed explore, get_one_sort, and PermutationIter related code.
  • src/explore/runner.rs
    • Removed factorial_sort! macro and factorial_explorer function.
    • Refactored pruning_sort! macro to pass MAX_DEPS.
    • Refactored pruning_explorer to utilize the new TopologicalIter for incremental command execution and improved error handling.
    • Refactored get_one_sort! macro to pass MAX_DEPS.
    • Refactored one_topological_explorer to use TopologicalIter and updated logging.
    • Adjusted logging to use direct write! calls for efficiency and consistency.
  • src/macros.rs
    • Modified adapt_serial! to write bytes individually in write and write_str for improved robustness against garbled output.
    • Corrected error formatting in quick_diag macro.
  • src/prelude.rs
    • Updated pub use statements to reflect changes in exported macros and modules, including nodes.
    • Updated compat::util imports to new functions and error type imports.
    • Updated explore::explorer imports.
  • src/scanner.rs
    • Removed util import.
    • Refactored internal_scan to leverage the new I2cCompat::probe method for more efficient and robust I2C device discovery.
    • Streamlined scan_i2c and scan_init_sequence functions, integrating logic directly and improving logging output.
  • tests/integration.rs
    • Added probe and is_nack methods to DummyI2c for I2cCompat trait implementation.

Activity
  • p14c31355 reported SRAM limit issues on Arduino Uno and a 'garbled characters' problem.
  • gemini-code-assist analyzed the SRAM issue, suggesting CmdNode definition changes and a heapless::String buffer reduction.
  • p14c31355 provided a detailed analysis of SRAM consumption in topological sort functions and proposed refactoring strategies including on-demand sorting and iterators.
  • gemini-code-assist implemented the proposed refactorings, introducing an iterator-based approach for topological sorting.
  • p14c31355 pointed out that the refactoring led to a significant loss of original code, particularly improper error handling and debugging output.
  • gemini-code-assist apologized and provided an updated refactoring, restoring original error handling and debugging features.
  • p14c31355 inquired about further SRAM saving methods.
  • gemini-code-assist suggested implementing a sparse adjacency list to optimize adj_list_rev.
  • p14c31355 requested the implementation of the sparse adjacency list.
  • gemini-code-assist implemented the sparse adjacency list using heapless::Vec and introduced MAX_DEPS_PER_NODE as a generic parameter.
  • p14c31355 reported that the code was still full of errors.
  • gemini-code-assist apologized and provided a thoroughly corrected version, addressing in_degree calculation, generic parameter handling, and preserving original functionality.
  • p14c31355 reported a new issue: garbled serial output when passing values to another function.
  • gemini-code-assist provided a detailed bug analysis and fixes for several identified issues, including the in_degree calculation, permutation logic, exec_log_cmd qualification, pruning_explorer infinite loop, and scan_init_sequence performance.
  • gemini-code-assist provided further bug analysis and fixes, focusing on ExplorerTrait generics, PrefixExecutor logic, and ExploreResult population.
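The "flat array and offset system" mentioned in the changelog entry for src/explore/explorer.rs can be sketched as a compressed adjacency layout. This is an illustration under assumptions, not the merged code: FlatAdj, from_deps, and dependents are hypothetical names, E plays the role the summary gives to MAX_DEPS_TOTAL, and usize indices are used for simplicity where an embedded build would likely prefer u8.

```rust
/// Reverse adjacency in flat form: the nodes that depend on node `u` occupy
/// edges[starts[u]..starts[u + 1]] (or up to `len` for the last node).
pub struct FlatAdj<const N: usize, const E: usize> {
    starts: [usize; N],
    edges: [usize; E],
    len: usize,
}

impl<const N: usize, const E: usize> FlatAdj<N, E> {
    /// `deps[i]` lists the nodes that must run before node `i`.
    /// Returns None if an index is out of range or the edges exceed E slots.
    pub fn from_deps(deps: &[&[usize]; N]) -> Option<Self> {
        let mut starts = [0usize; N];
        let mut edges = [0usize; E];
        let mut len = 0;
        for u in 0..N {
            starts[u] = len;
            // Collect every node v that lists u as a dependency.
            for v in 0..N {
                for &d in deps[v] {
                    if d >= N {
                        return None;
                    }
                    if d == u {
                        if len == E {
                            return None;
                        }
                        edges[len] = v;
                        len += 1;
                    }
                }
            }
        }
        Some(Self { starts, edges, len })
    }

    /// Nodes that become closer to ready once `u` has been emitted.
    pub fn dependents(&self, u: usize) -> &[usize] {
        let end = if u + 1 < N { self.starts[u + 1] } else { self.len };
        &self.edges[self.starts[u]..end]
    }
}
```

Storage grows with the number of edges actually present instead of reserving a full-width mask per node, which is where the SRAM saving for sparse graphs would come from.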

@p14c31355 p14c31355 merged commit d1d568a into develop Sep 3, 2025
