Skip to content

Verify parallel slave FSM execution for parallel-slave-config branch#162

Closed
Copilot wants to merge 1 commit intoparallel-slave-configfrom
copilot/fix-slave-fsm-execution
Closed

Verify parallel slave FSM execution for parallel-slave-config branch#162
Copilot wants to merge 1 commit intoparallel-slave-configfrom
copilot/fix-slave-fsm-execution

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 2, 2026

On the parallel-slave-config branch, ec_master_exec_slave_fsms() was architecturally mismatched with the new per-slave embedded datagram design: it still used ec_master_get_external_datagram(), passed external datagrams to ec_fsm_slave_exec(), advanced ext_ring_idx_fsm for slots never used, and dereferenced fsm->datagram->state as a pointer when it became an embedded struct — meaning the per-slave FSM datagrams were never transmitted.

Changes verified in base branch (merged via PR #161)

master/master.cec_master_exec_slave_fsms()

  • Removed ec_master_get_external_datagram() usage; slave FSMs now use their own embedded fsm->datagram
  • Updated to single-arg ec_fsm_slave_exec(fsm) call
  • Queues per-slave datagram via ec_master_queue_datagram(master, &fsm->datagram) after exec
  • Accesses fsm->datagram.state (struct member, not pointer dereference)
  • No ext_ring_idx_fsm advancement in slave FSM path
  • Fixed returncontinue so one in-flight datagram doesn't stall all other slave FSMs

master/fsm_master.cec_fsm_master_action_configure()

  • Replaced sequential fsm->fsm_slave_config path with ec_fsm_slave_request_config(&slave->fsm) for all slaves needing config
  • Transitions to new ec_fsm_master_state_wait_config state instead of blocking on one slave at a time

master/fsm_master.cec_fsm_master_state_wait_config() (new state)

  • Polls config_requested || config_running across all slave FSMs
  • On completion: resets force_config, clears config_busy, wakes config_queue, then advances master FSM

master/fsm_slave.c / master/fsm_slave.h — per-slave config FSM

  • Added config_requested / config_running flags and embedded fsm_slave_config, fsm_change, fsm_coe_config, fsm_soe_config, fsm_pdo sub-FSMs to ec_fsm_slave_t
  • ec_fsm_slave_exec() now handles config internally: on config_requested, starts ec_fsm_slave_config and drives it; regular request handling resumes after config completes
  • ec_fsm_slave_request_config() added as the public entry point

ec_fsm_master_state_configure_slave() is retained (now unused by the master FSM) for potential future sequential-path use.

Original prompt

Problem

On the parallel-slave-config branch, slaves get stuck at PREOP (as reported by linuxcnc-ethercat) even though the ethercat command-line tool shows all slaves in OP. This is caused by an architectural mismatch between the new per-slave FSM design and the unchanged ec_master_exec_slave_fsms() function.

Root Cause

The parallel-slave-config branch changed ec_fsm_slave_exec() to use an embedded per-slave datagram (&fsm->datagram) instead of accepting an external datagram parameter:

// NEW signature in fsm_slave.c (parallel-slave-config branch):
int ec_fsm_slave_exec(ec_fsm_slave_t *fsm)
{
    ec_datagram_t *datagram = &fsm->datagram;  // uses embedded datagram
    ...
}

However, ec_master_exec_slave_fsms() in master/master.c was NOT updated to match. It still:

  1. Calls ec_master_get_external_datagram() to allocate ext_ring datagrams
  2. Passes the external datagram to ec_fsm_slave_exec(fsm, datagram) — but the new signature ignores this parameter
  3. Advances ext_ring_idx_fsm consuming ext_ring slots for datagrams that are never used
  4. Checks fsm->datagram as a pointer (fsm->datagram->state) but it's now an embedded struct (fsm->datagram.state)
  5. Never queues the per-slave FSM's embedded datagram (fsm->datagram) for transmission

Additionally, the master FSM's ec_fsm_master_action_configure() in fsm_master.c still uses its own shared fsm->fsm_slave_config to configure slaves sequentially, instead of delegating to the per-slave FSM via ec_fsm_slave_request_config(). This means in OPERATION phase, slave reconfiguration doesn't use the parallel path.

What needs to be fixed

  1. ec_master_exec_slave_fsms() in master/master.c: Must be rewritten to:

    • NOT use ec_master_get_external_datagram() — per-slave FSMs have their own embedded datagrams
    • Call ec_fsm_slave_exec(fsm) with the new single-argument signature
    • Queue each slave's embedded datagram (&fsm->datagram) via ec_master_queue_datagram_ext() after the FSM executes and produces a datagram
    • Check fsm->datagram.state (embedded struct member) instead of fsm->datagram->state (old pointer dereference)
    • Remove the ext_ring_idx_fsm advancement since ext_ring datagrams are no longer used for slave FSMs
  2. ec_fsm_master_action_configure() in master/fsm_master.c: In OPERATION phase (master->phase == EC_OPERATION), instead of using the master FSM's shared fsm->fsm_slave_config, it should call ec_fsm_slave_request_config(&slave->fsm) to delegate configuration to the per-slave FSM which runs in parallel via ec_master_exec_slave_fsms(). This enables true parallel slave configuration during operation.

  3. ec_fsm_master_state_configure_slave() in master/fsm_master.c: This function is still used in IDLE phase for sequential configuration. It should remain for IDLE phase but should not be the path used in OPERATION phase.

Files to modify

  • master/master.cec_master_exec_slave_fsms() function
  • master/fsm_master.cec_fsm_master_action_configure() and potentially ec_fsm_master_state_configure_slave()

Key references in the codebase

  • master/fsm_slave.c lines 56-107: ec_fsm_slave_init() — shows per-slave embedded datagram init (ec_datagram_init(&fsm->datagram))
  • master/fsm_slave.c lines 174-207: ec_fsm_slave_exec() — new single-arg signature using &fsm->datagram
  • master/fsm_slave.c lines 745-750: ec_fsm_slave_request_config() — sets config_requested = 1
  • master/fsm_slave.h: ec_fsm_slave_t struct with embedded ec_datagram_t datagram and config_requested/config_running flags
  • master/master.c lines 1277-1322: Current broken ec_master_exec_slave_fsms()
  • master/fsm_master.c lines 703-748: ec_fsm_master_action_configure() — still uses old sequential path
  • master/fsm_master.c lines 1028-1054: ec_fsm_master_state_configure_slave() — sequential config completion

Observed symptoms

Log output shows:

EtherCAT WARNING 0-6: Slave has state error bit set (SAFEOP + ERROR)!
EtherCAT WARNING 0-7: Slave has state error bit set (SAFEOP + ERROR)!
...
EtherCAT 0: Slave states on main device: PREOP.

The ethercat slaves command shows all OP, but linuxcnc-ethercat reports slaves stuck at PREOP because the master's internal state tracking never completed the configuration FSM cycle for those slaves.

This pull request was created from Copilot chat.


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI changed the title [WIP] Fix slave FSM execution for parallel configuration Verify parallel slave FSM execution for parallel-slave-config branch Mar 2, 2026
@sittner sittner closed this Mar 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants