Skip to content

Sync fork: v0.2.5 + fixes (trajectory shake, gripper stall, TCP offset, async cleanup)#15

Merged
Jepson2k merged 20 commits into
PCrnjak:mainfrom
Jepson2k:main
May 14, 2026
Merged

Sync fork: v0.2.5 + fixes (trajectory shake, gripper stall, TCP offset, async cleanup)#15
Jepson2k merged 20 commits into
PCrnjak:mainfrom
Jepson2k:main

Conversation

@Jepson2k
Copy link
Copy Markdown
Collaborator

Summary

Brings the Jepson2k fork's main up to upstream. The fork has been the active
development branch for the waldoctl + pinokin Python API since the v0.2.0
rewrite landed, and has accumulated three release tags (0.2.3, 0.2.4, 0.2.5)
worth of fixes and improvements. The fork is strictly ahead of upstream — no
upstream-only commits since the merge base — so this is a fast-forward merge.

Bug fixes

  • Electric-gripper stall detection (3 commits). Gripper actions weren't
    terminating when the gripper stalled on an object, leaving the motion queue
    blocked; PID hunting on the stalled jaws confused naive stall detection;
    move_active was getting cleared before subsequent commands could see it.
    All three rolled together so grasps complete normally on real hardware.
  • SetTcpOffset routing: TCP offset was being applied only in the
    controller process; the motion planner subprocess saw the old offset and
    produced IK seeds against the wrong TCP. Route through the planner.
  • Tool-action error recovery: failed gripper commands left stale error
    state and an uninitialised command_index, blocking subsequent commands.
  • RobotClient.async_client property removed: the sync RobotClient
    exposed an .async_client property that encouraged mixing sync + async
    control on different event loops, with race conditions and "no running
    event loop" surprises. Async users construct AsyncRobotClient directly.
    Minor API break.

Kinematics / motion

  • Trajectory shake fix: TOPP-RA trajectories jittered on wrist-near-singular
    configs because an isolated bad IK sample propagated into the rate-limited
    interpolation. Added an njit'd outlier smoother that median-filters the IK
    output, denser IK sampling along the path, and a sim-dt fix so the simulator
    isn't subject to host scheduling noise.
  • IK max_iter 10 → 20: helps wrist-near-singular targets converge without
    falling back to a different seed.
  • Deterministic wrist-flip restart for IK branches that converge outside
    joint limits.

Dependencies

  • Bumped pinokin pin from v0.1.5 → v0.1.6 (picks up the closest-solution
    unwrap and the null-space optimization port from robotics-toolbox-python).

Examples / docs

  • Fixed the set_tcp_offset axis in the pencil example (wrong axis after the
    TCP-frame convention firmed up).
  • Minor cleanup in CLAUDE.md.

Test plan

  • CI green on all commits in this branch (already running on the fork)
  • On-hardware: home → moveJ → moveL → moveP cycle with electric gripper
  • Electric gripper grasp + release cycle with object in jaws (stall path)

PR description written by Claude on the author's behalf.

Jepson2k and others added 20 commits April 16, 2026 16:51
Tool actions were always throwing timeout errors because the error state
from a previous command wasn't cleared before dispatching a ToolActionCmd,
and tool-command failures didn't record the command_index — so clients
couldn't correlate the error with the offending command.
SSG-48 tool transform is along Z, but the precision demos were calling
set_tcp_offset(-100, 0, 0) — shifting the TCP sideways instead of down
the pencil axis. Result: TRF rotations pivoted near the gripper rather
than the pencil tip. Use (0, 0, -100) to match the tool frame.
The existing object-detection path requires firmware to set the
detection bit for 5 consecutive ticks. In practice the bit is too
flaky for the debouncer to latch when grasping a thin object, so
tool.close() would run out the 10s timeout instead of returning
done as soon as the gripper stops moving.

Adds a position-stall backstop: after a 100ms grace period, if
feedback_position stays within 2/255 for 200ms while move_active
is set, the gripper is considered blocked and the command finishes
cleanly. Object-detection fast-path is unchanged.
The previous baseline-position check could never fire when the gripper
hunted around its stall point — any oscillation > 2 units kept resetting
the baseline and the counter, so close() still timed out on real grasps.

Switch to tracking the smallest distance-to-target seen so far. If
distance hasn't improved by more than the dead-band for the stall window,
declare stall. Oscillation around a stalled point doesn't beat the best
distance, so the counter drains correctly.
When the no-progress path fired it was clearing move_active, which tells
the firmware to release motor force — so grasped items slipped out after
the command reported done. Match the object-detection completion path:
leave move_active on so the firmware keeps applying force on the object.

Also add a regression test that runs the stall-completion path with a
held feedback_position and asserts the MOVE_ACTIVE bit is still set.
SetTcpOffsetCommand was a SystemCommand, which executes in the controller
process and bypasses the planner subprocess. The planner's own robot
model never learned about the offset, so subsequent trajectory IK ran
against the stale TCP — TRF rotations pivoted at the flange instead of
at the offset point.

Switch to MotionCommand so the command flows through _planner.submit()
like SelectTool, where _handle_inline already updates planner state and
the planner's PAROL6_ROBOT.robot. The controller-side segment player
still runs execute_step() after the planner emits the InlineSegment, so
the controller's own state is updated too.

Regression tests: structural check that SetTcpOffsetCommand is a
MotionCommand (not SystemCommand), and behavioral check that
TrajectoryPlanner.process() updates planner.state.tcp_offset_m.
Pinokin's wrist-flip restart strategy (introduced in pinokin
fix-ik-wrist-flip-restart) only triggers after LM converges to a
limits-violating q. With max_iter=10 some realistic seeds (e.g. the
PRECISION_POSE call from end-of-zigzag in demo_showcase line 98) need
~11 LM iters to converge — they exited the inner loop without ever
calling check_limits, so the wrist flip never applied.

20 gives enough margin for current PAROL6 IK seeds while keeping
worst-case solve time well under 1 ms. Successful first-attempt
convergence is unaffected (LM converges in <20 iters either way).
v0.1.6 ships the deterministic wrist-flip restart for limits-violating
IK branches (Jepson2k/pinokin#1). Combined with this branch's max_iter
bump to 20, the wrist-flip restart now actually fires for the realistic
PAROL6 seeds it was designed to rescue.
bump IK max_iter to 20 to enable wrist-flip restart in pinokin
Removes two CLAUDE.md sections:
- The 'No emoji / no Generated by... / no co-author boilerplate' rule
  under Code Style
- The 'Do NOT use compound cd commands' Shell Command Style section
docs: drop git-commit and shell-style rules from CLAUDE.md
…samples

Three independent fixes that together eliminate visible robot shake during
the precision sweep section of the demo:

1. parol6/motion/trajectory.py: detect and linearly-interp IK-chain outlier
   runs from pinokin's LM branch hops at wrist singularities (J5 near 0).
   Both single-sample and multi-sample outliers occur; bookend/path-length
   ratio < 0.5 reliably distinguishes 'chain doubles back' (outlier) from
   legitimate fast joint motion (e.g. opening sweep from a singular pose).
   Padding spreads patched joint motion across more samples so TOPP-RA
   doesn't have to crank up local joint velocity.

2. parol6/server/transports/mock_serial_transport.py: pass cfg.INTERVAL_S
   as dt instead of wallclock dt. Wallclock dt inherits the host scheduler's
   jitter, causing the simulator to under-/over-advance Position_in around
   slow ticks — visible as ghost velocity spikes in recordings even when
   the commanded trajectory is bit-identical across runs.

3. parol6/config.py: bump PATH_SAMPLES default 50 -> 200. Denser cartesian
   sampling reduces per-sample joint-norm spread and lets the IK outlier
   detector be more selective.

4. examples/precision.py: synced to match demo_showcase.py's precision
   section (was out of date).

The shake symptom was a real-position discontinuity of ~4 deg per 20ms
status frame on J4/J6 during RX/RY/RZ wrist sweeps — caught by recording
the multicast status while precision.py ran in WC and plotting velocity.
Root cause was LM IK picking off-branch solutions at the singular pose;
the smoother repairs the chain in joint space while preserving the
J4+J6 invariant exactly (sub-mm FK error).
The relative-threshold smoother (introduced in 209b6d8) was a pure-Python
function with per-sample broadcasts that allocated on every patched index.
Promote it to @njit(cache=True) with explicit scalar loops, leaving one
np.empty(n-1) scratch for the step magnitudes and whatever np.median does
internally — both bounded by path length, called once per plan.

Add a synthetic-outlier warmup call so the JIT compile happens at server
startup rather than first plan, and trim the docstring/constant comments
to just the why.

Cover the smoother with regression tests: smooth-path pass-through, single
and multi-sample hops, start/end padding clamping, short-path/zero-motion
no-ops, sub-threshold and at-threshold steps, plus a randomized fuzz that
verifies endpoint invariance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix trajectory shake: smooth IK outliers, fix sim dt jitter
…-gun

The property returned self._inner — the AsyncRobotClient bound to
RobotClient's private background loop. Reaching for it from another loop
and awaiting one of its methods produced a cryptic "Queue is bound to a
different event loop" deep inside _request_ok_raw.

Worse, _run()'s error message steered users toward the trap: "use
AsyncRobotClient and `await` the method instead" got read as "grab the
one hanging off my sync client" rather than "construct a fresh one in
this loop."

- Delete the async_client property (zero non-test callers across this
  repo and Waldo-Commander; re-added in 8cc47dc as part of a typed-stub
  revert).
- Tighten _run()'s error message to point at constructing a fresh
  AsyncRobotClient in the calling loop.
- Update the two test sites that monkeypatched client.async_client._request
  to use client._inner._request directly — they were already patching a
  private method, so the change is honest.
remove RobotClient.async_client property to eliminate cross-loop foot-gun
@Jepson2k Jepson2k merged commit 358da7b into PCrnjak:main May 14, 2026
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant