@georgweiss
Collaborator

This PR adds a web socket-based mechanism for save&restore: data changes on the service side are pushed as web socket messages to all connected clients.

On the Phoebus app side, the core-websocket module adds a web socket client using the native Java APIs. Clients of the WebSocketClient only need to specify a URI and a callback for receiving text messages. Optionally, API clients may register callbacks to handle or debug connection and disconnection events. WebSocketClient should be generic enough to support multiple use cases, e.g. logbook or alarm logger UI.
By default, WebSocketClient will try to reconnect to the remote service if the remote peer is shut down.
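
As a rough illustration of the "URI plus text callback" shape described above, here is a minimal sketch built on the JDK's `java.net.http.WebSocket` API. The class name `SketchWebSocketClient`, its constructor parameters and the `handleFragment` helper are hypothetical and are not taken from the core-websocket module; reconnect logic is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;
import java.util.function.Consumer;

/** Hypothetical sketch; NOT the actual core-websocket WebSocketClient. */
class SketchWebSocketClient implements WebSocket.Listener {
    private final URI uri;
    private final Consumer<String> onTextMessage;
    private final Runnable onConnect;
    private final Runnable onDisconnect;
    private final StringBuilder partial = new StringBuilder();

    SketchWebSocketClient(URI uri, Consumer<String> onTextMessage,
                          Runnable onConnect, Runnable onDisconnect) {
        this.uri = uri;
        this.onTextMessage = onTextMessage;
        this.onConnect = onConnect;
        this.onDisconnect = onDisconnect;
    }

    /** Starts an asynchronous connection attempt. */
    void connect() {
        HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(uri, this)
                // Assumption: a failed attempt is reported like a disconnect.
                .exceptionally(e -> { onDisconnect.run(); return null; });
    }

    @Override
    public void onOpen(WebSocket webSocket) {
        onConnect.run();
        webSocket.request(1); // ask for the first message
    }

    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        handleFragment(data, last);
        webSocket.request(1); // ask for the next message
        return null;          // data may be reclaimed immediately
    }

    /** Buffers fragments and delivers the full text once the last one arrives. */
    void handleFragment(CharSequence data, boolean last) {
        partial.append(data);
        if (last) {
            String message = partial.toString();
            partial.setLength(0);
            onTextMessage.accept(message);
        }
    }

    @Override
    public CompletionStage<?> onClose(WebSocket webSocket, int statusCode, String reason) {
        onDisconnect.run();
        return null;
    }

    @Override
    public void onError(WebSocket webSocket, Throwable error) {
        onDisconnect.run();
    }
}
```

A reconnecting variant could call `connect()` again from the disconnect callback after a delay; how the real client schedules its retries is not shown in this conversation.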

The save&restore app makes use of the WebSocketClient to update the UI based on whatever the service pushes as web socket messages. This way all clients should be able to reflect changes done by all users.

Collaborator

@abrahamwolk abrahamwolk left a comment

I have two questions:

  1. Suppose the connection is lost between the client and the server, and suppose that changes were made to a snapshot before the connection was re-established. Will the client be notified of the changes that were made while the connection was lost?

  2. How is the situation dealt with when a snapshot is simultaneously being worked on by two clients in possibly non-compatible ways? For instance, one client may remove a folder, while another client may simultaneously add a node to the folder, or rename the folder in question.

List<String> selectedNodeIds =
((List<Node>) selectedNodes).stream().map(Node::getUniqueId).collect(Collectors.toList());
JobManager.schedule("copy nodes", monitor -> {
JobManager.schedule("Copy odes", monitor -> {
Collaborator

@abrahamwolk abrahamwolk May 9, 2025

Should the first argument to JobManager.schedule() be "Copy nodes"?

Collaborator Author

Indeed

* @param snapshotNode An existing {@link Node} of type {@link NodeType#SNAPSHOT}
*/
public void loadSnapshot(Node snapshotNode) {
public synchronized void loadSnapshot(Node snapshotNode) {
Collaborator

@abrahamwolk abrahamwolk May 9, 2025

This method is synchronized, but the call to loadSnapshotInternal() will most likely return before the job scheduled by JobManager.schedule() has run, which in turn most likely will return before the function submitted to Platform.runLater() has run. Is this correct?

Collaborator Author

You're right, need not be synchronized. Leftover from testing.
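
The point made in this thread can be shown in isolation. In the sketch below a plain `ExecutorService` stands in for `JobManager.schedule()` and `Platform.runLater()` (an assumption made so the example runs without Phoebus or JavaFX), and the class and method names are hypothetical. The `synchronized` keyword guards only the scheduling call; the scheduled work runs on another thread that does not hold the monitor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo; the executor stands in for JobManager / Platform.runLater. */
class SynchronizedAsyncDemo {
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    /** Set inside load(): whether the calling thread held the monitor. */
    volatile boolean callerHeldLock;

    /**
     * The monitor on 'this' is held only while the task is being submitted.
     * The returned future reports whether the task itself held the monitor.
     */
    synchronized CompletableFuture<Boolean> load() {
        callerHeldLock = Thread.holdsLock(this);
        // The lambda runs later on the executor thread, outside the lock.
        return CompletableFuture.supplyAsync(() -> Thread.holdsLock(this), background);
    }

    /** Lets the executor thread terminate so the JVM can exit. */
    void shutdown() {
        background.shutdown();
    }
}
```

Calling `load().join()` yields `false` while `callerHeldLock` is `true`, so state touched by the asynchronous work is not protected by the method-level lock.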

tabGraphicImageProperty.set(ImageRepository.SNAPSHOT);
}
}
//WebSocketClientService.getInstance().addWebSocketMessageHandler(this);
Collaborator

Commented-out code.

Collaborator Author

Will fix


private void loadConfigurationData(Runnable completion) {
UI_EXECUTOR.execute(() -> {
public synchronized void loadConfiguration(final Node node) {
Collaborator

@abrahamwolk abrahamwolk May 9, 2025

This method is synchronized, but the functions submitted to JobManager.schedule() and Platform.runLater() are run asynchronously and will most likely return after this method has returned. Is this correct?

Collaborator Author

You're right, need not be synchronized. Leftover from testing.

@georgweiss
Collaborator Author

> I have two questions.

  1. You have a point; I will add a UI refresh when the web socket connection is re-established. That said, in most cases the web socket would go away as a consequence of the service being shut down or restarted. When the service is offline, no users can save changes.
  2. This is intentional: simultaneous edits are currently not handled in any particular manner. With updates triggered by web socket messages, users would at least know that an object has been updated (and by whom!). In my view disallowing simultaneous edits would be a non-trivial, though interesting, challenge.

@shroffk
Member

shroffk commented May 9, 2025

Regarding 2.
SAR will most likely not be a high-throughput system where many people are editing and modifying the tree or nodes simultaneously... I feel like the complexity of adding that level of locking for consistent editing would be solving a problem that does not exist.
I think it would be better to inform users that SAR does not have edit sessions and that the last edit is what you see.

@shroffk
Member

shroffk commented May 9, 2025

> When service is offline no users can save changes.

+1
refreshing UI on reconnect makes sense

@abrahamwolk
Collaborator

Incompatible updates can still happen even though updates to snapshots are relatively infrequent. E.g., a computer may be left unattended with unsaved changes for some amount of time, or a client with unsaved changes may be disconnected for some amount of time because a computer was suspended.

One idea that does not use locks: give each revision of a snapshot its own unique ID (perhaps implemented using a counter). On writing, the server compares the unique ID of the revision about to be overwritten against the unique ID of the revision that the edit is based on. If they don't match, then the snapshot has been updated while editing took place, and an error or warning can be displayed.
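
The revision-ID idea above is a form of optimistic concurrency control. A minimal in-memory sketch follows; `RevisionCheckedStore`, `Versioned` and `StaleRevisionException` are hypothetical names, not part of the save&restore service, and a counter is used as the revision ID as the comment suggests.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the revision check; not the save&restore service API. */
class RevisionCheckedStore {

    /** A stored snapshot together with its revision counter. */
    static final class Versioned {
        final long revision;
        final String content;

        Versioned(long revision, String content) {
            this.revision = revision;
            this.content = content;
        }
    }

    /** Thrown when an edit is based on a revision that is no longer current. */
    static final class StaleRevisionException extends RuntimeException {
        StaleRevisionException(String message) {
            super(message);
        }
    }

    private final Map<String, Versioned> snapshots = new HashMap<>();

    synchronized Versioned create(String id, String content) {
        Versioned created = new Versioned(1, content);
        snapshots.put(id, created);
        return created;
    }

    synchronized Versioned read(String id) {
        return snapshots.get(id);
    }

    /** Accepts the write only if baseRevision matches the stored revision. */
    synchronized Versioned write(String id, long baseRevision, String content) {
        Versioned current = snapshots.get(id);
        if (current == null) {
            throw new IllegalArgumentException("Unknown snapshot: " + id);
        }
        if (current.revision != baseRevision) {
            throw new StaleRevisionException("Snapshot " + id + " is at revision "
                    + current.revision + ", edit was based on " + baseRevision);
        }
        Versioned next = new Versioned(current.revision + 1, content);
        snapshots.put(id, next);
        return next;
    }
}
```

If users A and B both read revision 1 and B writes first, A's later write with base revision 1 is rejected rather than silently overwriting B's change. The `synchronized` methods here only make the compare-and-update step atomic on the server; clients hold no lock while editing.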

@georgweiss
Collaborator Author

Save & restore has been in use for quite some time without anything preventing simultaneous edits. Further, currently there is no way to refresh data other than collapsing/expanding nodes in the tree view. On top of that, an object being edited must be closed and reopened to reflect changes.

What web sockets add is a way to make sure the UI reflects changes made by others. Safeguarding against simultaneous edits is in my view not in scope for this PR and the introduction of web sockets.

@abrahamwolk
Collaborator

What will user A see if user A is editing a snapshot, and user B writes to the snapshot?

@georgweiss
Collaborator Author

Then user A will see the state saved by B and A's edits are lost. B's identity will be apparent from the updated UI.

@Override
public boolean handleTabClosed() {
saveLocalState();
//saveLocalState();
Collaborator

Commented-out line of code.


private void handleWebSocketConnected() {
serviceConnected.setValue(true);
Platform.runLater(() -> {
Collaborator

Why call Platform.runLater() with a function that does nothing? Both here and in handleWebSocketDisconnected().

@georgweiss
Collaborator Author

Interesting observation (at least on macOS): disabling WiFi does not trigger a web socket disconnect event. Moreover, when WiFi is enabled again, the web socket remains operational.

In any case, I have added a refresh of the UI for when the service is brought back online and the client succeeds in reconnecting.

@georgweiss
Collaborator Author

Based on observations in various test scenarios: in order to handle different kinds of web socket connection issues, a ping/pong strategy is needed. Therefore the client will dispatch a ping message and consider the connection dead if a pong message is not received within three seconds.
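
The timeout half of such a ping/pong strategy could be sketched as a small watchdog. `PongWatchdog` and its methods are hypothetical names and not the code merged in this PR; the timeout is a constructor parameter so that the three seconds mentioned above would be passed as `3000`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: declares the connection dead if no pong follows a ping in time. */
class PongWatchdog {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMillis;
    private final Runnable onConnectionDead;
    private ScheduledFuture<?> pending;

    PongWatchdog(long timeoutMillis, Runnable onConnectionDead) {
        this.timeoutMillis = timeoutMillis;
        this.onConnectionDead = onConnectionDead;
    }

    /** Call right after a ping has been dispatched; (re)starts the timeout. */
    synchronized void pingSent() {
        cancelPending();
        pending = scheduler.schedule(onConnectionDead, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Call when a pong arrives; the pending timeout is cancelled. */
    synchronized void pongReceived() {
        cancelPending();
    }

    private void cancelPending() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    /** Stops the scheduler thread. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

With the JDK client, `pingSent()` would follow a call to `WebSocket.sendPing(ByteBuffer)` and `pongReceived()` would be invoked from `WebSocket.Listener.onPong`. The dead-connection callback would then typically close the socket and start a reconnect attempt.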

@shroffk shroffk merged commit 286eee1 into master May 13, 2025
3 checks passed