Skip to content

Conversation

@shroffk
Copy link
Member

@shroffk shroffk commented Apr 1, 2025

I have used an existing setting/preference

    /** Connection timeout [seconds]
     *
     * <p>When approaching this time without having received a new value,
     * an 'Echo' request is sent. If still no reply, the channel is disconnected.
     */
    public static int EPICS_PVA_CONN_TMO = 30;

I had to change to code to first create a unconnected socket and then try connect with a timeout. Previously we were using constructors which both created and connected to the address... but these constructors did not support configuring a timeout.

A @kasemir through review would be appreciated. @abrahamwolk can you test this branch with a reduced timeout to see what impact it has on your Phoebus behaviour.

@shroffk shroffk requested review from abrahamwolk and kasemir April 1, 2025 14:42
@kasemir
Copy link
Collaborator

kasemir commented Apr 1, 2025

That looks good except for re-using EPICS_PVA_CONN_TMO.

Is used on established TCP connections to verify if they're still alive.
If there was no traffic for 30 seconds, we send an 'echo' message, expecting a reply.
That should only happen every 30 seconds.

For this new connection timeout, we want a much shorter delay of 2..5 seconds.
So need a new variable. The existing name EPICS_PVA_CONN_TMO is unfortunate, because that really sounds like the new connection timeout, and the other should have been called EPICS_PVA_CONNECTION_VERIFICATION_PERIOD or EPICS_PVA_ECHO_CHECK_PERIOD or the like, but PVXS also uses EPICS_PVA_CONN_TMO, so too late to change that now.
In 10 years, when you replace EPICS with something else, you can pick better names from the start.

@shroffk
Copy link
Member Author

shroffk commented Apr 1, 2025

That looks good except for re-using EPICS_PVA_CONN_TMO.

How about:

EPICS_PVA_TCP_SOCKET_TMO
phoebus preference
epics_pva_tcp_socket_tmo

The existing name EPICS_PVA_CONN_TMO is unfortunate, because that really sounds like the new connection timeout,

it is unfortunate indeed

Copy link
Collaborator

@abrahamwolk abrahamwolk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the option

org.phoebus.pv.pva/epics_pva_tcp_socket_tmo=2

in my phoebus.ini file does not have an effect. The reason is that a line needs to be added to the static initializer block of the class PVASettings:

diff --git a/core/pva/src/main/java/org/epics/pva/PVASettings.java b/core/pva/src/main/java/org/epics/pva/PVASettings.java
index eda3b03bc..4b2fd574e 100644
--- a/core/pva/src/main/java/org/epics/pva/PVASettings.java
+++ b/core/pva/src/main/java/org/epics/pva/PVASettings.java
@@ -279,6 +279,7 @@ public class PVASettings
         EPICS_PVA_FAST_BEACON_MAX = get("EPICS_PVA_FAST_BEACON_MAX", EPICS_PVA_FAST_BEACON_MAX);
         EPICS_PVA_MAX_BEACON_AGE = get("EPICS_PVA_MAX_BEACON_AGE", EPICS_PVA_MAX_BEACON_AGE);
         EPICS_PVA_ENABLE_IPV6 = get("EPICS_PVA_ENABLE_IPV6", EPICS_PVA_ENABLE_IPV6);
+        EPICS_PVA_TCP_SOCKET_TMO = get("EPICS_PVA_TCP_SOCKET_TMO", EPICS_PVA_TCP_SOCKET_TMO);
     }
 
     /** Get setting from property, environment or default

@abrahamwolk
Copy link
Collaborator

@abrahamwolk can you test this branch with a reduced timeout to see what impact it has on your Phoebus behaviour.

I have now tested it, using the following setup:

  • I have four local soft IOCs running, that I connect to using an OPI.
  • For one of the soft IOCs, I block the TCP port using the firewall.
  • I have set the new option EPICS_PVA_TCP_SOCKET_TMO to 2.
  • I try to measure the time for establishment of connection to all PVs after the OPI has been loaded (meaning it is displayed on the screen). I do this using a stop-watch manually, and the precision of the timings is therefore perhaps in the region of +/- 1 second.

What I observe is the following:

  • When first loading the OPI, it typically takes about 6-7 seconds from that the OPI has loaded and the the PVs have established their connections. The first PV typically connects immediately, and while waiting for the remaining PVs to connect, I see several connection timeouts (not just one) being printed to the console.
  • If I re-load the OPI once, connections seem to be established within 6-7 seconds again, but there seems to be some variation, and it can also be quicker.
  • If I re-load the OPI again, I see a much longer time until connections have been established, perhaps on the order of magnitude of 30 seconds to 2 minutes.

Given that I have set EPICS_PVA_TCP_SOCKET_TMO to 2, I would expect approximately 2 seconds to be an upper bound on the establishment of the TCP connections in this setup, regardless of how often the OPI is re-loaded. I think a likely cause of this discrepancy is the fact that PVs may occur many times in the data-structures that represent the search state in the class ChannelSearch.

@kasemir
Copy link
Collaborator

kasemir commented Apr 2, 2025

I think there's still the threading issue.

When a new channel is requested, it's scheduled to be searched now, then in a second, then in two seconds, then in 4 seconds, ..., in 30 seconds.
As you open a display, the buckets will look somewhat like this:

bucket 0 (now): Search channels A, B, C, D, E
bucket 1 (1 sec after bucket 0): Search channels A, B, C, D, E
bucket 3 (2 sec after bucket 1): Search channels A, B, C, D, E
bucket 7 (4 sec after 3): Search channels A, B, C, D, E

So one search message will be issued for channels A, B, C, D, E.
Assume we get the reply for all those messages "real soon", but obviously in a random order.
If you're lucky, you get the reply for A first, it connects, then the one for B, it connects., ...
If you're unlucky, you get the reply for D first, it times out after 2 seconds, ...
The order of things will be unpredictable, and while we're timing out on D, the other replies queue up.

With high log level, you should see the bucket intervals, the searches sent out for each bucket, the replies etc:

2025-04-02 10:59:13.050339000 FINER [org.epics.pva] Search intervals: 2.02 s, 4.04 s, 8.09 s, ..., 129.41 s
2025-04-02 10:59:13.058622000 FINE [org.epics.pva] Register search for ‘xxx’ [CID 2, SID -1 INIT] now
2025-04-02 10:59:13.061954000 FINEST [org.epics.pva] Search bucket 0
2025-04-02 10:59:13.067238000 FINE [org.epics.pva] UDP Search Request #1 for [‘xxx’ [CID 2, SID -1 SEARCHING]]
2025-04-02 10:59:13.071837000 FINER [org.epics.pva] Sending search to UDP …
2025-04-02 10:59:13.075563000 FINER [org.epics.pva] Received UDP …
2025-04-02 10:59:13.078689000 FINE [org.epics.pva] Unregister search for xxx 2
2025-04-02 10:59:13.081441000 FINE [org.epics.pva] Reply for ‘xxx’ [CID 2, SID -1 FOUND] from TCP …

@shroffk
Copy link
Member Author

shroffk commented Apr 2, 2025

@abrahamwolk I have only addressed the comments... I agree that we still have a threading issue.

@kasemir
Copy link
Collaborator

kasemir commented Apr 2, 2025

OK, merge this?

@shroffk shroffk merged commit f67fdd6 into master Apr 2, 2025
2 checks passed
@abrahamwolk
Copy link
Collaborator

@abrahamwolk I have only addressed the comments... I agree that we still have a threading issue.

Absolutely. This pull-request is certainly an improvement and a feature that is good to have.

I didn't mean to imply that we shouldn't merge this, but rather provide a data-point on how the performance of the establishment of TCP connections is.

Thanks for implementing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants