Skip to content

ADCore related SIGSEGV error when exiting ioc #5

@kgofron

Description

@kgofron

SIGSEGV error when exiting ioc

epics> 
epics> exit
PrvHst TCP disconnected
tpx3App: abort due to signal 11 (SIGSEGV) "Segmentation Fault"...
./st.cmd: line 5: 1396260 Aborted                 ../../bin/linux-x86_64/tpx3App st_base.cmd

Using

  • ADCore: 3.12.1
  • asyn R4-44-2
  • ADTimepix3: 1.5.0 {updated}
  • SDK / Serval: 4.1.3

Summary of what was added:

  1. “Destroyed pool” registry (NDArrayPool)
  • NDArray.h: Declared two static methods: registerDestroyingPool(NDArrayPool *p) and isPoolDestroyed(NDArrayPool *p).
  • NDArrayPool.cpp: Implemented them with a static set of pool pointers and a mutex. Pools are only ever added to the set; the set is never cleared (process lifetime).
  • asynNDArrayDriver.cpp: At the start of the destructor (before nulling pArrays and deleting the pool), the driver calls NDArrayPool::registerDestroyingPool(this->pNDArrayPoolPvt_).
  1. Safe release in NDArray
  • NDArray.cpp: At the start of NDArray::release(), if NDArrayPool::isPoolDestroyed(pNDArrayPool) is true, the code sets pNDArrayPool = NULL and returns ND_ERROR without calling the pool.
  • isPoolDestroyed() uses the pool pointer only as an address for set lookup; it does not dereference it, so it is safe even if the pool object has already been destroyed.

So when PVA (or anything else) calls array->release() after the driver and pool have been destroyed, the array’s pNDArrayPool may be a dangling pointer, but we only use its value as a key. We find it in the “destroyed” set and no-op instead of calling into the deleted pool.

What fixed it (for your notes):

  1. ADCore destructor – In ~asynNDArrayDriver(), the driver now stores maxAddr_, and before deleting the pool it:
  • Calls NDArrayPool::registerDestroyingPool(pNDArrayPoolPvt_) so late release() calls can no-op
  • Sets pArrays[i]->pNDArrayPool = NULL for all driver-owned arrays
  1. “Destroyed pool” registry – In NDArrayPool we added a static registry of pool addresses. NDArray::release() checks that registry (using only the pointer value, no dereference) and, if the pool was registered as destroying, sets pNDArrayPool = NULL and returns without calling the pool. That way pvAccess (and ntndArrayConverter’s freeNDArray deleter) can call release() after the pool is deleted without crashing.

The build warnings you see are from other ADCore code (NDPluginCircularBuff, NDFileMagick, NDPluginBadPixel) and are unrelated to these changes.

My ADCore is a local fork, and I consider sending the SIGSEGV-on-exit fix (destructor + registerDestroyingPool / isPoolDestroyed + NDArray::release() check) to the areaDetector/ADCore maintainers so it can be included in a future release.

ADCore 3.12.1 modifications

  • modified: ADApp/ADSrc/NDArray.cpp
  • modified: ADApp/ADSrc/NDArray.h
  • modified: ADApp/ADSrc/NDArrayPool.cpp
  • modified: ADApp/ADSrc/asynNDArrayDriver.cpp
  • modified: ADApp/ADSrc/asynNDArrayDriver.h
diff --git a/ADApp/ADSrc/NDArray.h b/ADApp/ADSrc/NDArray.h
index f7ad79b6..ade9f48d 100644
--- a/ADApp/ADSrc/NDArray.h
+++ b/ADApp/ADSrc/NDArray.h
@@ -183,6 +183,12 @@ public:
     int          getNumFree();
     void         emptyFreeList();
 
+    /** Register a pool as being destroyed so late NDArray::release() calls no-op (avoids SIGSEGV
+     * when pvAccess tears down MonitorElements after the driver and pool are deleted). */
+    static void registerDestroyingPool(NDArrayPool *p);
+    /** Check if the given pool address was registered as destroyed (pointer not dereferenced). */
+    static bool isPoolDestroyed(NDArrayPool *p);
+
 protected:
index 2447a9bc..9909abda 100644
--- a/ADApp/ADSrc/NDArray.cpp
+++ b/ADApp/ADSrc/NDArray.cpp
@@ -230,6 +230,11 @@ int NDArray::release()
     printf("%s: WARNING, no owner\n", functionName);
     return(ND_ERROR);
   }
+  /* No-op if pool was already destroyed (e.g. IOC exit: PVA tears down after driver). */
+  if (NDArrayPool::isPoolDestroyed(pNDArrayPool)) {
+    pNDArrayPool = NULL;
+    return(ND_ERROR);
+  }
   return(pNDArrayPool->release(this));
 }
diff --git a/ADApp/ADSrc/asynNDArrayDriver.h b/ADApp/ADSrc/asynNDArrayDriver.h
index a55cdd7c..6aa118d8 100644
--- a/ADApp/ADSrc/asynNDArrayDriver.h
+++ b/ADApp/ADSrc/asynNDArrayDriver.h
@@ -233,6 +233,7 @@ protected:
     int threadPriority_;
 
 private:
+    int maxAddr_;
     NDArrayPool *pNDArrayPoolPvt_;
     epicsMutex *queuedArrayCountMutex_;
     epicsEventId queuedArrayEvent_;
diff --git a/ADApp/ADSrc/asynNDArrayDriver.cpp b/ADApp/ADSrc/asynNDArrayDriver.cpp
index 7f61b6e9..500b21f0 100644
--- a/ADApp/ADSrc/asynNDArrayDriver.cpp
+++ b/ADApp/ADSrc/asynNDArrayDriver.cpp
@@ -850,7 +850,7 @@ asynNDArrayDriver::asynNDArrayDriver(const char *portName, int maxAddr, int maxB
                      interfaceMask | asynInt32Mask | asynFloat64Mask | asynOctetMask | asynInt32ArrayMask | asynGenericPointerMask | asynDrvUserMask,
                      interruptMask | asynInt32Mask | asynFloat64Mask | asynOctetMask | asynInt32ArrayMask | asynGenericPointerMask,
                      asynFlags, autoConnect, priority, stackSize),
-      pNDArrayPool(NULL), queuedArrayCountMutex_(NULL), queuedArrayCount_(0),
+      pNDArrayPool(NULL), maxAddr_(maxAddr), queuedArrayCountMutex_(NULL), queuedArrayCount_(0),
       queuedArrayUpdateRun_(true)
 {
     char versionString[20];
@@ -1008,6 +1008,18 @@ asynNDArrayDriver::~asynNDArrayDriver()
     epicsEventSignal(queuedArrayEvent_);
     epicsEventWait(queuedArrayUpdateDone_);
 
+    /* Register pool as destroying so any late NDArray::release() (e.g. from pvAccess
+     * MonitorElement teardown) no-ops instead of touching the pool we are about to delete. */
+    NDArrayPool::registerDestroyingPool(this->pNDArrayPoolPvt_);
+
+    /* Null pNDArrayPool on driver-owned arrays as well (belt and braces). */
+    if (this->pArrays) {
+        for (int i = 0; i < maxAddr_; i++) {
+            if (this->pArrays[i])
+                this->pArrays[i]->pNDArrayPool = NULL;
+        }
+    }
+
     delete this->pNDArrayPoolPvt_;
     free(this->pArrays);
     delete this->pAttributeList;
diff --git a/ADApp/ADSrc/NDArrayPool.cpp b/ADApp/ADSrc/NDArrayPool.cpp
index b28122e5..450850b1 100644
--- a/ADApp/ADSrc/NDArrayPool.cpp
+++ b/ADApp/ADSrc/NDArrayPool.cpp
@@ -8,6 +8,7 @@
 
 #include <string.h>
 #include <stdlib.h>
+#include <set>
 #include <dbDefs.h>
 #include <stdint.h>
 
@@ -42,6 +43,33 @@ static const char *driverName = "NDArrayPool";
 volatile int eraseNDAttributes=0;
 extern "C" {epicsExportAddress(int, eraseNDAttributes);}
 
+/* Registry of pool addresses being destroyed so NDArray::release() can no-op (avoids SIGSEGV
+ * when pvAccess tears down after the driver). Only the address is used; the pointer is never
+ * dereferenced in isPoolDestroyed(). */
+static std::set<NDArrayPool *> *destroyedPools = NULL;
+static epicsMutexId destroyedPoolsMutex = NULL;
+
+void NDArrayPool::registerDestroyingPool(NDArrayPool *p)
+{
+    if (!p) return;
+    if (!destroyedPoolsMutex)
+        destroyedPoolsMutex = epicsMutexCreate();
+    if (!destroyedPools)
+        destroyedPools = new std::set<NDArrayPool *>();
+    epicsMutexLock(destroyedPoolsMutex);
+    destroyedPools->insert(p);
+    epicsMutexUnlock(destroyedPoolsMutex);
+}
+
+bool NDArrayPool::isPoolDestroyed(NDArrayPool *p)
+{
+    if (!p || !destroyedPoolsMutex || !destroyedPools) return false;
+    epicsMutexLock(destroyedPoolsMutex);
+    bool found = (destroyedPools->find(p) != destroyedPools->end());
+    epicsMutexUnlock(destroyedPoolsMutex);
+    return found;
+}
+
 /** NDArrayPool constructor
   * \param[in] pDriver Pointer to the asynNDArrayDriver that created this object.
   * \param[in] maxMemory Maxiumum number of bytes of memory the the pool is allowed to use, summed over

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions