Skip to content
This repository was archived by the owner on May 31, 2025. It is now read-only.

Conversation

@rhaschke
Copy link
Contributor

This is an attempt to fix #870.

According to C++ standard static variables and registered atexit handlers are destroyed/called in reverse order of their occurence.
The ROS source base seems to assume, that atexit handlers are called before destruction of any static variables.

For example, the singletons TopicManager ... XMLRPCManager are created in ros::start() after having ros::init() registering the atexit handler. Hence, they will be destroyed before the call to the atexit handler, which attempted to access them and call their shutdown methods after destruction.

Use of global variables should be avoided in general, because their destruction order is determined by library linking. Using singletons the order of destruction can be determined by calling order.

This bug fix should be applied to Indigo and Jade as well and the source base should be checked for similar issues. I tracked the issue down with AddressSanitizer using-DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer -O1".

According to C++ standard (http://en.cppreference.com/w/cpp/utility/program/exit)
static variables and registered atexit handlers are destroyed/called in
reverse order of their occurence.

As the singletons TopicManager ... XMLRPCManager are created in
ros::start() after having ros::init() registering the atexit handler,
they will be destroyed before the call to the atexit handler, which
attempted to access them and call their shutdown methods again.

Use of global variables should be avoided in general, because their
destruction order is determined by library linking. Using singletons
the order of destruction can be determined by calling order.
@rhaschke
Copy link
Contributor Author

CI failure was due to Slave went offline during the build.

@tfoote
Copy link
Member

tfoote commented Aug 19, 2016

@ros-pull-request-builder retest this please

@tfoote
Copy link
Member

tfoote commented Aug 19, 2016

@ros-pull-request-builder retest this please

@rhaschke
Copy link
Contributor Author

@tfoote buildfarm again failed for an unrelated reason. What's going on there?

@tfoote
Copy link
Member

tfoote commented Aug 19, 2016

Sorry about the noise I was trying to debug why the retriggering wasn't working: ros-infrastructure/ros_buildfarm#332 so I forced the machine offline. It retriggered, but had the same problem. I'll let it run this time.

@ros-pull-request-builder retest this please

@rhaschke
Copy link
Contributor Author

@ros-pull-request-builder retest please

@rhaschke
Copy link
Contributor Author

Check succeeded on CI, but is marked unstable there. Shouldn't it marked succeeded here?

@tfoote
Copy link
Member

tfoote commented Aug 20, 2016

It built, but it did not pass all the tests. http://build.ros.org/job/Kpr__ros_comm__ubuntu_xenial_amd64/147/testReport/

@rhaschke
Copy link
Contributor Author

Hm. I cannot reproduce the buildfarm error locally on my machine. I think somebody of the maintainers should have a look.

/vol/sandbox/ros/src.ros_comm/ros_comm/tools/rostest/scripts/rostest --pkgdir=/vol/sandbox/ros/src.ros_comm/ros_comm/test/test_roscpp --package=test_roscpp --results-filename test_launch_subscribe_star.xml --results-base-dir /vol/sandbox/ros/build.ros_comm/test_roscpp/test_results /vol/sandbox/ros/src.ros_comm/ros_comm/test/test_roscpp/test/launch/subscribe_star.xml

... logging to /homes/rhaschke/.ros/log/rostest-idas-23822.log
[ROSUNIT] Outputting test results to /vol/sandbox/ros/build.ros_comm/test_roscpp/test_results/test_roscpp/rostest-test_launch_subscribe_star.xml
[ERROR] ros.roscpp: Client [/subscribe_star] wants topic /test_star_inter to have datatype/md5sum [*/d41d8cd98f00b204e9800998ecf8427e], but our version has [test_roscpp/TestArray/f7c2f87680985b118316f81f28b4cfd5]. Dropping connection.
testsubscribe_star ... ok

[ROSTEST]-----------------------------------------------------------------------

[test_roscpp.rosunit-subscribe_star/simpleSubFirstIntra][passed]
[test_roscpp.rosunit-subscribe_star/simplePubFirstIntra][passed]
[test_roscpp.rosunit-subscribe_star/multipleSubsStarFirstIntra][passed]
[test_roscpp.rosunit-subscribe_star/multipleSubsConcreteFirstIntra][passed]
[test_roscpp.rosunit-subscribe_star/multipleShutdownConcreteIntra][passed]
[test_roscpp.rosunit-subscribe_star/simpleInter][passed]
[test_roscpp.rosunit-subscribe_star/simpleInterUDP][passed]
[test_roscpp.rosunit-subscribe_star/switchTypeInter][passed]

SUMMARY
 * RESULT: SUCCESS
 * TESTS: 8
 * ERRORS: 0
 * FAILURES: 0

Interestingly the error message thrown, when switching the published type, is different that on buildfarm, which says:

[FATAL] [1471646470.908969076]: ASSERTION FAILED
15:41:10    file = /tmp/catkin_workspace/install_isolated/include/ros/publisher.h
15:41:10    line = 115
15:41:10    cond = impl_->md5sum_ == "*" || std::string(mt::md5sum<M>(message)) == "*" || impl_->md5sum_ == mt::md5sum<M>(message)
15:41:10    message =
15:41:10 [FATAL] [1471646470.909092564]: Trying to publish message of type [test_roscpp/TestEmpty/d41d8cd98f00b204e9800998ecf8427e] on a publisher with type [test_roscpp/TestArray/f7c2f87680985b118316f81f28b4cfd5]

rhaschke added a commit to moveit/moveit that referenced this pull request Sep 2, 2016
rhaschke added a commit to moveit/moveit that referenced this pull request Sep 2, 2016
v4hn pushed a commit to moveit/moveit that referenced this pull request Sep 2, 2016
…le (#184)

* moveit_commander: allow setting of targets from any iterable

* disable cleanup test

fails due to ros/ros_comm#871
v4hn pushed a commit to moveit/moveit that referenced this pull request Sep 2, 2016
* moveit_commander: allow setting of targets from any iterable

* disable cleanup test

fails due to ros/ros_comm#871
v4hn pushed a commit to moveit/moveit that referenced this pull request Sep 2, 2016
* moveit_commander: allow setting of targets from any iterable

* disable cleanup test

fails due to ros/ros_comm#871
@dirk-thomas
Copy link
Member

dirk-thomas commented Sep 6, 2016

I can't reproduce the failing test locally either (without modification, see below). But the assertion you posted, the one I see locally, and the one in the build farm log seem to be all identical (and expected).

For me as well as you locally the getNumPublishers() is just zero (as it is expected) where as the build farm somehow get 1. I can make the test fail in the same way as on Jenkins though by commenting out the second sleep call in the switchTypeInter test. So it seems likely that the changed shutdown behavior is somehow affecting the test. Update: probably not a fair example since the tests fails with that change even without this patch...

I will trigger the job again just to see if it is actually flaky: @ros-pull-request-builder retest this please

@wjwwood
Copy link
Member

wjwwood commented Sep 8, 2016

+1

@dirk-thomas
Copy link
Member

Thank you for the great patch.

@dirk-thomas dirk-thomas merged commit 96e6b15 into ros:kinetic-devel Sep 8, 2016
@rhaschke rhaschke deleted the fix-static-destruction-order branch September 9, 2016 01:06
@rhaschke
Copy link
Contributor Author

rhaschke commented Sep 9, 2016

Thanks for merging this.
@dirk-thomas Do you plan to backport this bug fix to Indigo and Jade too? That would be great.

@dirk-thomas
Copy link
Member

This can be considered for backport. But I will likely wait some time (until it is released and synced into Kinetic plus acouple of weeks) before doing it. Simply to have enough time to catch potential regressions.

@rhaschke
Copy link
Contributor Author

I agree.

v4hn pushed a commit to moveit/moveit that referenced this pull request Oct 19, 2016
* fix MoveGroup -> MoveGroupInterface in python tests

* re-enable test

ros/ros_comm#871 is merged and released in Kinetic
130s pushed a commit to 130s/moveit-2 that referenced this pull request Oct 21, 2016
* fix MoveGroup -> MoveGroupInterface in python tests

* re-enable test

ros/ros_comm#871 is merged and released in Kinetic
130s pushed a commit to 130s/moveit-2 that referenced this pull request Oct 21, 2016
* fix MoveGroup -> MoveGroupInterface in python tests

* re-enable test

ros/ros_comm#871 is merged and released in Kinetic
@arpit15
Copy link

arpit15 commented Feb 1, 2017

@rhaschke @dirk-thomas Can the plan to backport this bug fix to Indigo and Jade be done? That would be great. I am currently using 14.04 and moveit indigo version

@dirk-thomas
Copy link
Member

There has been no Indigo / Jade release since then. As soon as a new patch release is being made all patches from Kinetic will be considered. This will likely take a few more weeks to happen since we will first focus on another patch release for Kinetic.

@dgossow
Copy link
Contributor

dgossow commented Oct 10, 2023

@rhaschke would you care to take a look at my follow up PR? #2355

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants