Skip to content

Conversation

@SvenKordon
Copy link

I observed a crash of the nmos-cpp library when the nmos::server is closed and the http_listeners vector contains already closed listeners.

http_listener::close() returns a default created pplx::task from cpprest (2.10.17). This leads to a crash in pplx::when_all() under Linux CentOS8.1. I did not investigate the reason for this crash further, however, default created pplx::task behave different. For example they throw on wait or get calls.

To solve this issue I propose the following fix which only adds non default created tasks to the vector.

The provided fix is tested in my application where the problem occurs when a nmos::server is closed that comprises invalid http_listeners due to wrong socket configurations, like for example an already existing port number.

@garethsb
Copy link
Contributor

garethsb commented May 3, 2022

Hm, that doesn't sound good.

Can you explain how http_listener::close returns a default-constructed task? I expect it to return e.g. pplx::task_from_result() (a completed task) sometimes but not a default-constructed one.

But I do agree when_all does not seem to play nicely with default-constructed tasks. I see several accesses in the implementation of when_all which do not seem to account for default constructed tasks:

https://github.com/microsoft/cpprestsdk/blob/07cf589108f50ee40e56c56dc43bfab1022604e0/Release/include/pplx/pplxtasks.h#L6591

What we probably expected to happen is this... pplx::when_all to return a task that would throw the same pplx::invalid_operation exception as the default constructed task.

This would be handled by nmos::server::close by completing other operations and then returning that task. In this way, nmos::server::close would pass on the exception from the underlying http_listeners. This seems more appropriate than effectively hiding the problem? (In the example nmos-cpp-node main.cpp the contained exception would be thrown by the node_server_guard destructor, and handled by the catch-all block. It'd be nice if there was a specific catch (pplx::invalid_operation& e) block.)

We could submit a test case and patch to cpprestsdk but given it's in maintenance mode, I agree it might be simpler to handle the issue in this repo, though there are several instances of when_all and when_any to which any fix ought to be implied.

But first, I'd prefer to understand the first point, how http_listener::close returns a default-constructed task.

@SvenKordon
Copy link
Author

I found the implementation of close() here.
The function open returns m_close_task which is defined as pplx::task_from_result() in the default constructor http_listener_impl::http_listener_impl(). But the other two constructors,
etails::http_listener_impl::http_listener_impl(http::uri address) and details::http_listener_impl::http_listener_impl(http::uri address, http_listener_config config) do not define m_close_task as pplx::task_from_result(). Thus, it is default when using one of the other constructors, which is done in the nmos-cpp code.

So basically you are right, this bug should better be fixed in cpprest. But I'm not sure, how easy it is to provide a patch for cpprest.

@garethsb
Copy link
Contributor

garethsb commented May 5, 2022

I quickly added microsoft/cpprestsdk#1700 and microsoft/cpprestsdk#1701. Patches for both issues should be possible. However, to avoid having to wait for that to happen, maybe we could replace instances of pplx::when_all(tasks.begin(), tasks.end()) in nmos-cpp with a helper that does something like:

pplx::ranges::when_all(tasks | boost::adaptors::transformed([](pplx::task<X> task)
{
    try { task.is_done(); return task; }
    catch (const pplx::invalid_operation& e) { return pplx::task_from_exception<X>(e); }
})

(where pplx::ranges::when_all would be just a wrapper function, taking a range argument, for the pplx::when_all function that takes an iterator pair)

@SvenKordon
Copy link
Author

Thanks for adding the issues at cpprest.

I'm not an cpprest expert. Could you please explain if you like to write a new function pplx::range::when_all() that is used instead of pplx::when_all provided by the cpprest library or does the pplx::range::when_all() already exist in cpprest?
I am also a bit confused by the pplx::when_any in your last sentence. Did you mean pply::when_all() or did I miss something about pplx::when_any()?

Nevertheless, I think it is a very good idea to fix the when_all() call instead of sorting out the default tasks as proposed by me.

@garethsb
Copy link
Contributor

garethsb commented May 5, 2022

Yes, sorry, typo fixed!

Those pplx::ranges functions don't exist now. They could be submitted to cpprestsdk, but same is true of e.g. pplx::complete_after found in nmos-cpp pplx/pplx_utils.{h,cpp}, and I propose these new functions could be implemented there too.

@SvenKordon
Copy link
Author

Okay thanks, now it is more clear to me.
Basically, we need to write a new when_all() function like this:

template<class RandomAccessRange>
pplx::task<void> when_all( const RandomAccessRange& rng )
{
    return pplx::when_all( rng.begin(), rng.end() ).then( [rng]( pplx::task<void> finally )
    {
        for( auto task : rng ) details::wait_nothrow( task );
        finally.wait();
    } );
}

This function is then called by your code and should then only handle valid tasks, right?
Just one question to your code. Are you sure that a default task throws if task.is_done() has been called?

@garethsb
Copy link
Contributor

garethsb commented May 5, 2022

Just one question to your code. Are you sure that a default task throws if task.is_done() has been called?

https://github.com/microsoft/cpprestsdk/blob/07cf589108f50ee40e56c56dc43bfab1022604e0/Release/include/pplx/pplxtasks.h#L3642-L3647

If we don't want to rely on that, we could do as I originally wrote:

    try { task.wait(); return task; }
    catch (const pplx::invalid_operation& e) { return pplx::task_from_exception<X>(e); }
    catch (...) { return task; }

The possible issue that made me rewrite using is_done is that the tasks which throw other exceptions would have been "observed" by wait, which could mask a programming error in the caller.

Basically, we need to write a new when_all() function like this:

[...]

This function is then called by your code and should then only handle valid tasks, right?

Yes, so I think the following may be sufficiently composable, leaving the job of wait_nothrow for all tasks, etc. to the caller.

template <typename InputRange>
pplx::task<void> when_all(InputRange&& tasks, const task_options& options = task_options())
{
    return pplx::when_all(tasks.begin(), tasks.end(), options);
}

@SvenKordon
Copy link
Author

Okay I agree that is_done should work as well and is the better choice as the wait shall be done in the original when_all.
However, I'd think that we still need the wait_nothrow if we keep the is_done. Because if I've got it correctly, the original when_all() stops on exceptions so that it won't wait for all tasks after the exception, which is desired in our case right?

https://github.com/microsoft/cpprestsdk/blob/07cf589108f50ee40e56c56dc43bfab1022604e0/Release/include/pplx/pplxtasks.h#L6863-L6867

@garethsb
Copy link
Contributor

garethsb commented May 6, 2022

Yes, wait_nothrow is still required, I just meant that it should remain in the calling code's continuation not the implementation of pplx::ranges::when_all.

@SvenKordon
Copy link
Author

Okay, that's fine. I'm currently testing the code changes with a small unit test that adds some http_listeners into the nmos::server and directly closes them. However, currently the original when_all still crashes with an unhandled exception.

I'm not absolutely sure but I think that the created task from pplx::task_from_exception<X>(e) might throw again in when_all(). Is this really what we want or shall we better generate a task from pplx::task_from_result()?

@garethsb
Copy link
Contributor

garethsb commented May 6, 2022

Sorry, I'm away from compiler right now. In my tests, pplx::task_from_exception<X>(e) was handled as I'd expect by pplx::when_all, with no crash. Wonder why our test cases are different?🤔
In calling code, we definitely still need the for loop over all the tasks in the continuation to ensure all exception-containing tasks are observed.

In total I was expecting something like this to work...

return pplx::ranges::when_all(tasks | boost::adaptors::transformed([](pplx::task<void> task)
{
    try { task.is_done(); return task; }
    catch (const pplx::invalid_operation& e) { return pplx::task_from_exception<void>(e); }
}).then([tasks](pplx::task<void> finally)
{
   for (auto& task : tasks) details::wait_nothrow(task);
   finally.wait();
});

Maybe you're right and (knowing that the fix really belongs in cpprestsdk in the impl of pplx::when_all) the filtered range should be factored into pplx::ranges::when_all itself... and maybe we need a version of pplx::observe_exception that takes a range of tasks and does the no-throw wait, so that we could simplify the calling code in nmos::server to...

return pplx::ranges::when_all(tasks)
    .then(pplx::observe_exception<void>(tasks));

@SvenKordon
Copy link
Author

I've implemented exactly the same code but it crashes with an unhandled exception. I did my tests under Windows using Visual Studio 2017 and ccprest 2.10.18.

It does not crash when I use pplx::task_from_result() instead of pplx::task_from_exception<void>(e). But I agree that it should also work the other way around as the documentation of when_all() states that it could handle exceptions within the tasks.

The crash happens in the destructor of the task in~_ExceptionHolder.

 ~_ExceptionHolder()
  {
      if (_M_exceptionObserved == 0)
      {
          // If you are trapped here, it means an exception thrown in task chain didn't get handled.
          // Please add task-based continuation to handle all exceptions coming from tasks.
          // this->_M_stackTrace keeps the creation callstack of the task generates this exception.
          _REPORT_PPLTASK_UNOBSERVED_EXCEPTION();
      }
  }

Here is the call stack stored in this->_M_stackTrace:

0x00007ff786172896 {nmos-cpp-test.exe!Concurrency::task_from_exception<void,Concurrency::invalid_operation>(Concurrency::invalid_operation _Exception, const Concurrency::task_options & _TaskOptions), Line 7081}
0x00007ff786ae7340 {nmos-cpp-test.exe!`Concurrency::task<void> <lambda>(Concurrency::task<void>)'::`1'::catch$4(void), Line 103}
vcruntime140d.dll!0x00007ffbf21a1030 (load symbols for additional information)
vcruntime140d.dll!0x00007ffbf21a74da (load symbols for additional information)
ntdll.dll!0x00007ffc219d1426 (load symbols for additional information)
0x00007ff7860d2b42 {nmos-cpp-test.exe!nmos::server::close_listeners::__l2::Concurrency::task<void> <lambda>(void)::__l2::<lambda>(Concurrency::task<void> task), Line 102}
0x00007ff786035204 {nmos-cpp-test.exe!boost::range_detail::default_constructible_unary_fn_wrapper<Concurrency::task<void> <lambda>(Concurrency::task<void>),Concurrency::task<void> >::operator()<Concurrency::task<void> >(Concurrency::task<void> & arg), Line 62}
0x00007ff786072713 {nmos-cpp-test.exe!boost::iterators::transform_iterator<boost::range_detail::default_constructible_unary_fn_wrapper<Concurrency::task<void> <lambda>(Concurrency::task<void>),Concurrency::task<void> >,std::_Vector_iterator<std::_Vector_val<std::_Simple_types<Concurrency::task<void> > > >,boost::use_default,boost::use_default>::dereference(void), Line 126}
0x00007ff78604d5ba {nmos-cpp-test.exe!boost::iterators::iterator_core_access::dereference<boost::iterators::transform_iterator<boost::range_detail::default_constructible_unary_fn_wrapper<Concurrency::task<void> <lambda>(Concurrency::task<void>),Concurrency::task<void> >,std::_Vector_iterator<std::_Vector_val<std::_Simple_types<Concurrency::task<void> > > >,boost::use_default,boost::use_default> >(const boost::iterators::transform_iterator<boost::range_detail::default_constructible_unary_fn_wrapper<Concurrency::task<void> <lambda>(Concurrency::task<void>),Concurrency::task<void> >,std::_Vector_iterator<std::_Vector_val<std::_Simple_types<Concurrency::task<void> > > >,boost::use_default,boost::use_default> & f), Line 550}
0x00007ff78649df62 {nmos-cpp-test.exe!boost::iterators::detail::iterator_facade_base<boost::iterators::transform_iterator<boost::range_detail::default_constructible_unary_fn_wrapper<Concurrency::task<void> <lambda>(Concurrency::task<void>),Concurrency::task<void> >,std::_Vector_iterator<std::_Vector_val<std::_Simple_types<Concurrency::task<void> > > >,boost::use_default,boost::use_default>,Concurrency::task<void>,boost::iterators::random_access_traversal_tag,Concurrency::task<void>,__int64,0,0>::operator*(void), Line 656}

Thus, the problem seems to be that when_all() is not able to handle task_from_exception tasks.
I'd propose to implement it that way:

return pplx::range::when_all( tasks | boost::adaptors::transformed( []( pplx::task<void> task )
{
    try { task.is_done(); return task; }
    catch( const pplx::invalid_operation& e ) { return pplx::task_from_result(); }
} ) ).then( [tasks]( pplx::task<void> finally )
{
    for( auto task : tasks ) details::wait_nothrow( task );
    finally.wait();
} );

Because I would assume that the task returned from http_listener::close() should be of type pplx::task_from_result() when the listener is already closed, which would already be the case for default constructed http_listener instances.

Moved fix into a pplx::range::when_all() function getting a filtered list of tasks.
Added an unit test for pplx::range::when_all().
@garethsb
Copy link
Contributor

garethsb commented May 6, 2022

Ah, sorry, I realise the issue... Although pplx::when_all handles exception-containing tasks OK, you still have to observe (call wait or get) on those tasks in calling code or you will get the SIGTRAP from ~_ExceptionHolder that you have found. But if in pplx::ranges::when_all we convert the default-constructed tasks into other tasks that the calling code never sees, those won't be observed...

Two possible solutions - map to pplx::task_from_result as you proposed, or map to pplx::task_from_exception but observe it internally in pplx::ranges::when_all. I think I prefer the latter as it behaves similarly to the default constructed task (which doesn't need to be observed, but does throw from get and wait).

@garethsb
Copy link
Contributor

garethsb commented May 6, 2022

@SvenKordon I've had a quick go at this... so far I've made two four (after tweaks for various compilers) commits on the branch https://github.com/garethsb/nmos-cpp/tree/default-task-workaround which are now being tested in CI.

@SvenKordon
Copy link
Author

@garethsb I looked into your code which is fine for me so far. I think, it is a good idea to have our own implementation for when_all and when_any.

However, there is one thing I'm not really sure about. Maybe the patched code is not asynchronous anymore as the wait is already called in the transform adaptor.

pplx::ranges::details::when_all(tasks | boost::adaptors::transformed(pplx::details::workaround_default_task<ReturnType>()));

I'd expect that the when_all does not wait at all anymore. For closing the listeners this might be acceptable but in other cases we may want run other code in parallel while when_all is waiting for the tasks to be finished.

@garethsb
Copy link
Contributor

garethsb commented May 6, 2022

Do you mean the call to task.wait() in pplx::details::workaround_default_task?

That is protected by the previous line:

if (!pplx::empty(task)) return task;

The empty function is just a nicer check I remembered for a default-constructed task, rather than using is_done. Therefore we only call wait on default-constructed tasks, which of course returns immediately.

Or have I missed something?

@SvenKordon
Copy link
Author

Okay sure! Sorry, now I've got it.

We check for an empty task and call wait only on empty ones. This will throw, which creates a new task of type pplx::task_from_exception<ReturnType>(e). Then we wait for the new task and handle its exception because this cannot be done by when_all.
Finally the struct observe_exceptions is replacing the code

.then( [tasks]( pplx::task<void> finally )
{
    for( auto task : tasks ) details::wait_nothrow( task );
    finally.wait();
} );

Thanks a lot! I learned a lot about cpprest and boost by this. I think we can close this pull request. Okay?

@garethsb
Copy link
Contributor

garethsb commented May 6, 2022

Thanks very much for working this through with me. I've made PR #260 from the branch, so if that works for you, hopefully @lo-simon will review and merge, and this one can be closed.

@SvenKordon SvenKordon closed this May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants