Race between closing the file and copying it#128
Conversation
|
Thank you for reporting this issue @pchlupacek Unfortunately the proper fix for this problem won't be so easy. Can you please revert the changes done to deactivate() and empty the checkIn method instead and add a TODO? Then open an issue to review the lifecycle of this component. This will have to be carefully analysed. |
|
@hrosa, sure. One idea I had in mind to proper fix it is following:
This can be also forked to another thread (or task with scheduler), as the file names anyhow are // shall be unique. |
|
@pchlupacek that sounds good, feel free to open issue and take ownership if you're up to it :) In fact, we have some issues on RestComm with long recorded calls because MS always takes quite some time to make the file available. So any strategy that helps speed up this process is very much welcome. Overall, I'm currently working on improving and simplifying the class model of the Media Server. I recently added a new ResourcePool implementation that calls out checkIn/checkout methods to reset and initialise the state of these resources. But as we just witnessed (thanks to you) we cannot implement this method carefree because of these race conditions. That's why we need to review the lifecycle of each pooled resources. This will probably be taken care of in milestones 4.3.0 or 4.4.0. |
|
@hrosa we found quite a lot of oddities with MS concurrency model. I am not sure what are your plans / restrictions here are, but perhaps just a quick conversation may help us to understand it better I am coming from scala (functional) so perhaps better understand what your plans are, could make some development more effective. What do you think ? |
|
@pchlupacek Yes that is true, there are some oddities at the moment. So originally the Media Server used a Scheduler to execute all tasks in intervals of 20ms. This interval was imposed to ensure media synchronization. As you can imagine, this is pretty bad because any non-media tasks are delayed 20ms without real necessity and this slows things down in MS core. When I implemented support for WebRTC this was specially bad because in a single channel we had several protocols multiplexed (RTP, RTCP, STUN, DTLS). Although the 20ms applies finely for RTP, the same is not true for other protocols. What happened in the end was that we experienced increasing delays in calls because the MS could not read from sockets in a timely fashion to support all the multiplexed protocols. Now, the problem is that we have two different schedulers co-existing in the Media Server and this may cause some oddities. And this is one issue that we will try to address in the 4.x milestones, together with improving the class model of MS. Regarding class model, I already changed the way the Media Server bootstraps and introduced Guice to provide Dependency Injection and AOP capabilities. DI will help simplify the complicated class model and AOP will hide access to aspects like resource pooling. We are also going to integrate Netty in the project to replace that dreadful UDPManager class, which will also help improve the thread model as far as IO is concerned. When MGCP refactoring and Netty tasks are completed, we need to start looking at the Media Scheduler and all media components (player, recorder, etc). I think by the end of this task we can get the threading model right in MS. This is a lot of work and may take some time to achieve, so I will ask some patience from users. If you're experiencing issues please open tickets here on github. If you want to debate the MS design flaws please open a thread in our public forum https://groups.google.com/forum/#!forum/restcomm Let me know if you have any questions. |
|
Forgot to mention, feel free to join the community here https://gitter.im/RestComm/Restcomm-discuss The team and community are always present here and we held weekly meetings every Wednesday at 4pm (GMT) |
|
@hrosa thanks will follow up on gitter |
|
Thanks @pchlupacek ! Acknowledged contribution at https://telestax.com/acknowledgements/ |
Output stream closes on checkIn(). So the file copy operation may be interrupted in middle.