Skip to content

KAFKA-6647 KafkaStreams.cleanUp creates .lock file in directory its trying to clean (Windows OS) #6569

Closed
jukkakarvanen wants to merge 9 commits intoapache:trunkfrom
jukkakarvanen:KAFKA-6647_deleteAfterUnlock
Closed

KAFKA-6647 KafkaStreams.cleanUp creates .lock file in directory its trying to clean (Windows OS) #6569
jukkakarvanen wants to merge 9 commits intoapache:trunkfrom
jukkakarvanen:KAFKA-6647_deleteAfterUnlock

Conversation

@jukkakarvanen
Copy link
Copy Markdown
Contributor

This KAFKA-6647 KafkaStreams.cleanUp creates .lock file in directory its trying to clean (Windows OS) has been open for a while, and long conversation in previous Pull Requests. Anyway this problems is causing the TopologyTestDriver driver based unit test failing in Windows if not adding extra exception handling in there.

This PR version is trying to keep the general functionality as similar as earlier. Only add one extra retry of delete, if first failed due to DirectoryNotExmptyException. When added the retry logic only at the end of finally, caused checkstyle CyclomaticComplexity and NPathComplexity to go above threshold.
After it extracted cleanRemovedTaskDir and deleteTaskDir methods to avoid complexity.
Also time condition evaluation changed to be first before locking, so no need to lock if inner block is doing nothing.

No external functionality changed, so no additional test cases added.

Following five test in StateDirectoryTest failed earlier in Windows.
StateDirectoryTest.shouldCleanUpTaskStateDirectoriesThatAreNotCurrentlyLocked
StateDirectoryTest.shouldCleanupStateDirectoriesWhenLastModifiedIsLessThanNowMinusCleanupDelay
StateDirectoryTest.shouldNotLockStateDirLockedByAnotherThread
StateDirectoryTest.shouldNotUnLockStateDirLockedByAnotherThread
StateDirectoryTest.shouldCleanupAllTaskDirectoriesIncludingGlobalOne

Test shouldNotLockStateDirLockedByAnotherThread is still failing due to
there is no way to unlock task of already dead thread without adding extra code to StateDirectory class.
The test left as is, but explanatory comment added. The rest four of those five tests are now successful also in Windows.

Also shouldUseSerdesDefinedInMaterializedToConsumeGlobalTable test in StreamsBuilderTest failed in Windows before this change. After the change all the tests in StreamsBuilderTest are succesful.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@jukkakarvanen
Copy link
Copy Markdown
Contributor Author

Simplified the refactoring and reversed logic back to original relating to locking and condition evaluation order.

@jukkakarvanen
Copy link
Copy Markdown
Contributor Author

I wonder what is the reason: the previous build failed and now only after update of comment, it is successful.

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Apr 15, 2019

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Apr 15, 2019

Hi @jukkakarvanen and thanks for the contribution! We've discussed this PR and there is another PR (#5650) that aims to solve the same issue with a different approach that seems a little safer to implement. Would you consider reworking your current PR to fit the approach mapped out in #5650?

Ping us here an let us know when this is ready for review. Thanks
\cc @tedyu

@jukkakarvanen
Copy link
Copy Markdown
Contributor Author

jukkakarvanen commented Apr 17, 2019

Hi @bbejeck ,
Yes, I agree. My change was acting to the symptom, not fixing the actual problem.
The previous PR has been hanging so long time, so I expect there was some issues why it is not accepted. That's why I was proposing this kind of workaround not affecting to functionality of Non-Windows environments.

The most important thing is that this bug get fixed.

I can close this and reopen after merged idea of #5650

@jukkakarvanen
Copy link
Copy Markdown
Contributor Author

@bbejeck

We've discussed this PR and there is another PR (#5650) that aims to solve the same issue with a different approach that seems a little safer to implement.

Clarification, was the conclusion in #5650:

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Apr 17, 2019

Yes. Should be useOldLocking :)

@jukkakarvanen
Copy link
Copy Markdown
Contributor Author

I have been checking this KAFKA-6647 with two locking policy approach.
Not sure where is good place to continue discussion before new PR. So I added my comments to #5650.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants