-
Notifications
You must be signed in to change notification settings - Fork 963
Fix core dumps triggered by rocksdb compacting when shutdown bk #4706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
560ace4 to
bc91be9
Compare
| while (!entryLocationIndex.compareAndSetCompacting(false, true)) { | ||
| Thread.sleep(100); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be done in the close method in the EntryLocationIndex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense @zymap
ecaea19 to
7018ea1
Compare
1556a35 to
abee9aa
Compare
| @Override | ||
| public void close() throws IOException { | ||
| log.info("Closing EntryLocationIndex"); | ||
| while (!compacting.compareAndSet(false, true)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This waiting could result in the system being stuck here indefinitely, or it could take an exceptionally long time to get stuck at this step.
Should we add a maximum waiting time, or do something else, such as modifying RocksDB operations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think so. Closing a DB which in compaction status may triger core dumps or other unexpected error. If Closing the DB cost toot long time, I think we'd better find out why or kill -9 if we need.
The handling approach here is similar to the procedure below:
bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
Lines 770 to 772 in 3a5cf9d
| while (!compacting.compareAndSet(false, true)) { | |
| // Wait till the thread stops compacting | |
| Thread.sleep(100); |
| public void compact() throws IOException { | ||
| try { | ||
| isCompacting = true; | ||
| if (!compacting.compareAndSet(false, true)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question:
Is the isCompacting syntax the root cause of this problem?
Or is it simply a matter of rewriting the method more efficiently, thus ruling out issues with the isCompacting setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It's the root cause. We should cancel the compacting if we have closed the DB, or we should delay closing the DB if we have already been in compaction status. So here We need an atomic variable to serve as a flag for the DB status. @StevenLuMT
Descriptions of the changes in this PR:
Fix #4705
Motivation
Fix #4705
Changes
1.Setting the
compactingflag ofentryLocationIndexas true when shutdownLedgerStorageto stopping the subsequentcompact.2. When
LedgerStorageshutdown we will wait unit thecompactend.