Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager #5344
…rCacheManager. Both were susceptible to the following conditions: 1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the other was writing, or by writing to the file at the same time. 2. One JVM could partially write a file, then crash, leaving a truncated file.
 *
 * This method is not just thread-safe, but is also safe to use from multiple processes on the same machine.
 */
public static void writeAtomically(final File file, OutputStreamConsumer f) throws IOException
I wonder whether such a utility already exists in Apache Commons IO, Guava, or the JDK itself?
I looked (briefly) but did not find one.
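For readers unfamiliar with the pattern under discussion, here is a minimal sketch of one common way to get this behavior: write to a temporary file in the same directory, then rename it over the target. This is not the PR's actual implementation. The class name `AtomicWriteSketch` and the nested `OutputStreamConsumer` interface are assumptions made for illustration; only the `writeAtomically(File, OutputStreamConsumer)` signature comes from the diff above.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class AtomicWriteSketch
{
  // Hypothetical callback type, modeled on the OutputStreamConsumer named in the diff.
  @FunctionalInterface
  interface OutputStreamConsumer
  {
    void accept(OutputStream out) throws IOException;
  }

  static void writeAtomically(final File file, final OutputStreamConsumer f) throws IOException
  {
    final Path target = file.toPath().toAbsolutePath();

    // Create the temp file next to the target so the final rename stays on one filesystem.
    final Path tmp = Files.createTempFile(target.getParent(), target.getFileName().toString(), ".tmp");

    try {
      // Write the full contents to the temp file; readers of the target never see this file.
      try (OutputStream out = Files.newOutputStream(tmp)) {
        f.accept(out);
      }

      // Swap the finished file into place. Whether an existing target is replaced
      // is implementation specific; on POSIX filesystems the rename replaces it.
      Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
    finally {
      // No-op after a successful move; cleans up the temp file if writing or moving failed.
      Files.deleteIfExists(tmp);
    }
  }
}
```

With this shape, a reader in another JVM sees either the old file or the complete new one, and a crash mid-write leaves only a stray temp file rather than a truncated target. The sketch does not fsync, so it makes no durability promise across power loss.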
👍 for this PR. Some extra thoughts:
I think it is still important. I have a couple of customers for whom lookups are critical, and in that case coordinator downtime is not a good excuse for lookups being unavailable (especially since the rolling upgrade documentation sometimes says that all coordinators must be taken down at once). In fact, to help them out more, we may also look at adding more aggressive on-disk caching features, so that a server can restore all of its lookup data on boot (not just configuration) without access to any external services.
@gianm thanks for clarifying the usefulness of the configuration snapshot; let us keep it then.
We do that with some of our lookups, which are custom extensions and use disk-backed caches. In fact, to support improved cleanup of persisted data, I added #5287. Glad to know that lookups are getting more adoption :)
…rCacheManager (apache#5344) * Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager. Both were susceptible to the following conditions: 1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the other was writing, or by writing to the file at the same time. 2. One JVM could partially write a file, then crash, leaving a truncated file. * Use StringUtils.format
@gianm while thinking more about it, Druid processes should have different snapshotDirectory values configured so that there is never an actual conflict, since two processes might have different lookup configurations and you wouldn't want one to override the other.
But this would defeat the purpose of the snapshot directory, since it means peons cannot start up properly if the coordinator is unavailable. What makes the lookup configuration different between different nodes? Is it just the tier? If so, maybe we should have one machine-wide lookup snapshot per tier?
@gianm yes, they can be different for different lookup tiers, and "technically" different peons can have different lookup tiers (e.g. by overriding Druid properties for a peon via the task JSON). So we would either eventually need to force all peons to be in the same lookup tier (this is probably all right, as I haven't seen anyone exploiting the system to have different lookup tiers for different peons), or else each of them would need its own snapshot directory. A third option, of course, is to leave things in their current state, wait for users to hit specific limitations, and then evolve things based on that feedback :)
Similar to apache#5344 but for the authorizer instead of the authenticator.
@himanshug raised a follow-up with separate files for each lookup tier: #5358
…rCacheManager (#5344) (#5360) * Fix races in LookupSnapshotTaker, CoordinatorPollingBasicAuthenticatorCacheManager. Both were susceptible to the following conditions: 1. Two JVMs on the same machine (perhaps two peons) could conflict by one reading while the other was writing, or by writing to the file at the same time. 2. One JVM could partially write a file, then crash, leaving a truncated file. * Use StringUtils.format
…) (apache#5361) Similar to apache#5344 but for the authorizer instead of the authenticator.
crodier left a comment
I'll try to be more specific here. Thank you very much again.
private static OutputStream uncloseable(final OutputStream out) throws IOException
{
  return new FilterOutputStream(out)
In local unit tests with 15 MB files, using a BufferedOutputStream here takes about 1 second, versus ~16 seconds with the current code. FilterOutputStream isn't buffered; is it too slow for practical use?
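One likely cause of the slowdown described here is that `FilterOutputStream`'s inherited `write(byte[], int, int)` calls the single-byte `write(int)` once per byte. The sketch below is an illustration rather than the PR's code: it shows an uncloseable wrapper that forwards bulk writes directly to the underlying stream. The class name `UncloseableSketch` and the flush-on-close behavior are assumptions; wrapping the stream in a `BufferedOutputStream`, as suggested in the comment, is another way to get a similar effect.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class UncloseableSketch
{
  static OutputStream uncloseable(final OutputStream out)
  {
    return new FilterOutputStream(out)
    {
      @Override
      public void write(final byte[] b, final int off, final int len) throws IOException
      {
        // Forward the whole array in one call. Here "out" resolves to the protected
        // field inherited from FilterOutputStream, which holds the wrapped stream.
        out.write(b, off, len);
      }

      @Override
      public void close() throws IOException
      {
        // Assumption for this sketch: flush, but leave the underlying stream open
        // so that the caller stays responsible for closing it.
        flush();
      }
    };
  }
}
```

Overriding the bulk write keeps the wrapper thin and adds no extra buffer, which matters if the caller needs to sync or close the underlying stream itself after the consumer returns.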