[QTL] Make URI Exctraction Namespace take more sane arguments#2738
[QTL] Make URI Exctraction Namespace take more sane arguments#2738drcrallen merged 5 commits intoapache:masterfrom
Conversation
drcrallen
commented
Mar 25, 2016
- Fixes Better URI namespaced lookups behavior with single files #2669
|
This is incompatible but the static config is probably going away anyways. |
|
|
||
| final String priorCache = currentNamespaceCache.put(namespaceKey, swapCacheKey); | ||
| if (priorCache != null) { | ||
| // TODO: resolve what happens here if query is actively going on |
There was a problem hiding this comment.
please file a github issue for this if there is none already.
There was a problem hiding this comment.
@drcrallen @nishantmonu51 you could use a reference counting mechanism similar to segments to avoid closing the resources in query is in flight.
There was a problem hiding this comment.
Eventually, yes. Unless you have some sort of insight, though, this is not straight forward. The issue is that queries do not have a concept of resources associated with them. There are some runners which wrap resources, but there isn't a general concept of query resources that need to be accounted for.
There are quite a few resources that get tied to a particular query though, so it might be worthwhile to add such a convention.
|
👍 |
| |`pollPeriod`|Period between polling for updates|No|0 (only once)| | ||
| |`versionRegex`|Regex to help find newer versions of the namespace data|Yes|| | ||
| |`uri`|URI for the file of interest|No|Use `uriPrefix`| | ||
| |`uriPrefix`|A URI which specifies a directory (or other searchable resource) in which to search for files|No|Use `uri`| |
There was a problem hiding this comment.
can this read the most recent file under a prefix ? imagine i have an HDFS dir where prefix is /year/day/hour/lookupDir/
There was a problem hiding this comment.
Yes, that's part of io.druid.storage.hdfs.HdfsFileTimestampVersionFinder which is used to discriminate among multiple files matching the pattern in uriPrefix
There was a problem hiding this comment.
added some documentation
|
@b-slim this should be fixed now. Any other comments? |
| |`uri`|URI for the file of interest|No|Use `uriPrefix`| | ||
| |`uriPrefix`|A URI which specifies a directory (or other searchable resource) in which to search for files|No|Use `uri`| | ||
| |`fileRegex`|Optional regex for matching the file name under `uriPrefix`. Only used if `uriPrefix` is used|No|`".*"`| | ||
| |`namespaceParseSpec`|How to interpret the data at the URI|Yes|| |
There was a problem hiding this comment.
Out of scope here
|
@drcrallen please see the comments. My main concern i am not sure why this PR is not implementing the new interfaces ? |
|
@b-slim because the move to the new interface is in a single PR with no other modifications. I'm not going to modify a bunch of functionality in a PR as big as the lookup framework migration one. |
|
@drcrallen can we have an issue to track this by outlining the plan ? |
|
@b-slim I'm good for that. Give me 15 mins to get it written up |
|
@b-slim any more comments here? |
| One of either `uri` xor `uriPrefix` must be specified. | ||
|
|
||
| The `versionRegex` value specifies a regex to use to determine if a filename in the parent path of the uri should be considered when trying to find the latest version. Omitting this setting or setting it equal to `null` will match to all files it can find (equivalent to using `".*"`). The search occurs in the most significant "directory" of the uri. | ||
| The `pollPeriod` value specifies the period in ISO 8601 format between checks for updates. If the source of the lookup is capable of providing a timestamp, the lookup will only be updated if it has changed since the prior tick of `pollPeriod`. A value of 0, an absent parameter, or `null` all mean populate once and do not attempt to update. Whenever an update occurs, the updating system will look for a file with the most recent timestamp and assume that one with the most recent data. |
There was a problem hiding this comment.
reading this i can not tell if the update is incremental or is swapping the entire cache. Can we make it more clear please ?
There was a problem hiding this comment.
Explained more here, please look.
|
LGTM 👍 , i would recommend one more ut where we have new lookup file creation to test the get new version path, but is is not a blocker. it is up to the author to add it or not. (please squash) |
|
@b-slim IMPLs for io.druid.data.SearchableVersionedDataFinder do have tests for finding new versions. If there are things not covered by those tests, or if you're looking for more integrations with |
|
Thanks @b-slim ! |
|
@drcrallen thought this can go in one commit ? |
|
@drcrallen my bad but in the top of this pr i can see 5 commits and thought that's what is going to be in the master. |
| versionRegex = null; | ||
| } | ||
| } else { | ||
| final Path filePath = Paths.get(extractionNamespace.getUri()); |
There was a problem hiding this comment.
With an s3 lookup, this Paths.get call throws this exception for me,
java.nio.file.FileSystemNotFoundException: Provider "s3" not installed
at java.nio.file.Paths.get(Paths.java:147) ~[?:1.8.0_66]
at io.druid.server.namespace.URIExtractionNamespaceFunctionFactory$3.call(URIExtractionNamespaceFunctionFactory.java:154) ~[druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
at io.druid.server.namespace.URIExtractionNamespaceFunctionFactory$3.call(URIExtractionNamespaceFunctionFactory.java:121) ~[druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
at io.druid.server.namespace.cache.NamespaceExtractionCacheManager$5.run(NamespaceExtractionCacheManager.java:364) [druid-namespace-lookup-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run(MoreExecutors.java:582) [guava-16.0.1.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_66]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_66]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_66]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_66]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_66]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_66]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]