-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-1698: [JS] File reader attempts to load the same dictionary batch more than once #1222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
js/src/reader/arrow.ts
Outdated
| let found: boolean = false; | ||
| for (let [id, vector] of readDictionaries(schema.fields(index), batch, state, dictionaries)) { | ||
| dictionaries[id] = dictionaries[id] && dictionaries[id].concat(vector) || vector; | ||
| found = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheNeuralBit I think we can break here, instead of using a flag? IIRC the readDictionaries function returns as soon as it finds a dictionary, or yields nothing if it doesn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but breaking here would only break out of the for, not the while, right?
it does seem like there should be a cleaner way, since readDictionaries will return at most one vector. Is it necessary to return the id? shouldn't it always equal batch.id?
Maybe readDictionaries could just return vector if found, otherwise null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheNeuralBit yeah, that's right. I think both the for loop over the readDictionaries iterator and the return [id, vector] tuple are remnants from when my understanding was that DictionaryBatches could contain more than one DictionaryVector.
Since that was a misunderstanding, it totally makes sense to change the readDictionaries signature to return DictionaryVector | void instead of Iterator<[id, vector]>, and not loop. Do you mind updating the PR to reflect this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
trxcllnt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use break instead of branching
trxcllnt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks @TheNeuralBit!
|
Is this caught by any tests? or will we be stuck until we can run JS in the integration test suite? |
|
@wesm I don't think so, but I can add the arrow file @TheNeuralBit uploaded to JIRA to the tests. That should hold us over until we can run the integration suite. |
|
No worries. I would say the sooner we can at least feed the integration test files to JS to say "can you read this?" the better |
|
+1 |
|
Sorry I should have asked you to rerun the build since Travis CI had some issues yesterday. master is currently broken |
No description provided.