Describe the bug
Java Arrow provides the Data.importVector API, which imports an Arrow array and its schema through the C Data Interface. The caller must pass a dictionary provider, and Data.importVector fills that provider with dictionary value vectors keyed by dictionary id.
Dictionary ids are the keys used to look up the dictionary values for dictionary-encoded arrays, so each id must be unique. Otherwise, a dictionary-encoded array resolves to the wrong dictionary values.
In the Java Arrow API, a class named SchemaImport tracks the current dictionary id while an array is being imported. A design drawback of Data.importVector is that it instantiates SchemaImport internally, so uniqueness is guaranteed only within a single imported array. For example, if the array has a nested type, all dictionary-encoded arrays inside it get unique dictionary ids.
But once you import another array by calling Data.importVector again, the dictionary id counter is reset. The API therefore cannot provide unique dictionary ids across all the arrays you import.
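The reset described above can be illustrated with a small self-contained model. This is only a sketch and does not use the Arrow API: Provider, IdCounter, importWithFreshCounter and importWithSharedCounter are hypothetical stand-ins, and the assumption that ids start at 0 for each new counter is made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DictionaryIdCollisionSketch {
    // Hypothetical stand-in for a dictionary provider: dictionary id -> dictionary values.
    static final class Provider {
        final Map<Long, List<String>> dictionaries = new HashMap<>();
    }

    // Hypothetical stand-in for the per-import id state; assumed to start at 0.
    static final class IdCounter {
        private long next = 0;

        long nextId() {
            return next++;
        }
    }

    // Models an import call that creates its own counter, so ids restart on every call.
    static long importWithFreshCounter(Provider provider, List<String> values) {
        IdCounter counter = new IdCounter();
        long id = counter.nextId();
        provider.dictionaries.put(id, values); // silently replaces any earlier entry with this id
        return id;
    }

    // For contrast: the caller owns the counter, so ids keep increasing across calls.
    static long importWithSharedCounter(Provider provider, IdCounter counter, List<String> values) {
        long id = counter.nextId();
        provider.dictionaries.put(id, values);
        return id;
    }

    public static void main(String[] args) {
        Provider provider = new Provider();
        long first = importWithFreshCounter(provider, List.of("a", "b"));
        long second = importWithFreshCounter(provider, List.of("x", "y"));
        // Both calls return the same id, and the first dictionary has been replaced.
        System.out.println("fresh counter: ids " + first + ", " + second
                + "; entries=" + provider.dictionaries.size());

        Provider other = new Provider();
        IdCounter counter = new IdCounter();
        long third = importWithSharedCounter(other, counter, List.of("a", "b"));
        long fourth = importWithSharedCounter(other, counter, List.of("x", "y"));
        System.out.println("shared counter: ids " + third + ", " + fourth
                + "; entries=" + other.dictionaries.size());
    }
}
```

In this model, two imports against the same provider leave a single entry, so an array that was encoded against the first dictionary would be decoded with the second. The shared-counter variant is included only to show the contrast; it is not a fix proposed in this report.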
As a result, CometReader.loadVector, which calls Data.importVector to import arrays from native code, reuses dictionary ids that already belong to other arrays and overwrites their entries in the provider.
This is the cause of the CometTPCDSQuerySuite test failures in #437.
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response