[GLUTEN-8891][VL] Refine local ssd cache feature #9228
Branch force-pushed from 08029ad to d2ad5f8.
Copilot reviewed 3 out of 5 changed files in this pull request and generated 4 comments.
Files not reviewed (2)
- backends-velox/src/main/scala/org/apache/gluten/config/VeloxConfig.scala: Language not supported
- backends-velox/src/test/scala/org/apache/spark/sql/execution/VeloxLocalCacheSuite.scala: Language not supported
```cpp
const std::string kVeloxSsdODirectEnabled = "spark.gluten.sql.columnar.backend.velox.ssdODirect";
const std::string kVeloxSsdCheckpointIntervalBytes =
    "spark.gluten.sql.columnar.backend.velox.ssdCheckpointIntervalBytes";
const bool kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
```
The configuration key is declared as a bool but assigned a string literal. Consider changing the type to std::string for consistency with other configuration keys.
Suggested change:
```diff
-const bool kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
+const std::string kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
```
```cpp
const bool kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
const bool kVeloxSsdCheckSumEnabled = "spark.gluten.sql.columnar.backend.velox.ssdChecksumEnabled";
const bool kVeloxSsdCheckSumReadVerificationEnabled =
```
The configuration key for checksum enabled is declared as a bool but assigned a string literal. It should be changed to std::string to align with typical configuration key types.
Suggested change:
```diff
-const bool kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
-const bool kVeloxSsdCheckSumEnabled = "spark.gluten.sql.columnar.backend.velox.ssdChecksumEnabled";
-const bool kVeloxSsdCheckSumReadVerificationEnabled =
+const std::string kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
+const std::string kVeloxSsdCheckSumEnabled = "spark.gluten.sql.columnar.backend.velox.ssdChecksumEnabled";
+const std::string kVeloxSsdCheckSumReadVerificationEnabled =
```
```cpp
asyncDataCache_ = velox::cache::AsyncDataCache::create(cacheAllocator_.get());
} else {
// TODO: this is not tracked by Spark.
auto ssd = InitSsdCache(ssdCacheSize);
```
The function call 'InitSsdCache' does not match the defined method 'initSsdCache'. Please update the call to use the correct case.
Suggested change:
```diff
-auto ssd = InitSsdCache(ssdCacheSize);
+auto ssd = initSsdCache(ssdCacheSize);
```
```cpp
int32_t ssdCacheShards = backendConf_->get<int32_t>(kVeloxSsdCacheShards, kVeloxSsdCacheShardsDefault);
int32_t ssdCacheIOThreads = backendConf_->get<int32_t>(kVeloxSsdCacheIOThreads, kVeloxSsdCacheIOThreadsDefault);
std::string ssdCachePathPrefix = backendConf_->get<std::string>(kVeloxSsdCachePath, kVeloxSsdCachePathDefault);
uint64_t ssdCheckpointIntervalSize = backendConf_->get<int32_t>(kVeloxSsdCheckpointIntervalBytes, 0);
```
The configuration retrieval uses get<int32_t> while the variable is declared as uint64_t. Consider using get<uint64_t> for consistency.
Suggested change:
```diff
-uint64_t ssdCheckpointIntervalSize = backendConf_->get<int32_t>(kVeloxSsdCheckpointIntervalBytes, 0);
+uint64_t ssdCheckpointIntervalSize = backendConf_->get<uint64_t>(kVeloxSsdCheckpointIntervalBytes, 0);
```
Branch force-pushed from 024b457 to d646d2f.
@zhli1142015 would you please help take a look at this? Thanks, -yuan
Branch force-pushed from 4332335 to 89b6d0c.
This patch refines the local cache by adding more configurations from Velox.
Signed-off-by: Yuan <yuanzhou@apache.org>
Branch force-pushed from 89b6d0c to 49ed85e.
What changes were proposed in this pull request?
This patch adds several new configurations for the local SSD cache.
It also adds a basic Parquet read test with the local cache enabled.
How was this patch tested?
Passed GHA (GitHub Actions) CI.