Skip to content

[GLUTEN-8891][VL] Refine local ssd cache feature #9228

Merged
zhouyuan merged 5 commits intoapache:mainfrom
zhouyuan:wip_local_cache_keep3
Apr 10, 2025
Merged

[GLUTEN-8891][VL] Refine local ssd cache feature #9228
zhouyuan merged 5 commits intoapache:mainfrom
zhouyuan:wip_local_cache_keep3

Conversation

@zhouyuan
Copy link
Copy Markdown
Member

@zhouyuan zhouyuan commented Apr 4, 2025

What changes were proposed in this pull request?

This patch adds serveral new configurations for local SSD cache.
Also aded one basic parquet read test with local cache

How was this patch tested?

pass GHA

@github-actions github-actions bot added the VELOX label Apr 4, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2025

#8891

@zhouyuan zhouyuan force-pushed the wip_local_cache_keep3 branch from 08029ad to d2ad5f8 Compare April 4, 2025 13:45
@zhouyuan zhouyuan requested a review from Copilot April 4, 2025 13:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 3 out of 5 changed files in this pull request and generated 4 comments.

Files not reviewed (2)
  • backends-velox/src/main/scala/org/apache/gluten/config/VeloxConfig.scala: Language not supported
  • backends-velox/src/test/scala/org/apache/spark/sql/execution/VeloxLocalCacheSuite.scala: Language not supported

Comment thread cpp/velox/config/VeloxConfig.h Outdated
const std::string kVeloxSsdODirectEnabled = "spark.gluten.sql.columnar.backend.velox.ssdODirect";
const std::string kVeloxSsdCheckpointIntervalBytes =
"spark.gluten.sql.columnar.backend.velox.ssdCheckpointIntervalBytes";
const bool kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
Copy link

Copilot AI Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration key is declared as a bool but assigned a string literal. Consider changing the type to std::string for consistency with other configuration keys.

Suggested change
const bool kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
const std::string kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";

Copilot uses AI. Check for mistakes.
Comment thread cpp/velox/config/VeloxConfig.h Outdated
Comment on lines +107 to +109
const bool kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
const bool kVeloxSsdCheckSumEnabled = "spark.gluten.sql.columnar.backend.velox.ssdChecksumEnabled";
const bool kVeloxSsdCheckSumReadVerificationEnabled =
Copy link

Copilot AI Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration key for checksum enabled is declared as a bool but assigned a string literal. It should be changed to std::string to align with typical configuration key types.

Suggested change
const bool kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
const bool kVeloxSsdCheckSumEnabled = "spark.gluten.sql.columnar.backend.velox.ssdChecksumEnabled";
const bool kVeloxSsdCheckSumReadVerificationEnabled =
const std::string kVeloxSsdDisableFileCow = "spark.gluten.sql.columnar.backend.velox.ssdDisableFileCow";
const std::string kVeloxSsdCheckSumEnabled = "spark.gluten.sql.columnar.backend.velox.ssdChecksumEnabled";
const std::string kVeloxSsdCheckSumReadVerificationEnabled =

Copilot uses AI. Check for mistakes.
Comment thread cpp/velox/compute/VeloxBackend.cc Outdated
asyncDataCache_ = velox::cache::AsyncDataCache::create(cacheAllocator_.get());
} else {
// TODO: this is not tracked by Spark.
auto ssd = InitSsdCache(ssdCacheSize);
Copy link

Copilot AI Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function call 'InitSsdCache' does not match the defined method 'initSsdCache'. Please update the call to use the correct case.

Suggested change
auto ssd = InitSsdCache(ssdCacheSize);
auto ssd = initSsdCache(ssdCacheSize);

Copilot uses AI. Check for mistakes.
Comment thread cpp/velox/compute/VeloxBackend.cc Outdated
int32_t ssdCacheShards = backendConf_->get<int32_t>(kVeloxSsdCacheShards, kVeloxSsdCacheShardsDefault);
int32_t ssdCacheIOThreads = backendConf_->get<int32_t>(kVeloxSsdCacheIOThreads, kVeloxSsdCacheIOThreadsDefault);
std::string ssdCachePathPrefix = backendConf_->get<std::string>(kVeloxSsdCachePath, kVeloxSsdCachePathDefault);
uint64_t ssdCheckpointIntervalSize = backendConf_->get<int32_t>(kVeloxSsdCheckpointIntervalBytes, 0);
Copy link

Copilot AI Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration retrieval uses get<int32_t> while the variable is declared as uint64_t. Consider using get<uint64_t> for consistency.

Suggested change
uint64_t ssdCheckpointIntervalSize = backendConf_->get<int32_t>(kVeloxSsdCheckpointIntervalBytes, 0);
uint64_t ssdCheckpointIntervalSize = backendConf_->get<uint64_t>(kVeloxSsdCheckpointIntervalBytes, 0);

Copilot uses AI. Check for mistakes.
@zhouyuan zhouyuan force-pushed the wip_local_cache_keep3 branch from 024b457 to d646d2f Compare April 4, 2025 14:01
@zhouyuan
Copy link
Copy Markdown
Member Author

zhouyuan commented Apr 8, 2025

@zhli1142015 would you please help to take a look on this? Thanks, -yuan

Copy link
Copy Markdown
Contributor

@jackylee-ch jackylee-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@zhouyuan zhouyuan force-pushed the wip_local_cache_keep3 branch 2 times, most recently from 4332335 to 89b6d0c Compare April 8, 2025 17:18
zhouyuan added 5 commits April 9, 2025 08:50
This patch refine the local cache by adding more configurations from Velox.

Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
@zhouyuan zhouyuan force-pushed the wip_local_cache_keep3 branch from 89b6d0c to 49ed85e Compare April 9, 2025 07:50
@zhouyuan zhouyuan merged commit 9e33a5f into apache:main Apr 10, 2025
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants