[Feature](iceberg) Add manifest-level cache for Iceberg tables to reduce I/O and parsing overhead #59056
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
TPC-H: Total hot run time: 36570 ms
TPC-DS: Total hot run time: 180705 ms
ClickBench: Total hot run time: 27.21 s

run buildall
TPC-H: Total hot run time: 36120 ms
TPC-DS: Total hot run time: 178998 ms
ClickBench: Total hot run time: 27.32 s
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage

run buildall
TPC-H: Total hot run time: 35151 ms

run buildall
TPC-H: Total hot run time: 35385 ms
TPC-DS: Total hot run time: 178544 ms
ClickBench: Total hot run time: 27.24 s
FE Regression Coverage Report: Increment line coverage
Force-pushed from 5c598d1 to 53318e7
run buildall
TPC-H: Total hot run time: 34962 ms
TPC-DS: Total hot run time: 178232 ms
ClickBench: Total hot run time: 27.41 s
FE Regression Coverage Report: Increment line coverage

run buildall
TPC-H: Total hot run time: 35242 ms
TPC-DS: Total hot run time: 178687 ms
run external
… remove obsolete explain test
run buildall
TPC-H: Total hot run time: 36088 ms
TPC-DS: Total hot run time: 179920 ms
ClickBench: Total hot run time: 27.17 s
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested. |
kaka11chen left a comment:
LGTM
What problem does this PR solve?
Motivation
During Iceberg query planning, FE needs to read and parse the metadata chain: ManifestList → Manifest → DataFile/DeleteFile. When frequently querying hot partitions or executing small batch queries, the same Manifest files are repeatedly read and parsed, causing significant I/O and CPU overhead.
Solution
This PR introduces a manifest-level cache (`IcebergManifestCache`) in FE to cache the parsed DataFile/DeleteFile lists per manifest file. The cache is implemented using Caffeine with weight-based LRU eviction and TTL support.
Key Components
- `IcebergManifestCache`: Core cache implementation using Caffeine (see the sketch after this list)
  - Weight-based LRU eviction controlled by `iceberg.manifest.cache.capacity-mb`
  - TTL expiration via `iceberg.manifest.cache.ttl-second`
  - Single-flight loading to prevent duplicate parsing of the same manifest
- `ManifestCacheKey`: Cache key consisting of:
  - Manifest file path
- `ManifestCacheValue`: Cached payload containing:
  - List of `DataFile` or `DeleteFile`
  - Estimated memory weight for eviction
- `IcebergManifestCacheLoader`: Helper class to load and populate the cache using `ManifestFiles.read()`
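The sketch below shows how a Caffeine cache with these properties could be wired up. It is an illustration under assumptions: the class, field, and method names are invented for this example, only the Caffeine builder API itself is real, and the PR's actual `IcebergManifestCache` may differ.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Illustrative sketch only: names here are assumptions, not the PR's actual code.
public class ManifestCacheSketch {
    // Illustrative cached value: parsed file entries plus an estimated in-memory weight.
    public static final class CachedManifest {
        final List<Object> files;     // parsed DataFile or DeleteFile entries
        final long weightBytes;       // estimated size, consulted by the weigher below

        CachedManifest(List<Object> files, long weightBytes) {
            this.files = files;
            this.weightBytes = weightBytes;
        }
    }

    private final Cache<String, CachedManifest> cache;

    // capacityMb / ttlSeconds would come from iceberg.manifest.cache.capacity-mb
    // and iceberg.manifest.cache.ttl-second.
    public ManifestCacheSketch(long capacityMb, long ttlSeconds) {
        this.cache = Caffeine.newBuilder()
                // weight-based LRU eviction: total estimated bytes bounded by the MB budget
                .maximumWeight(capacityMb * 1024L * 1024L)
                .weigher((String path, CachedManifest v) ->
                        (int) Math.min(v.weightBytes, Integer.MAX_VALUE))
                // TTL: entries expire after a period without access
                .expireAfterAccess(ttlSeconds, TimeUnit.SECONDS)
                .build();
    }

    // Caffeine's get(key, mappingFunction) gives single-flight loading: concurrent
    // callers for the same manifest path wait on one parse instead of repeating it.
    public CachedManifest getOrLoad(String manifestPath, Function<String, CachedManifest> loader) {
        return cache.get(manifestPath, loader);
    }
}
```

Using `Cache.get(key, mappingFunction)` rather than a separate check-then-put keeps loading single-flight without any extra locking around the manifest parse.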
Cache Invalidation Strategy
- Key changes automatically invalidate stale entries (length/lastModified/sequenceNumber changes); see the key sketch below
- TTL prevents stale data when the underlying storage doesn't support precise mtime/etag
- Different snapshots use different manifest paths/keys, ensuring snapshot-level isolation
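For illustration, key-change invalidation can be expressed with a value-based key, as in the minimal sketch below. Only the manifest path is confirmed as a key component by this PR, so the extra fields are assumptions taken from the invalidation notes above.

```java
// Hypothetical key type: the length/lastModified fields are assumptions drawn from the
// invalidation notes; the real ManifestCacheKey may store different components.
public record ManifestCacheKeySketch(String path, long length, long lastModified) {
    // A record's generated equals()/hashCode() compare every component, so a manifest
    // rewritten in place (new length or lastModified) yields a different key: lookups
    // miss, the manifest is re-read, and the orphaned entry ages out via TTL or LRU weight.
}
```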
Iceberg Catalog Properties
| Config | Default | Description |
|--------|---------|-------------|
| `iceberg.manifest.cache.enable` | `true` | Enable/disable manifest cache |
| `iceberg.manifest.cache.capacity-mb` | `1024` | Maximum cache capacity in MB |
| `iceberg.manifest.cache.ttl-second` | `48 * 60 * 60` (48 hours) | Cache entry expiration after access |
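As a rough illustration of how these catalog properties might be consumed (the PR's actual configuration handling is not shown here), the property names and defaults below mirror the table:

```java
import java.util.Map;

// Hypothetical helper; only the property names and defaults are taken from the table above.
final class ManifestCacheConfigSketch {
    static boolean enabled(Map<String, String> props) {
        return Boolean.parseBoolean(props.getOrDefault("iceberg.manifest.cache.enable", "true"));
    }

    static long capacityMb(Map<String, String> props) {
        return Long.parseLong(props.getOrDefault("iceberg.manifest.cache.capacity-mb", "1024"));
    }

    static long ttlSeconds(Map<String, String> props) {
        return Long.parseLong(props.getOrDefault("iceberg.manifest.cache.ttl-second",
                String.valueOf(48 * 60 * 60)));  // 172800 seconds = 48 hours
    }
}
```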
Integration Point
The cache is integrated in `IcebergScanNode.planFileScanTaskWithManifestCache()`, which (a minimal sketch follows the list):
1. Loads delete manifests via cache and builds `DeleteFileIndex`
2. Loads data manifests via cache and creates a `FileScanTask` for each data file
3. Falls back to the original scan if cache loading fails
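Below is a minimal, self-contained sketch of that three-step flow. Every type parameter and method name is a placeholder rather than the PR's actual API; the real code lives in `IcebergScanNode` and works with Iceberg's `ManifestFile`, `DataFile`, `DeleteFile`, `DeleteFileIndex`, and `FileScanTask` types.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholder type parameters: M = manifest, F = data file, D = delete file,
// X = delete index, T = scan task.
abstract class ManifestCachePlanningSketch<M, F, D, X, T> {

    List<T> planWithManifestCache(List<M> dataManifests, List<M> deleteManifests) {
        try {
            // 1. Load delete manifests through the cache and build an index so that
            //    delete files can be matched to the data files they apply to.
            List<D> deletes = loadDeleteFilesFromCache(deleteManifests);
            X deleteIndex = buildDeleteIndex(deletes);

            // 2. Load data manifests through the cache and emit one scan task per data file.
            List<T> tasks = new ArrayList<>();
            for (F dataFile : loadDataFilesFromCache(dataManifests)) {
                tasks.add(toScanTask(dataFile, deleteIndex));
            }
            return tasks;
        } catch (Exception e) {
            // 3. Any failure while loading through the cache falls back to the original,
            //    uncached scan planning path.
            return planWithoutCache();
        }
    }

    abstract List<D> loadDeleteFilesFromCache(List<M> deleteManifests);
    abstract List<F> loadDataFilesFromCache(List<M> dataManifests);
    abstract X buildDeleteIndex(List<D> deletes);
    abstract T toScanTask(F dataFile, X deleteIndex);
    abstract List<T> planWithoutCache();
}
```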
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)