This repository was archived by the owner on Jun 14, 2024. It is now read-only.
Introduce auxiliary data structure for applying rules #222
Closed
Description
Feature requested
To optimize the rule application process, we need an auxiliary data structure that shares information across the steps of rule application, so that we avoid unnecessary recalculation and can pass context to other parts of the process.
There are two possible types of data structure (i.e. cache):
1) A data structure that is not shared across rules; it is only available within a single rule.
2) A data structure that is shared across all rules.
For 1),
- cons: the tags in IndexLogEntry cannot be shared between rules, so the recalculation problem remains, e.g. the signature is calculated again for each rule: Filter Rule and Join Rule (+ join rule v2).
- pros: simple object management. Candidate index entries are destroyed after each rule is applied, for each applicable plan, so there is no need to reset or manage a complicated structure such as Map[ (index, relation), tags ].
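A minimal sketch of the trade-off in 1), using plain Scala stand-ins rather than the real Hyperspace or Spark types. `CandidateIndex`, `getOrElseUpdateTag`, and `computeSignature` are hypothetical names introduced only for illustration; the actual IndexLogEntry tag API may look different.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a candidate index entry; not the actual IndexLogEntry API.
final class CandidateIndex(val name: String) {
  // Tags keyed by (relation id, tag name). They are discarded together with this object.
  private val tags = mutable.Map.empty[(String, String), Any]

  def getOrElseUpdateTag[T](relation: String, tag: String)(compute: => T): T =
    tags.getOrElseUpdate((relation, tag), compute).asInstanceOf[T]
}

object PerRuleTagsDemo {
  var signatureComputations = 0

  // Placeholder for an expensive signature calculation (assumption, not the real algorithm).
  def computeSignature(relation: String): String = {
    signatureComputations += 1
    s"sig-${relation.hashCode}"
  }

  // Each rule builds fresh candidates, so a tag is reused only inside that rule.
  def applyRule(relation: String): String = {
    val candidate = new CandidateIndex("idx1")
    // First lookup computes the signature and stores it as a tag.
    candidate.getOrElseUpdateTag(relation, "signature")(computeSignature(relation))
    // Second lookup within the same rule hits the tag and does not recompute.
    candidate.getOrElseUpdateTag(relation, "signature")(computeSignature(relation))
  }
}
```

Calling `applyRule` once per rule (say, once for the filter rule and once for the join rule) computes the signature once per call: nothing needs to be reset between rules, but nothing is shared between them either.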
For 2),
- cons: the data structure is hard to maintain; it is unclear how, when, and where each rule should create, clear, and access the data.
- pros: no recalculation.
I tried 2) briefly, but maintaining a cache (of IndexOptimizerContext) for all rules is not simple, because each rule is invoked by Spark's RuleExecutor. A wrapper rule that applies all the other rules internally would work, but then the rule-specific statistics gathered by RuleExecutor would not be collected properly. (ref)
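The wrapper-rule idea can be sketched roughly as follows. This is a simplified model under stated assumptions, not Spark code: a plan is just a `String`, and `SharedContext`, `ContextRule`, and `ApplyAllRules` are hypothetical names (the IndexOptimizerContext mentioned above may be shaped differently).

```scala
import scala.collection.mutable

// Hypothetical shared cache; stands in for something like IndexOptimizerContext.
final class SharedContext {
  private val signatures = mutable.Map.empty[String, String]
  var computations = 0

  // Computes the signature at most once per relation for the lifetime of this context.
  def signatureOf(relation: String): String =
    signatures.getOrElseUpdate(relation, {
      computations += 1
      s"sig-${relation.hashCode}"
    })
}

// Simplified rule interface: takes the plan plus the shared context.
trait ContextRule {
  def apply(plan: String, ctx: SharedContext): String
}

object FilterRule extends ContextRule {
  def apply(plan: String, ctx: SharedContext): String = {
    ctx.signatureOf(plan) // would be used to match candidate indexes
    plan                  // this sketch leaves the plan unchanged
  }
}

object JoinRule extends ContextRule {
  def apply(plan: String, ctx: SharedContext): String = {
    ctx.signatureOf(plan) // cache hit if FilterRule already asked for it
    plan
  }
}

// Wrapper rule: creates one context per invocation and threads it through every rule,
// so the context's lifetime is explicit and it is dropped when the wrapper returns.
object ApplyAllRules {
  val rules: Seq[ContextRule] = Seq(FilterRule, JoinRule)

  def apply(plan: String): (String, Int) = {
    val ctx = new SharedContext
    val result = rules.foldLeft(plan)((p, rule) => rule(p, ctx))
    (result, ctx.computations)
  }
}
```

Because the executor would only see the single wrapper, any per-rule accounting it does is attributed to the wrapper, which is the drawback noted above.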
Other options:
- TreeNodeTag (only available from Spark 3.0)
Possible candidates for auxiliary data:
- (General) The signature value for each candidate logical relation.
- (HybridScan) Whether the current index application requires Hybrid Scan.
- (HybridScan) The total size of the files common to the index source files and the current source files of a given relation (can be used by the ranking algorithm).
- (HybridScan) Current files / appended files / deleted files.
- etc.
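As an illustration of how the Hybrid Scan items could be derived and stored together, here is a small sketch. `FileInfo` and `HybridScanInfo` are hypothetical types, and comparing files by (path, size) is an assumption made only for this example.

```scala
// Hypothetical file descriptor; size is in bytes.
final case class FileInfo(path: String, size: Long)

// Hypothetical container for the per-(index, relation) data listed above.
final case class HybridScanInfo(
    appended: Set[FileInfo],
    deleted: Set[FileInfo],
    commonBytes: Long) {
  // Hybrid Scan is needed as soon as the source files differ from the indexed ones.
  def hybridScanRequired: Boolean = appended.nonEmpty || deleted.nonEmpty
}

object HybridScanInfo {
  def from(indexed: Set[FileInfo], current: Set[FileInfo]): HybridScanInfo =
    HybridScanInfo(
      appended = current -- indexed, // in the relation now, but not covered by the index
      deleted = indexed -- current,  // covered by the index, but no longer in the relation
      // toSeq keeps every file's size, even when two files happen to have equal sizes.
      commonBytes = (indexed intersect current).toSeq.map(_.size).sum)
}
```

For example, with indexed files `a` (10 bytes) and `b` (20 bytes) and current files `b` and `c` (5 bytes), `c` is appended, `a` is deleted, and the common size is 20 bytes. In this sketch a file whose size changed shows up as both deleted and appended.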
Acceptance criteria
Success criteria
Additional context
Metadata
Labels: enhancement (New feature or request)