This repository was archived by the owner on Jun 14, 2024. It is now read-only.

Introduce auxiliary data structure for applying rules #222

@sezruby

Description

Feature requested

To optimize the rule application process, we need an auxiliary data structure for sharing information, so that we can avoid unnecessary recalculation and pass context to other parts of rule application.

We could consider two types of data structure (i.e. cache):

  1. A data structure that cannot be shared across rules; it is only available within a single rule.
  2. A data structure that can be shared across all rules.

For 1),

  • cons: the tags in IndexLogEntry cannot be shared between rules, so the recalculation problem remains, e.g. the signature is calculated again for each rule: Filter Rule and Join Rule (+ join rule v2).
  • pros: object management is simple. Candidate index entries are discarded after each rule is applied, for each applicable plan, so there is no need to reset or manage a complicated structure such as Map[ (index, relation), tags ].

For 2),

  • cons: the data structure is hard to maintain; it is unclear how, when, and where each rule should create, clear, and access the data.
  • pros: no recalculation.
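
To make the trade-off of option 1) concrete, here is a minimal sketch in plain Scala. `Relation`, `TagKey`, `CandidateIndexEntry`, and `PerRuleTagDemo` are hypothetical stand-ins invented for illustration, not the actual Hyperspace or Spark types. It assumes tags are stored on the candidate entry and keyed by (relation, tag), and that each rule works on a fresh entry.

```scala
import scala.collection.mutable

// Hypothetical stand-ins; not the actual Hyperspace or Spark types.
final case class Relation(name: String, files: Seq[String])
final case class TagKey[T](name: String)

// Option 1: tags live on the candidate index entry itself, so they disappear
// when the entry is discarded after a rule finishes.
final class CandidateIndexEntry(val indexName: String) {
  private val tags = mutable.Map.empty[(Relation, TagKey[_]), Any]

  // Returns the cached value for (relation, key), computing it on first use.
  def getOrCompute[T](relation: Relation, key: TagKey[T])(compute: => T): T =
    tags.getOrElseUpdate((relation, key), compute).asInstanceOf[T]
}

object PerRuleTagDemo {
  val SignatureTag: TagKey[String] = TagKey[String]("signature")

  // Counts how often the (assumed expensive) signature is actually computed.
  var signatureComputations = 0

  def signature(relation: Relation): String = {
    signatureComputations += 1
    relation.files.sorted.mkString("|").hashCode.toHexString
  }

  // A "rule" that needs the signature twice; the second lookup hits the tag.
  def applyRule(entry: CandidateIndexEntry, relation: Relation): String = {
    entry.getOrCompute(relation, SignatureTag)(signature(relation))
    entry.getOrCompute(relation, SignatureTag)(signature(relation))
  }
}
```

Within one rule the signature is computed once, but a second rule that receives a fresh `CandidateIndexEntry` computes it again, which is the recalculation problem described under the cons of 1).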

I tried 2) briefly, but maintaining a cache (of IndexOptimizerContext) for all rules is not simple, because each rule is called separately by Spark RuleExecutor. It is possible to use a wrapper rule that applies all the other rules inside it, but then the rule-specific statistics from Spark RuleExecutor won't be collected properly. (ref)
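
The wrapper-rule idea could look roughly like the sketch below. It uses plain Scala stand-ins rather than Spark's rule and plan classes, and every name in it (`Plan`, `OptimizerContext`, `ContextRule`, `SignatureCheckRule`, `WrapperRule`) is hypothetical. It assumes the shared context is created when the wrapper is invoked and handed to each inner rule.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a logical plan; `notes` records what each rule did.
final case class Plan(relation: String, notes: List[String] = Nil)

// Option 2: one context shared by all rules during a single wrapper invocation.
final class OptimizerContext {
  private val signatures = mutable.Map.empty[String, String]
  var signatureComputations = 0

  def signatureOf(relation: String): String =
    signatures.getOrElseUpdate(relation, {
      signatureComputations += 1
      relation.hashCode.toHexString
    })
}

// Hypothetical rule interface that receives the shared context explicitly.
trait ContextRule {
  def name: String
  def apply(plan: Plan, ctx: OptimizerContext): Plan
}

final class SignatureCheckRule(val name: String) extends ContextRule {
  def apply(plan: Plan, ctx: OptimizerContext): Plan =
    plan.copy(notes = plan.notes :+ s"$name:${ctx.signatureOf(plan.relation)}")
}

// The wrapper is the only rule an executor would see, so any per-rule
// statistics would be attributed to the wrapper, not to the inner rules.
final class WrapperRule(rules: Seq[ContextRule]) {
  // Kept only so the example can inspect the context afterwards.
  var lastContext: Option[OptimizerContext] = None

  def apply(plan: Plan): Plan = {
    val ctx = new OptimizerContext // created per invocation
    lastContext = Some(ctx)
    rules.foldLeft(plan)((current, rule) => rule.apply(current, ctx))
  }
}
```

With a filter rule and a join rule wrapped together, the signature of a relation is computed once and reused by the second rule. The cost is the one noted above: the executor only observes the wrapper.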

Other options:

  • TreeNodeTag
    • only available from Spark 3.0

Possible candidates for auxiliary data:

  • General) the signature value of each candidate logical relation.
  • HybridScan) whether the current index application requires Hybrid Scan.
  • HybridScan) the total size of the files common to the index source files and the current source files of a given relation.
    • can be used by the ranking algorithm
  • HybridScan) the current files / appended files / deleted files
  • etc.
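
As an illustration of the Hybrid Scan candidates above, the following sketch derives them from two file sets. `FileInfo` and `RelationAuxData` are hypothetical names, and the sketch assumes that a file is identified by its path and size and that Hybrid Scan is needed whenever any file was appended or deleted; the real conditions may be stricter.

```scala
// Hypothetical file descriptor; a real index log entry tracks more metadata.
final case class FileInfo(path: String, size: Long)

// Hypothetical container for per-(index, relation) auxiliary data.
final case class RelationAuxData(
    appended: Set[FileInfo],
    deleted: Set[FileInfo],
    commonBytes: Long) {
  // Assumption: any appended or deleted file means Hybrid Scan is required.
  def hybridScanRequired: Boolean = appended.nonEmpty || deleted.nonEmpty
}

object RelationAuxData {
  def from(indexSourceFiles: Set[FileInfo], currentFiles: Set[FileInfo]): RelationAuxData = {
    val common = indexSourceFiles.intersect(currentFiles)
    RelationAuxData(
      appended = currentFiles.diff(indexSourceFiles),
      deleted = indexSourceFiles.diff(currentFiles),
      // Convert to a Seq first so equal sizes are not collapsed by Set.map.
      commonBytes = common.toSeq.map(_.size).sum)
  }
}
```

Computing these values once per (index, relation) pair and caching them would let both the rules and a ranking step reuse them instead of diffing the file lists repeatedly.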

Acceptance criteria

Success criteria

Additional context

Labels

enhancement: New feature or request
