Partition filter aware index application #338

**Feature requested**

Currently, Hyperspace considers all source files in the given relation.

However, if we take the partition filter into account for partitioned source data, we can exclude unrelated file paths.
For example:
- All file paths = `["/path/col=1/a.parquet", "/path/col=2/b.parquet", "/path/col=1/b.parquet"]`
- Paths with partition filter (col = 1) = `["/path/col=1/a.parquet", "/path/col=1/b.parquet"]`
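The path filtering above can be sketched as follows. This is only an illustration in Python, not Hyperspace code: the helpers `partition_values` and `filter_paths` are hypothetical names, and the sketch assumes Hive-style `key=value` directory segments.

```python
import re
from typing import Dict, List, Set


def partition_values(path: str) -> Dict[str, str]:
    """Extract Hive-style `key=value` directory segments from a file path."""
    values = {}
    for segment in path.split("/")[:-1]:  # skip the file name itself
        match = re.fullmatch(r"([^=/]+)=([^=/]*)", segment)
        if match:
            values[match.group(1)] = match.group(2)
    return values


def filter_paths(paths: List[str], column: str, allowed: Set[str]) -> List[str]:
    """Keep only the paths whose partition value for `column` is in `allowed`."""
    return [p for p in paths if partition_values(p).get(column) in allowed]


all_paths = ["/path/col=1/a.parquet", "/path/col=2/b.parquet", "/path/col=1/b.parquet"]
filtered = filter_paths(all_paths, "col", {"1"})
print(filtered)  # ['/path/col=1/a.parquet', '/path/col=1/b.parquet']
```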

With the filtered paths, we could apply indexes that were only "partially" refreshed (#298).
For partially refreshed indexes, we could also apply hybrid scan more efficiently, since only the filtered paths need to be compared against the index.

For example, suppose a user runs a query whose partition filter selects `col=1` and `col=2`,
the index source files are `["/path/col=1/a.parquet", "/path/col=1/b.parquet"]`,
and many files have been appended under the source relation since the index was created (e.g. `/path/col=2/*`, `/path/col=3/*`, `/path/col=4/*` ..):

- without the partial refresh feature & without using the partition filter
  - fully refreshed index
  - do Hybrid Scan with many deleted files
- with the partial refresh feature & without using the partition filter
  - partially refreshed index
  - do Hybrid Scan with many appended files (the unrelated file paths are not needed by the query, but are still handled as appended files)
- with the partial refresh feature & using the partition filter
  - partially refreshed index
  - do Hybrid Scan with fewer differing files, or skip Hybrid Scan entirely if the source files selected by the query are the same as the source file list of the partially refreshed index
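To make the difference between the last two cases concrete, here is a rough Python sketch of how the set of files Hybrid Scan has to handle shrinks once the partition filter is applied. The function `hybrid_scan_diff` and the appended file names (`c.parquet`, `d.parquet`, `e.parquet`) are made up for illustration; the actual Hyperspace logic may differ.

```python
from typing import List, Set, Tuple


def hybrid_scan_diff(index_files: Set[str], query_files: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (appended, deleted) files relative to the index source files."""
    appended = sorted(query_files - index_files)
    deleted = sorted(index_files - query_files)
    return appended, deleted


index_source_files = {"/path/col=1/a.parquet", "/path/col=1/b.parquet"}

# Hypothetical files appended after the index was created.
current_files = index_source_files | {
    "/path/col=2/c.parquet",
    "/path/col=3/d.parquet",
    "/path/col=4/e.parquet",
}

# Without the partition filter: every new file counts as appended.
appended_all, deleted_all = hybrid_scan_diff(index_source_files, current_files)

# With the partition filter (col=1 and col=2): only the col=2 file remains.
query_files = {p for p in current_files if "/col=1/" in p or "/col=2/" in p}
appended_filtered, deleted_filtered = hybrid_scan_diff(index_source_files, query_files)

# Hybrid Scan is only needed when the filtered file set differs from the index.
needs_hybrid_scan = bool(appended_filtered or deleted_filtered)
```

If the query only selected `col=1`, both lists would be empty and the partially refreshed index could be applied without Hybrid Scan.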


**Acceptance criteria** 
tbd

**Success criteria**
tbd

**Additional context**
+ If the index is bucketed by the partition column, we could also skip the unnecessary buckets (= index data files) based on the partition filter.
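
The bucket pruning idea could look roughly like the sketch below. Note the assumptions: `NUM_BUCKETS`, `bucket_id`, and `buckets_to_read` are hypothetical, and the plain modulo is only a stand-in for the hash function Spark actually uses for bucketing, so the concrete bucket numbers are illustrative.

```python
from typing import Iterable, Set

NUM_BUCKETS = 4  # hypothetical number of index buckets


def bucket_id(partition_value: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Stand-in bucket assignment; a real index would use Spark's bucketing hash."""
    return partition_value % num_buckets


def buckets_to_read(filter_values: Iterable[int], num_buckets: int = NUM_BUCKETS) -> Set[int]:
    """Buckets that can contain rows matching the partition filter values."""
    return {bucket_id(v, num_buckets) for v in filter_values}


# Partition filter selecting col=1 and col=2.
needed = buckets_to_read([1, 2])
skipped = set(range(NUM_BUCKETS)) - needed
```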

