
[Vectorization] Batch sizing #344

Merged

danielcweeks merged 20 commits into apache:vectorized-read from prodeezy:vectorized-batch-sizing
Aug 1, 2019

prodeezy commented Aug 1, 2019

Addresses #312

  • Rebased with apache/master
  • Added batch sizing

/cc @samarthjain @danielcweeks @anjalinorwood

File size: 64 MB
Num files: 5
Num rows per file: 10M

Benchmark                                                              Mode  Cnt   Score   Error  Units
IcebergSourceFlatParquetDataReadBenchmark.readFileSourceNonVectorized    ss    5  59.789 ± 1.731   s/op
IcebergSourceFlatParquetDataReadBenchmark.readFileSourceVectorized       ss    5  12.643 ± 0.772   s/op
IcebergSourceFlatParquetDataReadBenchmark.readIcebergVectorized100k      ss    5  47.149 ± 0.671   s/op

File size: 175 MB
Num files: 2
Num rows per file: 20M

Benchmark                                                              Mode  Cnt   Score   Error  Units
IcebergSourceFlatParquetDataReadBenchmark.readFileSourceNonVectorized    ss    5  67.221 ± 0.885   s/op
IcebergSourceFlatParquetDataReadBenchmark.readFileSourceVectorized       ss    5  25.763 ± 1.112   s/op
IcebergSourceFlatParquetDataReadBenchmark.readIcebergVectorized100k      ss    5  63.633 ± 0.637   s/op
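To read the two tables in relative terms, the mean scores (seconds per operation, lower is better) can be divided against the `readFileSourceNonVectorized` baseline. This sketch only divides the reported means; it does not propagate the ± error, and the class name is hypothetical.

```java
import java.util.Locale;

// Speedup of a candidate benchmark over a baseline, from JMH mean scores in s/op.
public final class BenchmarkRatios {

  private BenchmarkRatios() {}

  /** Values above 1.0 mean the candidate took less time per operation than the baseline. */
  public static double speedup(double baselineSecondsPerOp, double candidateSecondsPerOp) {
    return baselineSecondsPerOp / candidateSecondsPerOp;
  }

  private static void print(String label, double ratio) {
    // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's locale.
    System.out.println(label + ": " + String.format(Locale.ROOT, "%.2f", ratio));
  }

  public static void main(String[] args) {
    // 64 MB x 5 files, 10M rows per file.
    print("fileSourceVectorized (64 MB)", speedup(59.789, 12.643));     // 4.73
    print("icebergVectorized100k (64 MB)", speedup(59.789, 47.149));    // 1.27
    // 175 MB x 2 files, 20M rows per file.
    print("fileSourceVectorized (175 MB)", speedup(67.221, 25.763));    // 2.61
    print("icebergVectorized100k (175 MB)", speedup(67.221, 63.633));   // 1.06
  }
}
```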

chenjunjiedada and others added 20 commits July 25, 2019 14:25
The Gradle install target produces invalid POM files that are missing
the dependencyManagement section and versions for some dependencies.
Instead, we directly tell JitPack to run the correct Gradle target.
This truncates by Unicode code point instead of by Java chars.
Fixes apache#328, fixes apache#329.

Index to codePointAt should be the offset calculated by code points
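The last two commit messages describe truncating a string by Unicode code point rather than by Java `char`, and passing `codePointAt` an index derived from a code-point offset. A minimal sketch of that idea using the JDK's `String.offsetByCodePoints`; the class and method names are hypothetical, and this is not the code from those commits.

```java
// Hypothetical helper illustrating code-point-aware truncation; not the
// implementation from the referenced commits.
public final class CodePointTruncation {

  private CodePointTruncation() {}

  /** Returns the first `length` code points of `s`, or `s` itself if it has no more than that. */
  public static String truncate(String s, int length) {
    if (length < 0) {
      throw new IllegalArgumentException("length must be non-negative");
    }
    if (s.codePointCount(0, s.length()) <= length) {
      return s;
    }
    // Convert a code-point count into a char index so that a surrogate pair
    // (one code point stored as two chars) is never split in half.
    int end = s.offsetByCodePoints(0, length);
    return s.substring(0, end);
  }

  public static void main(String[] args) {
    String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (stored as a surrogate pair), 'b'
    System.out.println(s.length());                      // 4 chars
    System.out.println(s.codePointCount(0, s.length())); // 3 code points
    // A naive char-based cut leaves a dangling high surrogate.
    System.out.println(Character.isHighSurrogate(s.substring(0, 2).charAt(1))); // true
    System.out.println(truncate(s, 2).equals("a\uD83D\uDE00"));                 // true
  }
}
```

The same conversion applies to `codePointAt`: to read the i-th code point, pass `s.offsetByCodePoints(0, i)` rather than `i`, since `codePointAt` takes a char index.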
prodeezy commented Aug 1, 2019

@danielcweeks I've piggybacked the master rebase onto this PR. If you feel that should be a separate PR, I can do that.

@danielcweeks danielcweeks merged commit 189ab45 into apache:vectorized-read Aug 1, 2019
RussellSpitzer pushed a commit to RussellSpitzer/iceberg that referenced this pull request Oct 29, 2021
…K-34720 (apache#344)

This spans rdar://80592514 and rdar://79757416

* Add a test that only passes once InsertStarAction and UpdateStarAction are represented explicitly, i.e. it checks that queries that do not reference any specific columns or any otherwise unresolved expressions generate full query plans.

* Add a test explicitly for a boolean literal in the predicate

* Add a test for resolving columns by name (instead of by ordinal position) in MERGE INTO queries

10 participants