From 3880ad7f051b7d4a38bf10dfb9a0da97559ce607 Mon Sep 17 00:00:00 2001
From: "f.zheng"
Date: Mon, 29 Oct 2018 17:33:54 -0700
Subject: [PATCH 1/2] ORC-426: Fix errors in ORC specification.

---
 site/specification/ORCv0.md | 2 +-
 site/specification/ORCv1.md | 4 ++--
 site/specification/ORCv2.md | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/site/specification/ORCv0.md b/site/specification/ORCv0.md
index 32ce14a151..b4fea4e81b 100644
--- a/site/specification/ORCv0.md
+++ b/site/specification/ORCv0.md
@@ -725,7 +725,7 @@ DIRECT | PRESENT | Yes | Boolean RLE
 ## Map Columns
 
 Maps are encoded as the PRESENT stream and a length stream with number
-of items in each list. They have a child column for the key and
+of items in each map. They have a child column for the key and
 another child column for the value.
 
 Encoding | Stream Kind | Optional | Contents
diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index fb90c8353c..a1bd0aeb90 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -899,7 +899,7 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
 ## Map Columns
 
 Maps are encoded as the PRESENT stream and a length stream with number
-of items in each list. They have a child column for the key and
+of items in each map. They have a child column for the key and
 another child column for the value.
 
 Encoding | Stream Kind | Optional | Contents
@@ -978,7 +978,7 @@ group (default to 10,000 rows) in a column. Only the row groups that
 satisfy min/max row index evaluation will be evaluated against the bloom
 filter index.
 
-Each BloomFilterEntry stores the number of hash functions ('k') used
+Each bloom filter entry stores the number of hash functions ('k') used
 and the bitset backing the bloom filter. The original encoding (pre
 ORC-101) of bloom filters used the bitset field encoded as a repeating
 sequence of longs in the bitset field with a little endian encoding
diff --git a/site/specification/ORCv2.md b/site/specification/ORCv2.md
index 76ee571f0e..030c9f66b9 100644
--- a/site/specification/ORCv2.md
+++ b/site/specification/ORCv2.md
@@ -916,7 +916,7 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
 ## Map Columns
 
 Maps are encoded as the PRESENT stream and a length stream with number
-of items in each list. They have a child column for the key and
+of items in each map. They have a child column for the key and
 another child column for the value.
 
 Encoding | Stream Kind | Optional | Contents
@@ -995,7 +995,7 @@ group (default to 10,000 rows) in a column. Only the row groups that
 satisfy min/max row index evaluation will be evaluated against the bloom
 filter index.
 
-Each BloomFilterEntry stores the number of hash functions ('k') used
+Each bloom filter entry stores the number of hash functions ('k') used
 and the bitset backing the bloom filter. The original encoding (pre
 ORC-101) of bloom filters used the bitset field encoded as a repeating
 sequence of longs in the bitset field with a little endian encoding

From 695c927a617948750bdfcd40c818cf4e17837e51 Mon Sep 17 00:00:00 2001
From: "f.zheng"
Date: Mon, 29 Oct 2018 17:40:37 -0700
Subject: [PATCH 2/2] ORC-426: Remove redundant sentence in ORC specification.

---
 site/specification/ORCv1.md | 2 --
 site/specification/ORCv2.md | 2 --
 2 files changed, 4 deletions(-)

diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index a1bd0aeb90..5dbd3d027f 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -581,8 +581,6 @@ the index values and the additional value bits.
   bit is set, the entire value is negated.
 * Data values (W * L bits padded to the byte) - A sequence of W bit positive
   values that are added to the base value.
-* Data values (W * L bits padded to the byte) - A sequence of W bit positive
-  values that are added to the base value.
 * Patch list (PLL * (PGW + PW) bytes) - A list of patches for values
   that didn't fit within W bits. Each entry in the list consists of a
   gap, which is the number of elements skipped from the previous
diff --git a/site/specification/ORCv2.md b/site/specification/ORCv2.md
index 030c9f66b9..d91139c0fe 100644
--- a/site/specification/ORCv2.md
+++ b/site/specification/ORCv2.md
@@ -601,8 +601,6 @@ the index values and the additional value bits.
   bit is set, the entire value is negated.
 * Data values (W * L bits padded to the byte) - A sequence of W bit positive
   values that are added to the base value.
-* Data values (W * L bits padded to the byte) - A sequence of W bit positive
-  values that are added to the base value.
 * Patch list (PLL * (PGW + PW) bytes) - A list of patches for values
   that didn't fit within W bits. Each entry in the list consists of a
   gap, which is the number of elements skipped from the previous