Closed
52 changes: 52 additions & 0 deletions protos/join_key.proto
@@ -0,0 +1,52 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The Lance Authors

syntax = "proto3";

package lance.table;

// Value of the join key representation (reserved for future use)

Review comment (Contributor): This message is unused; we should just remove it.

// Currently filters operate on hashed values for compactness.
message JoinKeyValue {
  oneof value {
    string string_value = 1;
    int64 int64_value = 2;
    uint64 uint64_value = 3;
    bytes binary_value = 4;
    CompositeKey composite = 5;
  }
}

message CompositeKey {
  repeated JoinKeyValue parts = 1;
}

// Exact set of join keys, represented by their 64-bit hashes.
message ExactSet {
  repeated uint64 key_hashes = 1;
}

// Bloom filter data for join key set membership tests.
message BloomFilterData {
  // Bitset backing the bloom filter.
  bytes bitmap = 1;
  // Number of hash functions used.
  uint32 num_hashes = 2;
  // Total number of bits in the bitmap.
  uint32 bitmap_bits = 3;
  // Reserved for future fields to avoid reuse.
  reserved 4, 5;

Review comment (Contributor): Why reserve all these fields? I don't think they are needed until we actually want to add them in the future.

  reserved "hash_seed", "hash_algo";
}

// Join key metadata attached to a Transaction for conflict detection.
message JoinKeyMetadata {

Review comment (Contributor): This name does not feel right to me. If we use this for an INSERT, there is not really a join key. What about just keyMetadata, or something like KeySet or RowKeySet?

  // Names of columns participating in the join key.
  repeated string columns = 1;

Review comment (Contributor): This should be the column field ID.

  oneof filter {
    ExactSet exact_set = 2;
    BloomFilterData bloom = 3;
  }
  // Reserved to allow schema evolution.
  reserved 4, 5;
}
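
For readers following the review, here is a minimal Rust sketch of how the two filter variants above could answer a "might these writes touch the same key?" question. It is an illustration under stated assumptions, not the PR's implementation. The in-memory `BloomFilterData` struct, the `exact_sets_conflict` helper, and the double-hashing scheme used to derive bit positions are all hypothetical. The PR itself exposes the `sbbf` module (a split block bloom filter), which lays out bits differently. The sketch also assumes `bitmap_bits` is greater than zero.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory mirror of the `BloomFilterData` message.
struct BloomFilterData {
    bitmap: Vec<u8>,
    num_hashes: u32,
    bitmap_bits: u32,
}

impl BloomFilterData {
    /// Create an empty filter; assumes `bitmap_bits > 0`.
    fn new(bitmap_bits: u32, num_hashes: u32) -> Self {
        let bytes = (bitmap_bits as usize + 7) / 8;
        Self { bitmap: vec![0u8; bytes], num_hashes, bitmap_bits }
    }

    /// Assumed scheme: derive `num_hashes` bit positions from one 64-bit
    /// key hash via double hashing (this is not the SBBF layout).
    fn bit_positions(&self, key_hash: u64) -> impl Iterator<Item = usize> + '_ {
        let h1 = key_hash;
        let h2 = key_hash.rotate_left(32) | 1; // force an odd step
        let bits = self.bitmap_bits as u64;
        (0..self.num_hashes as u64)
            .map(move |i| (h1.wrapping_add(i.wrapping_mul(h2)) % bits) as usize)
    }

    /// Set every bit position derived from the key hash.
    fn insert(&mut self, key_hash: u64) {
        let positions: Vec<usize> = self.bit_positions(key_hash).collect();
        for pos in positions {
            self.bitmap[pos / 8] |= 1u8 << (pos % 8);
        }
    }

    /// Returns false only if the key was definitely never inserted;
    /// true may be a false positive.
    fn might_contain(&self, key_hash: u64) -> bool {
        self.bit_positions(key_hash)
            .all(|pos| (self.bitmap[pos / 8] & (1u8 << (pos % 8))) != 0)
    }
}

/// Hypothetical helper: two `ExactSet`s conflict if they share any key hash.
fn exact_sets_conflict(a: &[u64], b: &[u64]) -> bool {
    let seen: HashSet<u64> = a.iter().copied().collect();
    b.iter().any(|h| seen.contains(h))
}
```

In this sketch a shared hash in two exact sets signals a conflict (up to collisions between the 64-bit hashes themselves). A bloom filter can only rule a key out, so a `true` from `might_contain` would have to be treated conservatively as a potential conflict.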
4 changes: 4 additions & 0 deletions protos/transaction.proto
@@ -5,6 +5,7 @@ syntax = "proto3";

import "file.proto";
import "table.proto";
import "join_key.proto";
import "google/protobuf/any.proto";

package lance.table;
@@ -33,6 +34,8 @@ message Transaction {
// __lance_commit_message is a reserved key
map<string, string> transaction_properties = 4;

// Join key metadata using typed protobuf message. This is the sole carrier.
optional JoinKeyMetadata join_key_metadata = 6;

Review comment (Contributor): I think this should not be on the Transaction itself. Shouldn't it be specific to the few write operations that need this detection, such as insert or merge_insert, which can add new data to the table?

// Add new rows to the dataset.
message Append {
// The new fragments to append.
@@ -299,6 +302,7 @@ message Transaction {
}

// Fields 200/202 (`blob_append` / `blob_overwrite`) previously represented blob dataset ops.
reserved 5;
reserved 200, 202;
reserved "blob_append", "blob_overwrite";
}
1 change: 1 addition & 0 deletions python/src/transaction.rs
@@ -551,6 +551,7 @@ impl FromPyObject<'_> for PyLance<Transaction> {
operation,
tag: None,
transaction_properties,
join_key_metadata: None,
}))
}
}
2 changes: 1 addition & 1 deletion rust/lance-index/src/scalar/bloomfilter.rs
@@ -18,7 +18,7 @@ use crate::scalar::{
use crate::{pb, Any};
use arrow_array::{Array, UInt64Array};
mod as_bytes;
-mod sbbf;
+pub mod sbbf;
use arrow_schema::{DataType, Field};
use serde::{Deserialize, Serialize};

1 change: 1 addition & 0 deletions rust/lance-table/build.rs
@@ -19,6 +19,7 @@ fn main() -> Result<()> {
"./protos/table.proto",
"./protos/transaction.proto",
"./protos/rowids.proto",
"./protos/join_key.proto",
],
&["./protos"],
)?;
1 change: 1 addition & 0 deletions rust/lance/src/dataset.rs
@@ -66,6 +66,7 @@ pub(crate) mod blob;
mod branch_location;
pub mod builder;
pub mod cleanup;
pub mod conflict_detection;
pub mod delta;
pub mod fragment;
mod hash_joiner;