Skip to content

Conversation

@eldenmoon
Copy link
Member

@eldenmoon eldenmoon commented Jun 10, 2025

What problem does this PR solve?

  1. Introduced variant sparse columns
  2. Implement Variant with schema template feature
  3. Implement multi inverted index feature

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Copy link
Contributor

Thearas commented Jun 10, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@eldenmoon eldenmoon marked this pull request as draft June 10, 2025 16:28
@eldenmoon eldenmoon force-pushed the variant-sparse-merge branch 2 times, most recently from b0f6c1e to ac71c3d Compare June 10, 2025 16:33
@eldenmoon
Copy link
Member Author

run buildall

@eldenmoon eldenmoon force-pushed the variant-sparse-merge branch from 84ceebf to 808f525 Compare June 11, 2025 11:07
@eldenmoon
Copy link
Member Author

run buildall

@hello-stephen
Copy link
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.32% (1119/1343)
Line Coverage 66.77% (19184/28732)
Region Coverage 66.45% (9504/14302)
Branch Coverage 56.35% (5148/9136)

@eldenmoon
Copy link
Member Author

run buildall

@doris-robot
Copy link

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.32% (1119/1343)
Line Coverage 66.81% (19196/28732)
Region Coverage 66.44% (9502/14302)
Branch Coverage 56.41% (5154/9136)

@eldenmoon
Copy link
Member Author

run buildall

@hello-stephen
Copy link
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.32% (1119/1343)
Line Coverage 66.80% (19192/28732)
Region Coverage 66.49% (9509/14302)
Branch Coverage 56.40% (5153/9136)

@eldenmoon
Copy link
Member Author

run buildall

@hello-stephen
Copy link
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.33% (1120/1344)
Line Coverage 66.85% (19325/28909)
Region Coverage 66.60% (9584/14390)
Branch Coverage 56.52% (5209/9216)

@eldenmoon eldenmoon changed the title Variant sparse merge [Feature](Variant) support variant sparse feature and schema template with multi indexes Jun 25, 2025
@eldenmoon eldenmoon marked this pull request as ready for review June 26, 2025 03:42
morrySnow pushed a commit that referenced this pull request Jun 26, 2025
@eldenmoon eldenmoon force-pushed the variant-sparse-merge branch 3 times, most recently from 852cdbc to c46cc63 Compare July 23, 2025 12:20
@eldenmoon eldenmoon force-pushed the variant-sparse-merge branch from 953d0c0 to 2a7fc31 Compare July 23, 2025 15:19
@eldenmoon
Copy link
Member Author

run buildall

@doris-robot
Copy link

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.42% (1302/1619)
Line Coverage 65.77% (21797/33140)
Region Coverage 67.04% (10946/16328)
Branch Coverage 56.59% (5759/10176)

Copy link
Contributor

@zzzxl1993 zzzxl1993 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

*/

const int BeExecVersionManager::max_be_exec_version = 8;
const int BeExecVersionManager::max_be_exec_version = 9;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

没必要有这个,直接做不兼容的改动就好了,如果有这个,我们说明有一些遗留的代码,是可以删除的

add_mapping<FieldType::OLAP_FIELD_TYPE_DECIMAL256>();
add_mapping<FieldType::OLAP_FIELD_TYPE_IPV4>();
add_mapping<FieldType::OLAP_FIELD_TYPE_IPV6>();
add_mapping<FieldType::OLAP_FIELD_TYPE_FLOAT>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么要改这个?

if (!st.ok()) {
return st;
}
auto index_metes = _rowset_meta->tablet_schema()->inverted_indexs(*column);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo error?

auto status = process_files(*index_meta, indices, index);
if (!status.ok()) {
return status;
auto status = process_files(*index_meta, indices, index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return if error

_rowset_meta->set_tablet_uid(_context.tablet_uid);
_rowset_meta->set_tablet_schema(_context.tablet_schema);
auto schema = _context.tablet_schema->need_record_variant_extended_schema()
? _context.tablet_schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些地方得有一些注释


bool SegmentFlusher::need_buffering() {
// buffering variants for schema change
return _context.write_type == DataWriteType::TYPE_SCHEMA_CHANGE &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variant 支持啥schema change啊?

//bool has_null = true;
//RETURN_IF_ERROR(index_file_reader->has_null(index_meta, &has_null));
//_inverted_index->set_has_null(has_null);
_index_readers[index_meta->index_id()] = index_reader;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我们当前是每一个column reader 里缓存这个column 相关的所有的index readers 吗?

opts, column, std::unique_ptr<Field>(FieldFactory::create(*column))));
return Status::OK();
}
*writer = std::unique_ptr<ColumnWriter>(new VariantColumnWriter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::make_unique

// Use ScalarColumnWriter to write it's only root data
std::unique_ptr<ColumnWriter> writer_local = std::unique_ptr<ColumnWriter>(
new ScalarColumnWriter(opts, std::move(field), file_writer));
*writer = std::move(writer_local);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::make_unique

// Use ScalarColumnWriter to write it's only root data
std::unique_ptr<ColumnWriter> writer_local = std::unique_ptr<ColumnWriter>(
new ScalarColumnWriter(opts, std::move(field), file_writer));
*writer = std::move(writer_local);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::make_unique

const std::vector<TabletColumnPtr>& sparse_columns() const;
size_t num_sparse_columns() const { return _num_sparse_columns; }

void set_precision_frac(int32_t precision, int32_t frac) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么会有set 方法?这个不应该是从元数据中读取出来是啥就是啥吗?

const std::vector<TabletColumnPtr>& sparse_columns() const;
size_t num_sparse_columns() const { return _num_sparse_columns; }

void set_precision_frac(int32_t precision, int32_t frac) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么会有set 方法?这个不应该是从元数据中读取出来是啥就是啥吗?

vectorized::PathInData path;
vectorized::ColumnPtr column;
vectorized::DataTypePtr type;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

所有的新文件,默认都要加上编译检查

return Status::OK();
}

void set_variant_max_subcolumns_count(int32_t variant_max_subcolumns_count) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么要提供set 方法

*/
#define writeCString(s, buf) (buf).write((s), strlen(s))

inline void writeJSONString(const char* begin, const char* end, BufferWritable& buf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个方法应该是bufferwriteable的成员函数

}

/** Writes a C-string without creating a temporary object. If the string is a literal, then `strlen` is executed at the compilation stage.
* Use when the string is a literal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个代码需要单独看看!!

return col_res;
} else {
static_assert(false);
// static_assert(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为啥注释了?

@@ -66,6 +66,7 @@ message PColumnMeta {
optional string function_name = 7;
optional int32 be_exec_version = 8;
optional segment_v2.ColumnPathInfo column_path = 9;
optional int32 variant_max_subcolumns_count = 10 [default = 0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default 0 是啥含义?

// this field is only used during flexible partial update load
optional bool is_on_update_current_timestamp = 25 [default = false];
optional bool is_on_update_current_timestamp = 28 [default = false];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

你这改一下顺序,是不是启动的时候存储就错了啊?

@eldenmoon eldenmoon requested a review from w41ter as a code owner July 29, 2025 04:03
@eldenmoon eldenmoon force-pushed the variant-sparse-merge branch from e53d779 to c268637 Compare July 29, 2025 06:38
@eldenmoon eldenmoon closed this Aug 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants