Skip to content

Conversation

@kangkaisen
Copy link
Contributor

For #1610 "avoid serialize and deserialize from Scannode to Aggnode".

For aggregation query, we directly pass agg object from scan node to Agg node and avoid unnecessary serialize and deserialize.

auto* hll = new HyperLogLog((const uint8_t*) src_slice->data);
dst_slice->data = reinterpret_cast<char*>(hll);

arena->track_memory(sizeof(HyperLogLog));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sizeof(HyperLogLog) does not reflect the real size of HLL, and even has great difference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you.
But this should be another issue, This PR don't change this point.
We need a better way to estimate the HLL and Bitmap memory usage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

std::swap(_has_in_flight_row, other->_has_in_flight_row);
std::swap(_num_rows, other->_num_rows);
std::swap(_capacity, other->_capacity);
_agg_object_pool.swap(other->_agg_object_pool);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This swap function seems not be used, you can delete this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

void RowBatch::transfer_resource_ownership(RowBatch* dest) {
dest->_auxiliary_mem_usage += _tuple_data_pool->total_allocated_bytes();
dest->_tuple_data_pool->acquire_data(_tuple_data_pool.get(), false);
dest->_agg_object_pool.swap(_agg_object_pool);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics is a transfer, which is not a swap logic.
If dest is empty this is right, if not this will be wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has fixed

find_with_default(params.per_node_scan_ranges, scan_node->id(), no_scan_ranges);
scan_node->set_scan_ranges(scan_ranges);
VLOG(1) << "scan_node_Id=" << scan_node->id() << " size=" << scan_ranges.size();
scan_node->set_is_agg_node_child(has_agg_node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not works for SCAN1 if the plan tree looks like

    JOIN
        |
     ----------
    |         |
SCAN1         AGG
              |
            SCAN2

There are other problems.
For example, if there are some HLL operation happen between agg and scan, it may cause problem.
Why not do like this?

  1. ScanNode alway return object representation
  2. All Functions or type handler can handle two representation.

If we do like that, we will not depend how plan tree is like.

row_cursor.allocate_memory_for_string_type(_tablet->tablet_schema());
while (_heap.size() > 0) {
init_row_with_others(&row_cursor, *(_heap.top().row_cursor), arena.get());
init_row_with_others(&row_cursor, *(_heap.top().row_cursor), arena.get(), agg_object_pool.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset object pool like arena?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

uint32_t row_checksum = 0;
while (true) {
OLAPStatus res = reader.next_row_with_aggregation(&row, arena.get(), &eof);
OLAPStatus res = reader.next_row_with_aggregation(&row, arena.get(), agg_object_pool.get(), &eof);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset object pool like arena

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

size_t _variable_len;

// for agg query, we don't need to finalize when scan agg object data
bool need_agg_finalize = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a good option to put this flag here? This struct only a row, it should not care how to assemble itself

_merged_rows += merged_count;
agg_finalize_row(_value_cids, row_cursor, arena);
// For agg query, we don't need finalize agg object and directly pass agg object to agg node
if (row_cursor->is_need_agg_finalize()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to put flag in ReaderParams

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

int64_t output_rows = 0;
while (true) {
Arena arena;
ObjectPool objectPool;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's bad. we will allocate a new arena and object pool for each row we read. This is very bad for performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix this issue in another PR.

@imay
Copy link
Contributor

imay commented Sep 25, 2019

@kangkaisen

I suddenly remembered that you need to modify HLL related functions in be/src/exprs/aggregate_functions.cpp. Otherwise, it will cause BE Crash during the upgrade process.

@kangkaisen
Copy link
Contributor Author

@kangkaisen

I suddenly remembered that you need to modify HLL related functions in be/src/exprs/aggregate_functions.cpp. Otherwise, it will cause BE Crash during the upgrade process.

Thanks for your reminder. update.

// zero size means the src input is a HyperLogLog object
if (other.len == 0) {
auto* hll = reinterpret_cast<doris::HyperLogLog*>(other.ptr);
uint8_t* ptr = ctx->allocate(doris::HLL_COLUMN_DEFAULT_LEN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there is a private member variable named ptr and len, better to rename this ptr to other_ptr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@imay imay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman morningman merged commit b246d93 into apache:master Sep 26, 2019
swjtu-zhanglei pushed a commit to swjtu-zhanglei/incubator-doris that referenced this pull request Jul 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants