
Conversation

@kangkaisen
Contributor

Currently, the cardinality, avgRowSize, and numNodes stats in OlapScanNode are missing, so broadcastCost and partitionCost are both wrong and Doris cannot automatically choose the best join strategy.

So we should make the statistical information in OlapScanNode more precise.
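
For context, here is a minimal sketch of how such scan stats could feed a broadcast-vs-partition decision. The stat names (cardinality, avgRowSize, numNodes) come from the comment above; the ScanStats class, chooseJoinStrategy method, and the two cost formulas are illustrative assumptions, not the actual Doris planner code.

```java
// Illustrative only: ScanStats and the cost formulas below are assumptions
// for this sketch, not the real OlapScanNode / planner implementation.
public class JoinStrategySketch {

    static final class ScanStats {
        final long cardinality;   // estimated row count
        final float avgRowSize;   // estimated bytes per row
        final int numNodes;       // nodes the scan is spread over
        ScanStats(long cardinality, float avgRowSize, int numNodes) {
            this.cardinality = cardinality;
            this.avgRowSize = avgRowSize;
            this.numNodes = numNodes;
        }
        double dataSize() {
            return cardinality * (double) avgRowSize;
        }
    }

    /**
     * Broadcast join ships the whole right side to every node holding the left
     * side; partition (shuffle) join re-hashes both sides once. If any stat is
     * unknown (e.g. cardinality is unset), the comparison is meaningless, which
     * is the problem this PR addresses.
     */
    static String chooseJoinStrategy(ScanStats left, ScanStats right) {
        double broadcastCost = right.dataSize() * left.numNodes;
        double partitionCost = left.dataSize() + right.dataSize();
        return broadcastCost < partitionCost ? "BROADCAST" : "PARTITIONED";
    }

    public static void main(String[] args) {
        ScanStats left = new ScanStats(100_000_000L, 64.0f, 3);
        ScanStats right = new ScanStats(50_000L, 32.0f, 3);
        System.out.println(chooseJoinStrategy(left, right)); // small right side -> BROADCAST
    }
}
```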

@kangkaisen
Contributor Author

Updated this commit.

Contributor

@imay imay left a comment

LGTM

@imay imay merged commit 1d8fc4b into apache:master Nov 7, 2018
hubgeter pushed a commit to hubgeter/doris that referenced this pull request Jan 15, 2025

## Proposed changes
Hadoop SnappyCodec source:

https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/SnappyCodec.cc
Example:
OriginData (the original data is divided into several large data blocks):
     large data block1 | large data block2 | large data block3 | ....
Each large data block is further divided into several small data blocks.
Suppose a large data block is divided into three small blocks:
large data block1: | small block1 | small block2 | small block3 |
CompressData: <A [B1 compress(small block1)] [B2 compress(small block2)] [B3 compress(small block3)]>

A : original (uncompressed) length of the current large data block.
sizeof(A) = 4 bytes.
A = length(small block1) + length(small block2) + length(small block3)
Bx : compressed length of small data block x.
sizeof(Bx) = 4 bytes.
Bx = length(compress(small blockx))
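
To make the framing concrete, here is a minimal sketch that walks one large-block frame as described above. It only parses the length fields and records where each compressed small block sits; the actual snappy decompression is left out. The 4-byte big-endian encoding of A and Bx, and the SnappyBlockFrameReader/Chunk names, are assumptions for this sketch, not taken from the Hadoop or Doris source.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of walking the <A [B1 data1] [B2 data2] ...> layout described above.
// Assumes 4-byte big-endian length fields; decompression of each small block
// is intentionally omitted.
public class SnappyBlockFrameReader {

    /** One compressed small block: where it starts in the frame and how long it is. */
    public static final class Chunk {
        final int offset;
        final int compressedLength;
        Chunk(int offset, int compressedLength) {
            this.offset = offset;
            this.compressedLength = compressedLength;
        }
    }

    /**
     * Parses one large-block frame: reads A (uncompressed length of the whole
     * large block), then repeatedly reads Bx followed by Bx bytes of compressed
     * data until the frame is exhausted.
     */
    public static List<Chunk> parseFrame(byte[] frame) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        int uncompressedTotal = in.readInt();          // A: sum of the small blocks' original lengths
        List<Chunk> chunks = new ArrayList<>();
        int offset = 4;                                // first Bx starts right after A
        while (in.available() > 0) {
            int bx = in.readInt();                     // Bx: compressed length of small block x
            offset += 4;
            chunks.add(new Chunk(offset, bx));
            long skipped = in.skip(bx);                // jump over compress(small block x)
            if (skipped != bx) {
                throw new IOException("truncated frame");
            }
            offset += bx;
        }
        // A (uncompressedTotal) can be used by the caller to size the output buffer.
        return chunks;
    }
}
```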

Co-authored-by: Socrates <suxiaogang223@icloud.com>