[Performance] Use SIMD instructions to speed up the page decode of PlainPage and DictPage #6088

## Motivation
When we run the SSB test on Doris, the CPU profile from perf shows:
![image](https://user-images.githubusercontent.com/10553413/123204245-429db680-d4ea-11eb-884f-3555891712c9.png)

A large share of CPU time is spent in the page decode of PlainPage and DictPage.

Looking at the details, we find that much of this cost comes from memory allocation, specifically from the calls to `BitUtil::RoundUpToPowerOf2`:
![image](https://user-images.githubusercontent.com/10553413/123204430-afb14c00-d4ea-11eb-9f9e-7bf26cec2c7f.png)

## Implementation

We can use SIMD to speed up the function `BitUtil::RoundUpToPowerOf2`.
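As a rough illustration of the idea, here is a minimal sketch that rounds a batch of sizes at once with SSE2. It assumes `BitUtil::RoundUpToPowerOf2(value, factor)` rounds `value` up to the nearest multiple of `factor`, with `factor` a power of two. The helper names `round_up_pow2_scalar` and `round_up_pow2_batch` are hypothetical and are not taken from the Doris code base; the actual change may be structured differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Scalar reference (assumed semantics): round `value` up to the nearest
// multiple of `factor`, where `factor` must be a power of two.
inline int32_t round_up_pow2_scalar(int32_t value, int32_t factor) {
    return (value + (factor - 1)) & ~(factor - 1);
}

// Hypothetical batch helper: rounds `n` sizes, four per 128-bit register.
// Falls back to the scalar loop when SSE2 is unavailable and for the tail.
inline void round_up_pow2_batch(const int32_t* in, int32_t* out, size_t n,
                                int32_t factor) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i add = _mm_set1_epi32(factor - 1);      // factor - 1 in each lane
    const __m128i mask = _mm_set1_epi32(~(factor - 1));  // clears the low bits
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        v = _mm_and_si128(_mm_add_epi32(v, add), mask);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), v);
    }
#endif
    for (; i < n; ++i) {
        out[i] = round_up_pow2_scalar(in[i], factor);
    }
}
```

Calling `round_up_pow2_batch(sizes.data(), rounded.data(), sizes.size(), 8)` on a `std::vector<int32_t>` of value lengths would compute all aligned lengths in one pass instead of one call per value. A plain loop over the scalar version may also be auto-vectorized by the compiler, so explicit intrinsics are only one possible way to get this effect.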

After using SSE to speed up the function, perf shows the following CPU cost:

![image](https://user-images.githubusercontent.com/10553413/123204655-21899580-d4eb-11eb-8295-01b378ec0d85.png)


| Page type | Not vectorized | Vectorized |
| ---- | ---- | ---- |
| DictPage | 23.42% | 14.82% |
| PlainPage | 23.38% | 11.93% |


## More Tests in SSB

![image](https://user-images.githubusercontent.com/10553413/123205063-d0c66c80-d4eb-11eb-91a1-1f9b4a4825dd.png)

Queries q4, q5, q6, q8, q9, and q11 improve by about 20%.
 

