Skip to content

Conversation

@suxiaogang223
Copy link
Contributor

Proposed changes

Reason: https://issues.apache.org/jira/browse/ARROW-5322
Java readers(parquet-mr) handles "dictionaryPageOffset = 0" to determine if dictionary page exists where as the C readers uses "has_dictionaryPageOffset" (_isset bit in thrift message) to determine the same resulting in incompatible behaviours.
Therefore, we should consider that dicttionary page exists when both __isset.dictionary_page_offset is true and dictionary_page_offset is greater than 0.

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@suxiaogang223
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 37.32% (9630/25805)
Line Coverage: 28.69% (79853/278283)
Region Coverage: 28.14% (41294/146757)
Branch Coverage: 24.75% (21036/84980)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6797a8f28b8ba8c9141f2672df3d9c0382ea825a_6797a8f28b8ba8c9141f2672df3d9c0382ea825a/report/index.html

@suxiaogang223
Copy link
Contributor Author

@CodiumAI-Agent /review

@suxiaogang223 suxiaogang223 force-pushed the dict_page_offset_zero branch from 6797a8f to 00bbb4e Compare October 6, 2024 17:04
@suxiaogang223
Copy link
Contributor Author

run buildall

@QodoAI-Agent
Copy link

QodoAI-Agent commented Oct 6, 2024

PR Reviewer Guide 🔍

(Review updated until commit 215f451)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Logic Change
The logic for determining the start offset has been modified to include a check for dictionary_page_offset > 0. Ensure this change aligns with expected behavior across all scenarios.

Logic Change
Similar to vparquet_column_chunk_reader.cpp, the logic for determining chunk_start has been updated. This needs thorough testing to confirm no regression in other use cases.

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 37.32% (9630/25805)
Line Coverage: 28.71% (79891/278284)
Region Coverage: 28.14% (41295/146763)
Branch Coverage: 24.76% (21043/84984)
Coverage Report: http://coverage.selectdb-in.cc/coverage/215f451c9a3cf5c1d2c3c900a5f19f9d000b83ff_215f451c9a3cf5c1d2c3c900a5f19f9d000b83ff/report/index.html

@QodoAI-Agent
Copy link

Persistent review updated to latest commit 215f451

@suxiaogang223
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 37.28% (9631/25833)
Line Coverage: 28.66% (79847/278579)
Region Coverage: 28.10% (41294/146932)
Branch Coverage: 24.73% (21040/85090)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a1b18b1995510b594024c13f1c55ec2805aeb499_a1b18b1995510b594024c13f1c55ec2805aeb499/report/index.html

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Oct 8, 2024
@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2024

PR approved by anyone and no changes requested.

@morningman morningman merged commit 0d2550f into apache:master Oct 10, 2024
@suxiaogang223 suxiaogang223 deleted the dict_page_offset_zero branch October 10, 2024 08:02
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Oct 10, 2024
…he#41506)

## Proposed changes
Reason: https://issues.apache.org/jira/browse/ARROW-5322
Java readers(parquet-mr) handles "dictionaryPageOffset = 0" to determine
if dictionary page exists where as the C readers uses
"has_dictionaryPageOffset" (_isset bit in thrift message) to determine
the same resulting in incompatible behaviours.
Therefore, we should consider that dicttionary page exists when both
`__isset.dictionary_page_offset` is true and `dictionary_page_offset` is
greater than 0.
cjj2010 pushed a commit to cjj2010/doris that referenced this pull request Oct 12, 2024
…he#41506)

## Proposed changes
Reason: https://issues.apache.org/jira/browse/ARROW-5322
Java readers(parquet-mr) handles "dictionaryPageOffset = 0" to determine
if dictionary page exists where as the C readers uses
"has_dictionaryPageOffset" (_isset bit in thrift message) to determine
the same resulting in incompatible behaviours.
Therefore, we should consider that dicttionary page exists when both
`__isset.dictionary_page_offset` is true and `dictionary_page_offset` is
greater than 0.
amorynan pushed a commit to amorynan/doris that referenced this pull request Oct 12, 2024
…he#41506)

## Proposed changes
Reason: https://issues.apache.org/jira/browse/ARROW-5322
Java readers(parquet-mr) handles "dictionaryPageOffset = 0" to determine
if dictionary page exists where as the C readers uses
"has_dictionaryPageOffset" (_isset bit in thrift message) to determine
the same resulting in incompatible behaviours.
Therefore, we should consider that dicttionary page exists when both
`__isset.dictionary_page_offset` is true and `dictionary_page_offset` is
greater than 0.
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Oct 16, 2024
…he#41506)

## Proposed changes
Reason: https://issues.apache.org/jira/browse/ARROW-5322
Java readers(parquet-mr) handles "dictionaryPageOffset = 0" to determine
if dictionary page exists where as the C readers uses
"has_dictionaryPageOffset" (_isset bit in thrift message) to determine
the same resulting in incompatible behaviours.
Therefore, we should consider that dicttionary page exists when both
`__isset.dictionary_page_offset` is true and `dictionary_page_offset` is
greater than 0.
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Oct 16, 2024
…he#41506)

## Proposed changes
Reason: https://issues.apache.org/jira/browse/ARROW-5322
Java readers(parquet-mr) handles "dictionaryPageOffset = 0" to determine
if dictionary page exists where as the C readers uses
"has_dictionaryPageOffset" (_isset bit in thrift message) to determine
the same resulting in incompatible behaviours.
Therefore, we should consider that dicttionary page exists when both
`__isset.dictionary_page_offset` is true and `dictionary_page_offset` is
greater than 0.
yiguolei pushed a commit that referenced this pull request Oct 17, 2024
…bug (#41931)

## Proposed changes
pick pr:
  #41683
  #41506
  #41338
  #39326

---------

Co-authored-by: morningman <morningman@163.com>
morningman added a commit that referenced this pull request Oct 17, 2024
) (#41923)

## Proposed changes
pick prs:
#41506
#41526
#41683
#41816

---------

Co-authored-by: morningman <morningman@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.7-merged dev/3.0.3-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants