@ndimiduk

Introduces the option for an HStore to fully read the file it just wrote after a flush or compaction.

To enable this feature, set hbase.hstore.validate.read_fully=true. This is an HStore configuration feature, so it can be enabled in hbase-site.xml, in the TableDescriptor, or in the ColumnFamilyDescriptor.
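
The three configuration levels named above can be pictured as layers consulted in order. The following is a minimal, self-contained sketch of that lookup, not HBase code: the class `ReadFullyConfigSketch`, the method `readFullyEnabled`, the default of `false`, and the rule that the most specific level (column family) wins are all assumptions made for illustration. Only the key name comes from the description.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical model of layered store configuration; not an HBase class. */
public class ReadFullyConfigSketch {
  // Key name taken from the PR description.
  static final String READ_FULLY_KEY = "hbase.hstore.validate.read_fully";
  // Assumed default: the feature is described as opt-in, so treat it as off unless set.
  static final boolean DEFAULT_READ_FULLY = false;

  /**
   * Resolves the flag from layers ordered least specific to most specific
   * (site, table, column family). A later layer overrides an earlier one.
   */
  static boolean readFullyEnabled(List<Map<String, String>> layers) {
    boolean enabled = DEFAULT_READ_FULLY;
    for (Map<String, String> layer : layers) {
      String value = layer.get(READ_FULLY_KEY);
      if (value != null) {
        enabled = Boolean.parseBoolean(value);
      }
    }
    return enabled;
  }

  public static void main(String[] args) {
    Map<String, String> site = Map.of(READ_FULLY_KEY, "false"); // stands in for hbase-site.xml
    Map<String, String> table = Map.of();                        // stands in for TableDescriptor values
    Map<String, String> family = Map.of(READ_FULLY_KEY, "true"); // stands in for ColumnFamilyDescriptor values
    System.out.println(readFullyEnabled(List.of(site, table, family)));
  }
}
```

Under these assumptions, a single column family can opt in to the stricter validation while the rest of the cluster keeps the default.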

@ndimiduk ndimiduk left a comment

Sketched out a basic implementation.

Still working out how to add a test for this. It seems I'll need a DataBlockEncoder that exists only in the unit-test context, which our enum-based codec registration does not support.

It's probably also interesting to add a metric here.

}

/**
* We are trying to remove / relax the region read lock for compaction. Let's see what are the
Member Author

This comment was helpful for understanding the compaction logic. I cleaned up the formatting to how it was before spotless, using proper javadoc syntax.

* <p>
* Compaction event should be idempotent, since there is no IO Fencing for the region directory in
* hdfs. A region server might still try to complete the compaction after it lost the region. That
* is why the following events are carefully ordered for a compaction: 1. Compaction writes new
Member Author

Same with this comment.

this.openStoreFileThreadPoolCreator = store.getHRegion()::getStoreFileOpenAndCloseThreadPool;
this.storeFileTracker = createStoreFileTracker(conf, store);
assert compactor != null && compactionPolicy != null && storeFileManager != null
&& storeFlusher != null && storeFileTracker != null;
Member Author

Static analysis says storeFileTracker is never null.

Contributor

never null?

Member Author

That was the nit that IntelliJ provided, yes. Reading down through createStoreFileTracker(), it never returns null. If it fails, it throws.

HStoreFile storeFile = null;
try {
storeFile = createStoreFileAndReader(path);
if (conf.getBoolean(READ_FULLY_ON_VALIDATE_KEY, DEFAULT_READ_FULLY_ON_VALIDATE)) {
Member Author

This is the core of the new feature. Anywhere that we previously invoked validateStoreFile(), we now conditionally include the more rigorous check. There are a couple of places where we do NOT call validateStoreFile() after writing a file; I'm not yet sure why.
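
As a rough illustration of the write-then-read-back idea, here is a self-contained sketch using only the Java standard library. It is an analogy, not the HBase implementation: the class `ReadBackValidationSketch`, the method `validate`, and the use of a CRC32 comparison are all hypothetical stand-ins. The cheap path only confirms that the file is present and non-empty; the rigorous path, gated by a flag, streams every byte back and fails if the content does not match what was written.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

/** Hypothetical write-then-verify helper; a stand-in for the idea, not HBase code. */
public class ReadBackValidationSketch {

  /** Computes a CRC32 over the bytes that were intended to be written. */
  static long checksum(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data);
    return crc.getValue();
  }

  /**
   * Cheap check: the file exists and is non-empty.
   * Rigorous check (readFully == true): additionally stream the whole file
   * and compare its checksum with the expected value.
   */
  static void validate(Path file, long expectedCrc, boolean readFully) throws IOException {
    if (!Files.isRegularFile(file) || Files.size(file) == 0) {
      throw new IOException("Missing or empty file: " + file);
    }
    if (!readFully) {
      return;
    }
    CRC32 crc = new CRC32();
    byte[] buffer = new byte[8192];
    try (InputStream in = Files.newInputStream(file)) {
      int n;
      while ((n = in.read(buffer)) != -1) {
        crc.update(buffer, 0, n);
      }
    }
    if (crc.getValue() != expectedCrc) {
      throw new IOException("Checksum mismatch after full read: " + file);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] payload = "row1:cf:q=value".getBytes(StandardCharsets.UTF_8);
    Path file = Files.createTempFile("store-file-sketch", ".bin");
    try {
      Files.write(file, payload);
      validate(file, checksum(payload), true);
      System.out.println("validated " + file.getFileName());
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

The trade-off is the one the feature flag exposes: the full read costs extra I/O proportional to the file size, in exchange for catching a damaged file before it is put into service.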

@ndimiduk ndimiduk marked this pull request as ready for review February 25, 2025 11:36

@petersomogyi petersomogyi left a comment

I like this safeguard!

@ndimiduk

I ran a naive test of the performance impact of this feature and shared some results over on the JIRA.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 27s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 30s | | master passed |
| +1 💚 | compile | 3m 12s | | master passed |
| +1 💚 | checkstyle | 0m 37s | | master passed |
| +1 💚 | spotbugs | 1m 38s | | master passed |
| +1 💚 | spotless | 0m 47s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 9s | | the patch passed |
| +1 💚 | compile | 3m 8s | | the patch passed |
| +1 💚 | javac | 3m 8s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 38s | | the patch passed |
| +1 💚 | spotbugs | 1m 41s | | the patch passed |
| +1 💚 | hadoopcheck | 12m 4s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 45s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 10s | | The patch does not generate ASF License warnings. |
| | | 39m 18s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6700/5/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6700 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 45b24878fd50 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 0cfa9b2 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 83 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6700/5/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 27s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 2s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 0s | | master passed |
| +1 💚 | compile | 0m 57s | | master passed |
| +1 💚 | javadoc | 0m 27s | | master passed |
| +1 💚 | shadedjars | 5m 54s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 2s | | the patch passed |
| +1 💚 | compile | 0m 56s | | the patch passed |
| +1 💚 | javac | 0m 56s | | the patch passed |
| +1 💚 | javadoc | 0m 26s | | the patch passed |
| +1 💚 | shadedjars | 5m 54s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 215m 3s | | hbase-server in the patch passed. |
| | | 240m 5s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6700/5/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6700 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 80708a72c939 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 0cfa9b2 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6700/5/testReport/ |
| Max. process+thread count | 5449 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6700/5/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@ndimiduk ndimiduk merged commit 47d2aa5 into apache:master Mar 17, 2025
1 check passed
@ndimiduk ndimiduk deleted the 29131-master branch March 17, 2025 13:33
ndimiduk added a commit to ndimiduk/hbase that referenced this pull request Mar 17, 2025
…iles (apache#6700)

Introduces the option for an HStore to fully read the file it just wrote after a flush or
compaction.

To enable this feature, set `hbase.hstore.validate.read_fully=true`. This is an HStore
configuration feature, so it can be enabled in hbase-site.xml, in the TableDescriptor, or in the
ColumnFamilyDescriptor.

Signed-off-by: Peter Somogyi <psomogyi@apache.org>
ndimiduk added a commit that referenced this pull request Mar 18, 2025
…iles (#6700)

ndimiduk added a commit to HubSpot/hbase that referenced this pull request Mar 18, 2025
…n validation of HFiles (apache#6700) (will be in 2.6.3)

charlesconnell pushed a commit to HubSpot/hbase that referenced this pull request Mar 18, 2025
…n validation of HFiles (apache#6700) (will be in 2.6.3)

charlesconnell pushed a commit to HubSpot/hbase that referenced this pull request Jul 1, 2025
…n validation of HFiles (apache#6700) (will be in 2.6.3)

mokai87 pushed a commit to mokai87/hbase that referenced this pull request Aug 7, 2025
…iles (apache#6700)

sanjeet006py pushed a commit to sanjeet006py/hbase that referenced this pull request Sep 26, 2025
…iles (apache#6700)
