Skip to content

Conversation

@SzyWilliam
Copy link
Member

@SzyWilliam SzyWilliam changed the title Implement ReadOnly based on leader lease RATIS-1894. Implement ReadOnly based on leader lease Sep 24, 2023
@SzyWilliam SzyWilliam requested a review from szetszwo September 25, 2023 02:23
Copy link
Contributor

@OneSizeFitsQuorum OneSizeFitsQuorum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A great job is coming to an end!

I have left some minor issue~ PTAL

* @return current readIndex.
*/
CompletableFuture<Long> getReadIndex(Long readAfterWriteConsistentIndex) {
CompletableFuture<Long> getReadIndex(RaftServerConfigKeys.Read.Option option, Long readAfterWriteConsistentIndex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that with lease read we need to change the name of this function (and maybe some other functions on the call stack). The important logic for both readIndex and leaseRead is three steps.

  1. Verify that the current node is the Leader (readIndex uses a logical clock to initiate heartbeat, lease read uses a physical clock with an error upper bound),
  2. Wait for the local applyIndex to reach the current commitIndex
  3. Query the state machine.

Maybe we could name the first step confirmLeader or something

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They both need to confirm leader and obtain readIndex first. ReadOnly(ReadIndex based) uses majority heartbeats to obtain the readIndex while ReadOnly(Lease based) uses leader lease. I think readIndex is a shared step

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! This really depends on how to understand the meaning of the word readIndex. If we consider that the follower also needs to get commitIndex from the leader when using lease read, it is reasonable for me to support current naming logic, so please ignore this review~

LINEARIZABLE,

/** Use Leader lease. Efficient than ReadIndex if clock drift is bounded */
LEASE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that both DEFAULT and LINEARIZABLE are talking about consistency issues. Lease and ReadIndex are implementation techniques for implementing LINEARIZABLE, so it seems inappropriate to mention lease alone alongside the other two options?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. DEFAULT / LINEARIZABLE / LEASE sound vague to the terminal users. We should consider renaming these options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to use LINEARIZABLE not LEASE?

  • If yes, we could have a separated configuration to enable/disable LeaderLease.
  • If not, we could just have LINEARIZABLE for ReadIndex + LeaderLease.

+1 Patch looks good other that this. Thanks @SzyWilliam for working on this!

LeaderLease(RaftProperties properties) {
this.enabled = RaftServerConfigKeys.Read.leaderLeaseEnabled(properties);
final double leaseRatio = RaftServerConfigKeys.Read.leaderLeaseTimeoutRatio(properties);
Preconditions.assertTrue(leaseRatio > 0.0 && leaseRatio <= 1.0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to check this when lease read is disabled?


boolean isEnabled() {
return enabled;
boolean getAndSetEnabled(boolean enabled) {
Copy link
Contributor

@OneSizeFitsQuorum OneSizeFitsQuorum Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you set this field to true?

It looks like now it won't be true as long as it's set to false once

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be set back to original value if we failed transferring leadership. In other situations (stepDown / successful transferring), the LeaderStateImpl will be stopped and the enabled will stay in false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the new leader will create a new lease according to the configuration parameters~

@SzyWilliam
Copy link
Member Author

Is there a good reason to use LINEARIZABLE not LEASE?

I believe we have.

LINEARIZABLE should meet most requirements in typical scenarios. It allows followers to handle read requests while maintaining strict linearizability. This will be the default and recommended option to use in IoTDB.

LEASE is designed for situations where low latency is crucial, even at the potential cost of linearizability violations. The use of LEASE is mandated by Alluxio @codings-dan. His experiments have demonstrated that in a 3-node cluster, using LINEARIZABLE alone results in only a 160% improvement in throughput due to latency, whereas theoretically, it could achieve a 300% improvement.

Nevertheless, IoTDB may consider employing this option if the potential benefits outweigh the drawbacks.

Also, I added revoke mechanisms during leader-transfer / stepDown. PTAL, thanks!
@szetszwo @OneSizeFitsQuorum

}

// if lease is enabled, check lease first
if (hasLease()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that even though lease read is not enabled, we are still calling hasLease frequently and attempting to extend lease many times. Maybe we could use some optimization here, since it's a critical step in the query.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! Fixed, PTAL.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

Copy link
Contributor

@OneSizeFitsQuorum OneSizeFitsQuorum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM~
Thanks a lot for @SzyWilliam! Lease read is expected to be available to more users in ratis 3.0!

@SzyWilliam SzyWilliam merged commit 0711c39 into apache:feature/leaderlease Sep 28, 2023
RexXiong pushed a commit to apache/celeborn that referenced this pull request May 30, 2024
### What changes were proposed in this pull request?

Bump Ratis version from 2.5.1 to 3.0.1. Address incompatible changes:

- RATIS-589. Eliminate buffer copying in SegmentedRaftLogOutputStream.(apache/ratis#964)
- RATIS-1677. Do not auto format RaftStorage in RECOVER.(apache/ratis#718)
- RATIS-1710. Refactor metrics api and implementation to separated modules. (apache/ratis#749)

### Why are the changes needed?

Bump Ratis version from 2.5.1 to 3.0.1. Ratis has released v3.0.0, v3.0.1, which release note refers to [3.0.0](https://ratis.apache.org/post/3.0.0.html), [3.0.1](https://ratis.apache.org/post/3.0.1.html). The 3.0.x version include new features like pluggable metrics and lease read, etc, some improvements and bugfixes including:

- 3.0.0: Change list of ratis 3.0.0 In total, there are roughly 100 commits diffing from 2.5.1 including:
   - Incompatible Changes
      - RaftStorage Auto-Format
      - RATIS-1677. Do not auto format RaftStorage in RECOVER. (apache/ratis#718)
      - RATIS-1694. Fix the compatibility issue of RATIS-1677. (apache/ratis#731)
      - RATIS-1871. Auto format RaftStorage when there is only one directory configured. (apache/ratis#903)
      - Pluggable Ratis-Metrics (RATIS-1688)
      - RATIS-1689. Remove the use of the thirdparty Gauge. (apache/ratis#728)
      - RATIS-1692. Remove the use of the thirdparty Counter. (apache/ratis#732)
      - RATIS-1693. Remove the use of the thirdparty Timer. (apache/ratis#734)
      - RATIS-1703. Move MetricsReporting and JvmMetrics to impl. (apache/ratis#741)
      - RATIS-1704. Fix SuppressWarnings(“VisibilityModifier”) in RatisMetrics. (apache/ratis#742)
      - RATIS-1710. Refactor metrics api and implementation to separated modules. (apache/ratis#749)
      - RATIS-1712. Add a dropwizard 3 implementation of ratis-metrics-api. (apache/ratis#751)
      - RATIS-1391. Update library dropwizard.metrics version to 4.x (apache/ratis#632)
      - RATIS-1601. Use the shaded dropwizard metrics and remove the dependency (apache/ratis#671)
      - Streaming Protocol Change
      - RATIS-1569. Move the asyncRpcApi.sendForward(..) call to the client side. (apache/ratis#635)
   - New Features
      - Leader Lease (RATIS-1864)
      - RATIS-1865. Add leader lease bound ratio configuration (apache/ratis#897)
      - RATIS-1866. Maintain leader lease after AppendEntries (apache/ratis#898)
      - RATIS-1894. Implement ReadOnly based on leader lease (apache/ratis#925)
      - RATIS-1882. Support read-after-write consistency (apache/ratis#913)
      - StateMachine API
      - RATIS-1874. Add notifyLeaderReady function in IStateMachine (apache/ratis#906)
      - RATIS-1897. Make TransactionContext available in DataApi.write(..). (apache/ratis#930)
      - New Configuration Properties
      - RATIS-1862. Add the parameter whether to take Snapshot when stopping to adapt to different services (apache/ratis#896)
      - RATIS-1930. Add a conf for enable/disable majority-add. (apache/ratis#961)
      - RATIS-1918. Introduces parameters that separately control the shutdown of RaftServerProxy by JVMPauseMonitor. (apache/ratis#950)
      - RATIS-1636. Support re-config ratis properties (apache/ratis#800)
      - RATIS-1860. Add ratis-shell cmd to generate a new raft-meta.conf. (apache/ratis#901)
   - Improvements & Bug Fixes
      - Netty
         - RATIS-1898. Netty should use EpollEventLoopGroup by default (apache/ratis#931)
         - RATIS-1899. Use EpollEventLoopGroup for Netty Proxies (apache/ratis#932)
         - RATIS-1921. Shared worker group in WorkerGroupGetter should be closed. (apache/ratis#955)
         - RATIS-1923. Netty: atomic operations require side-effect-free functions. (apache/ratis#956)
      - RaftServer
         - RATIS-1924. Increase the default of raft.server.log.segment.size.max. (apache/ratis#957)
         - RATIS-1892. Unify the lifetime of the RaftServerProxy thread pool (apache/ratis#923)
         - RATIS-1889. NoSuchMethodError: RaftServerMetricsImpl.addNumPendingRequestsGauge apache/ratis#922 (apache/ratis#922)
         - RATIS-761. Handle writeStateMachineData failure in leader. (apache/ratis#927)
         - RATIS-1902. The snapshot index is set incorrectly in InstallSnapshotReplyProto. (apache/ratis#933)
         - RATIS-1912. Fix infinity election when perform membership change. (apache/ratis#954)
         - RATIS-1858. Follower keeps logging first election timeout. (apache/ratis#894)

- 3.0.1:This is a bugfix release. See the [changes between 3.0.0 and 3.0.1](apache/ratis@ratis-3.0.0...ratis-3.0.1) releases.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Cluster manual test.

Closes #2480 from SteNicholas/CELEBORN-1400.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants