Skip to content

Conversation

@chunxiaozheng
Copy link
Contributor

What changes were proposed in this pull request?

By reading the code related to updating peers, I found that the currentConf in the ConfigurationManager class maintains the current peers information, and a TreeMap object maintains the historical peers information, the TreeMap object uses the log entry index as the key. Normally, if the index of the newly added Configuration is the largest, the currentConf will be set to the latest added configuration information. Therefore, as long as modify the raft-meta.conf, follow the above rules, set the appropriate index and the latest peering information, we can solve the problem raised in RATIS-1860.
Through analysis, reading the index in old raft-meta.conf and add 1 as the new index is a suitable choice. If we directly use the index in the old raft-meta.conf, an exception will be throw during Preconditions.assertTrue(found.equals(conf)).If we directly set a larger value, such as Long.MAX_Value, an exception will be throw when switch leader.

In this PR, a ratis-shell cmd has been added, which will read raft-meta.conf in the specified path, get the log entry index and construct a new index(old index + 1), then use the new index and the newly passed peers to construct a new raft-meta.conf file to support normal startup when moving log entries and checkpoints to other nodes.
In addition, it is also supported to directly pass an new index, which can be used directly when generating a new raft-meta.conf file. However, this method requires observing the old index. If the input index is inappropriate, it may also cause the new node to not start properly.

What is the link to the Apache JIRA

(Please create an issue in ASF JIRA before opening a pull request,
and you need to set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. RATIS-XXXX. Fix a typo in YYY.)

Please replace this section with the link to the Apache JIRA)

How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests)
(If this patch involves UI changes, please attach a screen-shot; otherwise, remove this)

@chunxiaozheng
Copy link
Contributor Author

@codings-dan Can you help me review this PR? Thanks!

Copy link
Contributor

@codings-dan codings-dan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this! Left some comment

List<RaftProtos.RaftPeerProto> raftPeerProtos = new ArrayList<>();
for (String peer : peersStr.split(",")) {
String[] peerInfos = peer.split(":");
String peerId = peerInfos[0] + "_" + peerInfos[1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final InetSocketAddress serverAddress = parseInetSocketAddress(address);
final String peerId = RaftUtils.getPeerId(serverAddress);

@Override
public int run(CommandLine cl) throws IOException {
String peersStr = cl.getOptionValue(PEER_OPTION_NAME);
String path = cl.getOptionValue(PATH_OPTION_NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

peersStr and path need check not null.

import java.util.ArrayList;
import java.util.List;

public class RaftMetaConfCommand extends AbstractRatisCommand {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about use GroupCommand as the parent command? BTW, please add some necessary notes.

Copy link
Contributor

@codings-dan codings-dan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! Could you add some doc about this command? see https://github.com/apache/ratis/blob/master/ratis-docs/src/site/markdown/cli.md

@chunxiaozheng
Copy link
Contributor Author

chunxiaozheng commented Aug 16, 2023

Thanks for the update! Could you add some doc about this command? see https://github.com/apache/ratis/blob/master/ratis-docs/src/site/markdown/cli.md

@codings-dan Thanks for your review, i have updated the cli.md.

Copy link
Contributor

@codings-dan codings-dan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change looks good, +1

@codings-dan
Copy link
Contributor

@szetszwo Could you help take a look at the pull request? Thanks in advance

@szetszwo
Copy link
Contributor

@codings-dan , sure, will review this.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chunxiaozheng , thanks a lot for working on this!

The new command here is very different from the other existing commands since it does not connect the servers. It just reads and writes the raft-meta file.

We have a raft log tool similar to this in the ratis-tools module. How about moving this to ratis-tools?

@chunxiaozheng
Copy link
Contributor Author

@szetszwo Thanks for your review! But I have a question, if we put this command in the ratis-tool module, how should we use it?

@szetszwo
Copy link
Contributor

@chunxiaozheng , do you mean how to run it? We may run it directly with the class (or using maven).

Another option is to add a new parent command, say local, and add the new raftMetaConf under it. Then, it may be easier to run it. Later on, we may include the log dump tool as a subcommand.

 [local [raftMetaConf] [dumpLog]

@chunxiaozheng
Copy link
Contributor Author

@chunxiaozheng , do you mean how to run it? We may run it directly with the class (or using maven).

Another option is to add a new parent command, say local, and add the new raftMetaConf under it. Then, it may be easier to run it. Later on, we may include the log dump tool as a subcommand.

 [local [raftMetaConf] [dumpLog]

@szetszwo Thanks! Adding a local command is a good idea, which makes it very easy to use this tool, and I have already followed this approach to modify PR.

* Command for generate a new raft-meta.conf file based on original raft-meta.conf and new peers,
* which is used to move a raft node to a new node.
*/
public class RaftMetaConfCommand extends AbstractRatisCommand {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chunxiaozheng , thanks a lot for the update. If we implement a local command, we should not extend AbstractRatisCommand since it will connect to remote servers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szetszwo Thanks for your review! I have updated this PR.
In addition, AbstractParentCommand should not extends AbstractRatisCommand too, since the commands that extends AbstractParentCommand means that it is a parent command, it does not need to connect to remote servers too.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Please see the comments inlined and also https://issues.apache.org/jira/secure/attachment/13062287/901_review.patch

Comment on lines 27 to 29
public abstract class AbstractCommand implements Command {

public static InetSocketAddress parseInetSocketAddress(String address) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also move the Context and PrintStream related code from AbstractRatisCommand to AbstractCommand.

public abstract class AbstractParentCommand implements Command {
private final Map<String, Command> subs;

public AbstractParentCommand(Context context, List<Function<Context, AbstractRatisCommand>> subCommandConstructors) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change the Function to Function<Context, Command>. Then, the new LocalCommand can also use it.

String peersStr = cl.getOptionValue(PEER_OPTION_NAME);
String path = cl.getOptionValue(PATH_OPTION_NAME);
if (peersStr == null || path == null || peersStr.isEmpty() || path.isEmpty()) {
System.out.println("peers or path can't be empty.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the PrintStream from Context instead of System.out.

import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.ratis.proto.RaftProtos;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's import the inner classes. Then, the code is easier to read.

import org.apache.ratis.proto.RaftProtos.LogEntryProto;
import org.apache.ratis.proto.RaftProtos.RaftConfigurationProto;
import org.apache.ratis.proto.RaftProtos.RaftPeerProto;
import org.apache.ratis.proto.RaftProtos.RaftPeerRole;

Comment on lines 74 to 75
try (InputStream in = new FileInputStream(new File(path, RAFT_META_CONF));
OutputStream out = new FileOutputStream(new File(path, NEW_RAFT_META_CONF))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the Files api instead.

@chunxiaozheng
Copy link
Contributor Author

@szetszwo I have updated the PR according to the comments,thanks for your review!

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 change looks good.

@codings-dan codings-dan merged commit 9095c89 into apache:master Aug 21, 2023
@szetszwo
Copy link
Contributor

@chunxiaozheng , thanks again for working on this! Could you post a comment on RATIS-1860 so that we can assign the JIRA to you?

@chunxiaozheng
Copy link
Contributor Author

@chunxiaozheng , thanks again for working on this! Could you post a comment on RATIS-1860 so that we can assign the JIRA to you?

@szetszwo of course, but the page display Public signup for this instance is disabled. I have submitted the request to create jira account and i am waiting for a review.

@szetszwo
Copy link
Contributor

@chunxiaozheng , I see. What username have you submitted?

@chunxiaozheng
Copy link
Contributor Author

@chunxiaozheng , I see. What username have you submitted?

@szetszwo I'm very sorry, I just saw the email, my registered username is chunxiaozheng, and I have seen your reply. Thank you very much!

@szetszwo
Copy link
Contributor

@chunxiaozheng , don't worry. Thank you!

symious pushed a commit to symious/ratis that referenced this pull request Mar 20, 2024
…ache#901)

* Add ratis-shell cmd to generate a new raft-meta.conf.

* Remove index option.

* Style fix.

* Use GroupCommand as the parent of RaftMetaConfCommand.

* Modified by review comments.

* Update cli.md

* Use LocalCommand as the parent of RaftMetaConfCommand.

* ReConstruct some abstract command.

* Modified by review comments.

* Use println instead of printf.

* Checkstyle error fix(delete some unuse import).

---------

Co-authored-by: idellzheng <idellzheng@tencent.com>
RexXiong pushed a commit to apache/celeborn that referenced this pull request May 30, 2024
### What changes were proposed in this pull request?

Bump Ratis version from 2.5.1 to 3.0.1. Address incompatible changes:

- RATIS-589. Eliminate buffer copying in SegmentedRaftLogOutputStream.(apache/ratis#964)
- RATIS-1677. Do not auto format RaftStorage in RECOVER.(apache/ratis#718)
- RATIS-1710. Refactor metrics api and implementation to separated modules. (apache/ratis#749)

### Why are the changes needed?

Bump Ratis version from 2.5.1 to 3.0.1. Ratis has released v3.0.0, v3.0.1, which release note refers to [3.0.0](https://ratis.apache.org/post/3.0.0.html), [3.0.1](https://ratis.apache.org/post/3.0.1.html). The 3.0.x version include new features like pluggable metrics and lease read, etc, some improvements and bugfixes including:

- 3.0.0: Change list of ratis 3.0.0 In total, there are roughly 100 commits diffing from 2.5.1 including:
   - Incompatible Changes
      - RaftStorage Auto-Format
      - RATIS-1677. Do not auto format RaftStorage in RECOVER. (apache/ratis#718)
      - RATIS-1694. Fix the compatibility issue of RATIS-1677. (apache/ratis#731)
      - RATIS-1871. Auto format RaftStorage when there is only one directory configured. (apache/ratis#903)
      - Pluggable Ratis-Metrics (RATIS-1688)
      - RATIS-1689. Remove the use of the thirdparty Gauge. (apache/ratis#728)
      - RATIS-1692. Remove the use of the thirdparty Counter. (apache/ratis#732)
      - RATIS-1693. Remove the use of the thirdparty Timer. (apache/ratis#734)
      - RATIS-1703. Move MetricsReporting and JvmMetrics to impl. (apache/ratis#741)
      - RATIS-1704. Fix SuppressWarnings(“VisibilityModifier”) in RatisMetrics. (apache/ratis#742)
      - RATIS-1710. Refactor metrics api and implementation to separated modules. (apache/ratis#749)
      - RATIS-1712. Add a dropwizard 3 implementation of ratis-metrics-api. (apache/ratis#751)
      - RATIS-1391. Update library dropwizard.metrics version to 4.x (apache/ratis#632)
      - RATIS-1601. Use the shaded dropwizard metrics and remove the dependency (apache/ratis#671)
      - Streaming Protocol Change
      - RATIS-1569. Move the asyncRpcApi.sendForward(..) call to the client side. (apache/ratis#635)
   - New Features
      - Leader Lease (RATIS-1864)
      - RATIS-1865. Add leader lease bound ratio configuration (apache/ratis#897)
      - RATIS-1866. Maintain leader lease after AppendEntries (apache/ratis#898)
      - RATIS-1894. Implement ReadOnly based on leader lease (apache/ratis#925)
      - RATIS-1882. Support read-after-write consistency (apache/ratis#913)
      - StateMachine API
      - RATIS-1874. Add notifyLeaderReady function in IStateMachine (apache/ratis#906)
      - RATIS-1897. Make TransactionContext available in DataApi.write(..). (apache/ratis#930)
      - New Configuration Properties
      - RATIS-1862. Add the parameter whether to take Snapshot when stopping to adapt to different services (apache/ratis#896)
      - RATIS-1930. Add a conf for enable/disable majority-add. (apache/ratis#961)
      - RATIS-1918. Introduces parameters that separately control the shutdown of RaftServerProxy by JVMPauseMonitor. (apache/ratis#950)
      - RATIS-1636. Support re-config ratis properties (apache/ratis#800)
      - RATIS-1860. Add ratis-shell cmd to generate a new raft-meta.conf. (apache/ratis#901)
   - Improvements & Bug Fixes
      - Netty
         - RATIS-1898. Netty should use EpollEventLoopGroup by default (apache/ratis#931)
         - RATIS-1899. Use EpollEventLoopGroup for Netty Proxies (apache/ratis#932)
         - RATIS-1921. Shared worker group in WorkerGroupGetter should be closed. (apache/ratis#955)
         - RATIS-1923. Netty: atomic operations require side-effect-free functions. (apache/ratis#956)
      - RaftServer
         - RATIS-1924. Increase the default of raft.server.log.segment.size.max. (apache/ratis#957)
         - RATIS-1892. Unify the lifetime of the RaftServerProxy thread pool (apache/ratis#923)
         - RATIS-1889. NoSuchMethodError: RaftServerMetricsImpl.addNumPendingRequestsGauge apache/ratis#922 (apache/ratis#922)
         - RATIS-761. Handle writeStateMachineData failure in leader. (apache/ratis#927)
         - RATIS-1902. The snapshot index is set incorrectly in InstallSnapshotReplyProto. (apache/ratis#933)
         - RATIS-1912. Fix infinity election when perform membership change. (apache/ratis#954)
         - RATIS-1858. Follower keeps logging first election timeout. (apache/ratis#894)

- 3.0.1:This is a bugfix release. See the [changes between 3.0.0 and 3.0.1](apache/ratis@ratis-3.0.0...ratis-3.0.1) releases.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Cluster manual test.

Closes #2480 from SteNicholas/CELEBORN-1400.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
FMX pushed a commit to apache/celeborn that referenced this pull request Jun 19, 2024
### What changes were proposed in this pull request?

Add `local` command in `celeborn_ratis_shell.md` to sync document [cli.md](https://github.com/apache/ratis/blob/ratis-3.0.1/ratis-docs/src/site/markdown/cli.md).

### Why are the changes needed?

Celeborn has already bumped Ratis version from 2.5.1 to 3.0.1. Ratis v3.0.1 supports `local` command to process local operation, which no need to connect to ratis server. `celeborn_ratis_shell.md` should add local command to guide users to process local operation.

Backport: apache/ratis#901

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2575 from SteNicholas/CELEBORN-1466.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
FMX pushed a commit to apache/celeborn that referenced this pull request Jun 19, 2024
### What changes were proposed in this pull request?

Add `local` command in `celeborn_ratis_shell.md` to sync document [cli.md](https://github.com/apache/ratis/blob/ratis-3.0.1/ratis-docs/src/site/markdown/cli.md).

### Why are the changes needed?

Celeborn has already bumped Ratis version from 2.5.1 to 3.0.1. Ratis v3.0.1 supports `local` command to process local operation, which no need to connect to ratis server. `celeborn_ratis_shell.md` should add local command to guide users to process local operation.

Backport: apache/ratis#901

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.

Closes #2575 from SteNicholas/CELEBORN-1466.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
(cherry picked from commit e4d3b80)
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants