Skip to content

Conversation

@CanGuan
Copy link
Contributor

@CanGuan CanGuan commented Oct 23, 2023

Add timeout config for sync

Proposed changes

Issue Number: close #25764

Add sync_new_image_timeout config, for adjusting synchronize image timeout from master

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@CanGuan CanGuan changed the title [fix](meta) add sync new image timeout (#25764) [fix](meta) add sync new image timeout Oct 23, 2023
@morningman
Copy link
Contributor

Have you try it? can it really solve the timeout issue?

"adjust by the size of image file in the ${meta_dir}/image and the network environment between nodes. " +
"The default values is 3000 ms."
})
public static int sync_new_image_timeout = 30000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static int sync_new_image_timeout = 30000;
public static int sync_new_image_timeout_s = 300;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

String filename = Storage.IMAGE + "." + version;
File dir = new File(this.imageDir);
MetaHelper.getRemoteFile(url, HTTP_TIMEOUT_SECOND * 1000, MetaHelper.getFile(filename, dir));
MetaHelper.getRemoteFile(url, Config.sync_new_image_timeout, MetaHelper.getFile(filename, dir));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MetaHelper.getRemoteFile(url, Config.sync_new_image_timeout, MetaHelper.getFile(filename, dir));
MetaHelper.getRemoteFile(url, Config.sync_new_image_timeout_s * 1000, MetaHelper.getFile(filename, dir));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Add timeout config for sync
@CanGuan
Copy link
Contributor Author

CanGuan commented Oct 23, 2023

Have you try it? can it really solve the timeout issue?

We found this bug by adding new follower to our doris cluster. By tracing this bug, we found the sync image timeout is only 5s and no way to modify it. After fixing and deploying to our cluster, it fix the problem. Here are the success log

image
In our case (image file is 1G), Follower takes 22s to synchronize the file from FE master.

refer to issue #25764

guanzhaoyun added 2 commits October 23, 2023 19:34
+ "adjust by the size of image file in the ${meta_dir}/image and the network environment between nodes. "
+ "The default values is 300s."
})
public static int sync_new_image_timeout_s = 300;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sync_image_timeout_second is a better name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

morningman
morningman previously approved these changes Oct 24, 2023
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman
Copy link
Contributor

run buildall

@CanGuan
Copy link
Contributor Author

CanGuan commented Oct 24, 2023

LGTM

Please review again, chore code for checkstyle

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman
Copy link
Contributor

run buildall

@morningman morningman self-assigned this Oct 25, 2023
Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 25, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morningman morningman added the usercase Important user case type label label Oct 25, 2023
@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.56 seconds
stream load tsv: 572 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17162517629 Bytes

@morningman morningman merged commit 0fb57a6 into apache:master Oct 25, 2023
yiguolei pushed a commit that referenced this pull request Oct 27, 2023
The image file of our cluster reaches 2.3G. After the checkpoint, Followers synchronize the image timeout, resulting in the continuous increase of the bdb directory.

related pr: #25768
dutyu pushed a commit to dutyu/doris that referenced this pull request Oct 28, 2023
dutyu pushed a commit to dutyu/doris that referenced this pull request Oct 28, 2023
…#26003)

The image file of our cluster reaches 2.3G. After the checkpoint, Followers synchronize the image timeout, resulting in the continuous increase of the bdb directory.

related pr: apache#25768
xiaokang pushed a commit to xiaokang/doris that referenced this pull request Oct 31, 2023
xiaokang pushed a commit to xiaokang/doris that referenced this pull request Nov 1, 2023
gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…#26003)

The image file of our cluster reaches 2.3G. After the checkpoint, Followers synchronize the image timeout, resulting in the continuous increase of the bdb directory.

related pr: apache#25768
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/2.0.3-merged reviewed usercase Important user case type label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] sync image file timeout when adding new follower node

5 participants