Skip to content

Conversation

@morningman
Copy link
Contributor

When query result reach limit, the Coordinator in FE will send a cancel
request to BE to cancel the query. And when being cancelled, BE will report
query status to FE for debug purpose. But actually it is not necessary
and will generate too many logs.

So I add a CancelReason to distinguish the difference between 'normally'
cancellation and 'internal error' cancellation. if 'normally' cancelled,
no status will be reported to FE.

When query reach limit, or user cancel it actively, it is being cancelled 'normally'.
Otherwise, the query is cancelled due to internal error, which will need
a report from BE.

When query result reach limit, the Coordinator in FE will send a cancel
request to BE to cancel the query. And when being cancelled, BE will report
query status to FE for debug purpose. But actually it is not necessary
and will generate too many logs.

So I add a CancelReason to distinguish the difference between 'normally'
cancellation and 'internal error' cancellation. if 'normally' cancelled,
no status will be reported from BE.

When query reach limit, or user cancel it actively, it is being cancelled 'normally'.
Otherwise, the query is cancelled due to internal error, which will need
a report from BE.
message PCancelPlanFragmentRequest {
required PUniqueId finst_id = 1;
enum CancelReason {
LIMIT_REACH = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reserve 0, start from 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


message PCancelPlanFragmentRequest {
required PUniqueId finst_id = 1;
enum CancelReason {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to move CancelReason out of PCancelPlanFragmentRequest. Because fragment can be cancelled in other format. For example when Fragment is timed out, cancel thread will cancel this instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


// The reason of cancellation of this query.
// This is used for telling BE whether it need to report query status when being cancelled.
public enum CancelReason {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you have defined in protobuf, we should use it to avoid define one thing in different places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Status FragmentExecState::cancel(const PCancelPlanFragmentRequest::CancelReason& reason) {
std::lock_guard<std::mutex> l(_status_lock);
RETURN_IF_ERROR(_exec_status);
if (reason != PCancelPlanFragmentRequest::INTERNAL_ERROR) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use equal other than not equal here. Because when we add more types of error later, we should let it report in default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


public Future<PCancelPlanFragmentResult> cancelPlanFragmentAsync(
TNetworkAddress address, TUniqueId finstId) throws RpcException {
TNetworkAddress address, TUniqueId finstId, CancelReason cancelReason) throws RpcException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cancelReason is not used?

for (BackendExecState backendExecState : backendExecStates) {
TNetworkAddress address = backendExecState.getBackendAddress();
LOG.info("cancelRemoteFragments initiated={} done={} hasCanceled={} ip={} port={} fragment instance id={}",
LOG.info("cancelRemoteFragments initiated={} done={} hasCanceled={} ip={} port={} fragment instance id={}, reason: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this log can be debug level

required PStatus status = 1;
};

enum PCancelReason {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PFragmentCancelReason?

Copy link
Contributor

@imay imay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman morningman closed this Jul 19, 2019
@morningman morningman reopened this Jul 19, 2019
@morningman morningman merged commit 556299a into apache:master Jul 19, 2019
morningman added a commit to morningman/doris that referenced this pull request Jul 22, 2019
…pache#1489)

When query result reach limit, the Coordinator in FE will send a cancel
request to BE to cancel the query. And when being cancelled, BE will report
query status to FE for debug purpose. But actually it is not necessary
and will generate too many logs.

So I add a CancelReason to distinguish the difference between 'normally'
cancellation and 'internal error' cancellation. if 'normally' cancelled,
no status will be reported from BE.

When query reach limit, or user cancel it actively, it is being cancelled 'normally'.
Otherwise, the query is cancelled due to internal error, which will need
a report from BE.
@imay imay mentioned this pull request Sep 26, 2019
luwei16 added a commit to luwei16/Doris that referenced this pull request Apr 7, 2023
…ster (apache#1489)

When there are pure compute plan nodes, such as join nodes, in a plan, the original impl. may assign a BE to the node which does not belong to the current cloud cluster. It leads to incomplete isolation of clusters.
swjtu-zhanglei pushed a commit to swjtu-zhanglei/incubator-doris that referenced this pull request Jul 25, 2023
…ster (apache#1489)

When there are pure compute plan nodes, such as join nodes, in a plan, the original impl. may assign a BE to the node which does not belong to the current cloud cluster. It leads to incomplete isolation of clusters.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants