Skip to content

Migrate current integration batch tests to equivalent MSQ tests#13374

Merged
cryptoe merged 10 commits intoapache:masterfrom
abhagraw:master
Nov 21, 2022
Merged

Migrate current integration batch tests to equivalent MSQ tests#13374
cryptoe merged 10 commits intoapache:masterfrom
abhagraw:master

Conversation

@abhagraw
Copy link
Copy Markdown
Contributor

@abhagraw abhagraw commented Nov 16, 2022

Description

Migrate current integration batch tests to equivalent MSQ tests in the new IT Framework. A new MSQ IT test was added which will pick up any json files present in integration-tests-ex/cases/src/test/resources/multi-stage-query/batch-index.

Also increased the verbosity of the tests running in the new IT framework.

@kfaraz kfaraz added Area - Testing Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Nov 16, 2022
Copy link
Copy Markdown
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abhagraw Thanks for the PR. Left an initial review.

Comment thread distribution/pom.xml
<argument>-c</argument>
<argument>org.apache.druid.extensions:druid-multi-stage-query</argument>
<argument>-c</argument>
<argument>org.apache.druid.extensions:druid-catalog</argument>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to this PR? If not, then should this be raised as a separate PR?

Copy link
Copy Markdown
Contributor Author

@abhagraw abhagraw Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docker containers created for the test fail with the following error, if this extension is not added -

2022-11-15T06:35:58,107 INFO [main] org.hibernate.validator.internal.util.Version - HV000001: Hibernate Validator 5.2.5.Final
Exception in thread "main" org.apache.druid.java.util.common.ISE: Extension [/usr/local/apache-druid-2022.12.0-iap-SNAPSHOT/extensions/druid-catalog] specified in "druid.extensions.loadList" didn't exist!?
	at org.apache.druid.guice.ExtensionsLoader.getExtensionFilesToLoad(ExtensionsLoader.java:183)
	at org.apache.druid.guice.ExtensionsLoader$ServiceLoadingFromExtensions.addAllFromFileSystem(ExtensionsLoader.java:292)
	at org.apache.druid.guice.ExtensionsLoader$ServiceLoadingFromExtensions.<init>(ExtensionsLoader.java:280)
	at org.apache.druid.guice.ExtensionsLoader$ServiceLoadingFromExtensions.<init>(ExtensionsLoader.java:268)
	at org.apache.druid.guice.ExtensionsLoader.lambda$getFromExtensions$0(ExtensionsLoader.java:139)
	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
	at org.apache.druid.guice.ExtensionsLoader.getFromExtensions(ExtensionsLoader.java:137)
	at org.apache.druid.cli.Main.main(Main.java:98)```

Comment thread integration-tests-ex/cases/pom.xml
Copy link
Copy Markdown
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.
LGTM post they are addressed and CICD is 🟢

@cryptoe
Copy link
Copy Markdown
Contributor

cryptoe commented Nov 19, 2022

Another nit we should fix in this PR :

2022-11-18T19:26:19,074 INFO [main] org.apache.druid.testing.utils.TestQueryHelper - Sql Task submitted with task Id - query-77b14a7e-999d-4756-92eb-9fba59f68836

2022-11-18T19:26:19,105 WARN [main] org.apache.druid.java.util.common.RetryUtils - Retrying (1 of 99) in 895ms.

org.apache.druid.testing.utils.MsqTestQueryHelper$TaskStillRunningException: null

	at org.apache.druid.testing.utils.MsqTestQueryHelper.lambda$pollTaskIdForCompletion$0(MsqTestQueryHelper.java:163) ~[druid-integration-tests-25.0.0-SNAPSHOT.jar:25.0.0-SNAPSHOT]

	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:129) ~[druid-core-25.0.0-SNAPSHOT.jar:25.0.0-SNAPSHOT]

	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[druid-core-25.0.0-SNAPSHOT.jar:25.0.0-SNAPSHOT]

	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:163) ~[druid-core-25.0.0-SNAPSHOT.jar:25.0.0-SNAPSHOT]

	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:153) ~[druid-core-25.0.0-SNAPSHOT.jar:25.0.0-SNAPSHOT]
	
public TaskState pollTaskIdForCompletion(String taskId) throws Exception
  {
    return RetryUtils.retry(
        () -> {
          TaskStatusPlus taskStatusPlus = overlordClient.getTaskStatus(taskId);
          TaskState statusCode = taskStatusPlus.getStatusCode();
          if (statusCode != null && statusCode.isComplete()) {
            return taskStatusPlus.getStatusCode();
          }
          throw new TaskStillRunningException();
        },
        (Throwable t) -> t instanceof TaskStillRunningException,
        100
    );
  }

Let's change this method to not throw an exception but just log unable to get the status of the task. We can do this as part of another PR as well.

@abhagraw
Copy link
Copy Markdown
Contributor Author

Another nit we should fix in this PR :

2022-11-18T19:26:19,074 INFO [main] org.apache.druid.testing.utils.TestQueryHelper - Sql Task submitted with task Id - query-77b14a7e-999d-4756-92eb-9fba59f68836

2022-11-18T19:26:19,105 WARN [main] org.apache.druid.java.util.common.RetryUtils - Retrying (1 of 99) in 895ms.

org.apache.druid.testing.utils.MsqTestQueryHelper$TaskStillRunningException: null

	at org.apache.druid.testing.utils.MsqTestQueryHelper.lambda$pollTaskIdForCompletion$0(MsqTestQueryHelper.java:163) ~[druid-integration-tests-25.0.0-SNAPSHOT.jar:25.0.0-SNAPSHOT]

	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:129) ~[druid-core-25.0.0-SNAPSHOT.jar:25.0.0-SNAPSHOT]

	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[druid-core-25.0.0-SNAPSHOT.jar:25.0.0-SNAPSHOT]

	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:163) ~[druid-core-25.0.0-SNAPSHOT.jar:25.0.0-SNAPSHOT]

	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:153) ~[druid-core-25.0.0-SNAPSHOT.jar:25.0.0-SNAPSHOT]
	
public TaskState pollTaskIdForCompletion(String taskId) throws Exception
  {
    return RetryUtils.retry(
        () -> {
          TaskStatusPlus taskStatusPlus = overlordClient.getTaskStatus(taskId);
          TaskState statusCode = taskStatusPlus.getStatusCode();
          if (statusCode != null && statusCode.isComplete()) {
            return taskStatusPlus.getStatusCode();
          }
          throw new TaskStillRunningException();
        },
        (Throwable t) -> t instanceof TaskStillRunningException,
        100
    );
  }

Let's change this method to not throw an exception but just log unable to get the status of the task. We can do this as part of another PR as well.

Used quite retry - so that not all retries throw the exception + stacktrace.

Copy link
Copy Markdown
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Thanks for the contribution @abhagraw

@cryptoe cryptoe merged commit 5172d76 into apache:master Nov 21, 2022
@kfaraz kfaraz modified the milestones: 24.0.1, 25.0 Nov 21, 2022
cryptoe pushed a commit to cryptoe/druid that referenced this pull request Nov 28, 2022
…he#13374)

* Migrate current integration batch tests to equivalent MSQ tests using new IT framework

* Fix build issues

* Trigger Build

* Adding more tests and addressing comments

* fixBuildIssues

* fix dependency issues

* Parameterized the test and addressed comments

* Addressing comments

* fixing checkstyle errors

* Adressing comments
cryptoe added a commit that referenced this pull request Jan 11, 2023
* Initial commit.

* Fixing error message in retry exceeded exception

* Cleaning up some code

* Adding some test cases.

* Adding java docs.

* Finishing up state test cases.

* Adding some more java docs and fixing spot bugs, intellij inspections

* Fixing intellij inspections and added tests

* Documenting error codes

* Migrate current integration batch tests to equivalent MSQ tests (#13374)

* Migrate current integration batch tests to equivalent MSQ tests using new IT framework

* Fix build issues

* Trigger Build

* Adding more tests and addressing comments

* fixBuildIssues

* fix dependency issues

* Parameterized the test and addressed comments

* Addressing comments

* fixing checkstyle errors

* Adressing comments

* Adding ITTest which kills the worker abruptly

* Review comments phase one

* Adding doc changes

* Adjusting for single threaded execution.

* Adding Sequential Merge PR state handling

* Merge things

* Fixing checkstyle.

* Adding new context param for fault tolerance.
Adding stale task handling in sketchFetcher.
Adding UT's.

* Merge things

* Merge things

* Adding parameterized tests
Created separate module for faultToleranceTests

* Adding missed files

* Review comments and fixing tests.

* Documentation things.

* Fixing IT

* Controller impl fix.

* Fixing racy WorkerSketchFetcherTest.java exception handling.

Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com>
Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants