HDDS-8925. BaseFreonGenerator may not complete if last attempts fail #4975

Merged
adoroszlai merged 2 commits into apache:master from adoroszlai:HDDS-8925
Jun 27, 2023
@adoroszlai adoroszlai commented Jun 25, 2023

What changes were proposed in this pull request?

Fix an edge case in BaseFreonGenerator that occurs when tasks are still in progress at the time ProgressBar shutdown/termination is initiated and one or more of those in-progress tasks fail. In that case the generator may never complete.

Repro:

$ docker-compose exec -T s3g ozone freon s3kg -n 1 -t 10 -b no-such-bucket
...
2023-06-25 13:30:46,493 [main] INFO freon.BaseFreonGenerator: Executing test with prefix p0l2uvfgg7 and number-of-tests 1
2023-06-25 13:30:46,507 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 13:30:46,792 [pool-2-thread-1] ERROR freon.BaseFreonGenerator: Error on executing task 0
com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
	at com.amazonaws.auth.EnvironmentVariableCredentialsProvider.getCredentials(EnvironmentVariableCredentialsProvider.java:50)
	...
	at org.apache.hadoop.ozone.freon.S3KeyGenerator.createKey(S3KeyGenerator.java:111)
	at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:220)
	at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:200)
	at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$0(BaseFreonGenerator.java:174)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
2023-06-25 13:30:47,509 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 13:30:48,510 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 13:30:49,511 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 13:30:50,513 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 13:30:51,514 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 13:30:52,515 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 13:30:53,517 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 13:30:54,518 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 13:30:55,519 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 13:30:56,520 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
...

https://issues.apache.org/jira/browse/HDDS-8925
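
The hang in the repro (progress stuck at "0 out of 1" after the only task has failed) can be illustrated with a simplified model. This is only a sketch of one way such a hang can arise, not the actual Ozone code or the actual patch: the class `CompletionSketch`, the `Counters` holder, and both check methods are hypothetical names introduced for illustration. It assumes completion is detected by comparing a counter against the requested number of tests.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CompletionSketch {

    /** Hypothetical stand-in for a generator's counters; not the Ozone class. */
    static final class Counters {
        final AtomicLong success = new AtomicLong();
        final AtomicLong failure = new AtomicLong();
    }

    /** Runs `total` simulated tasks; every task with index >= failFrom fails. */
    static Counters runTasks(long total, long failFrom) {
        Counters c = new Counters();
        for (long i = 0; i < total; i++) {
            if (i >= failFrom) {
                c.failure.incrementAndGet();
            } else {
                c.success.incrementAndGet();
            }
        }
        return c;
    }

    /** Completion check that looks only at successes: never true once any task has failed. */
    static boolean doneBySuccessOnly(Counters c, long total) {
        return c.success.get() >= total;
    }

    /** Completion check that counts every finished attempt, failed or not. */
    static boolean doneByAttempts(Counters c, long total) {
        return c.success.get() + c.failure.get() >= total;
    }

    public static void main(String[] args) {
        // One task that fails, mirroring `-n 1` against a missing bucket.
        Counters c = runTasks(1, 0);
        System.out.println("success-only check complete: " + doneBySuccessOnly(c, 1));
        System.out.println("attempt-based check complete: " + doneByAttempts(c, 1));
    }
}
```

In this model a waiter polling the success-only check would loop forever once the last attempts fail, while the attempt-based check terminates and leaves the failure count to be reported separately.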

How was this patch tested?

Added integration test.

Also tested manually:

$ docker-compose exec -T s3g ozone freon s3kg -n 1 -t 10 -b no-such-bucket
...
2023-06-25 16:08:04,029 [main] INFO freon.BaseFreonGenerator: Executing test with prefix x423upnr6h and number-of-tests 1
2023-06-25 16:08:04,044 [Thread-3] INFO freon.ProgressBar: Progress: 0.00 % (0 out of 1)
2023-06-25 16:08:04,338 [pool-2-thread-1] ERROR freon.BaseFreonGenerator: Error on executing task 0
com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
	at com.amazonaws.auth.EnvironmentVariableCredentialsProvider.getCredentials(EnvironmentVariableCredentialsProvider.java:50)
	...
	at org.apache.hadoop.ozone.freon.S3KeyGenerator.lambda$createKey$0(S3KeyGenerator.java:145)
	at com.codahale.metrics.Timer.time(Timer.java:101)
	at org.apache.hadoop.ozone.freon.S3KeyGenerator.createKey(S3KeyGenerator.java:111)
	at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:220)
	at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:200)
	at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$0(BaseFreonGenerator.java:174)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
2023-06-25 16:08:05,045 [Thread-3] INFO freon.ProgressBar: Progress: 100.00 % (1 out of 1)
One ore more freon test is failed.
2023-06-25 16:08:05,062 [shutdown-hook-0] INFO metrics: type=TIMER, name=key-create, count=1, min=13.293068, max=13.293068, mean=13.293068, stddev=0.0, median=13.293068, p75=13.293068, p95=13.293068, p98=13.293068, p99=13.293068, p999=13.293068, mean_rate=1.3469914857611078, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second, duration_unit=milliseconds
2023-06-25 16:08:05,063 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Total execution time (sec): 1
2023-06-25 16:08:05,063 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Failures: 1
2023-06-25 16:08:05,063 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Successful executions: 0
2023-06-25 16:08:05,063 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Expected 1 --number-of-tests objects!, successfully executed 0

CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5370663046

@adoroszlai adoroszlai self-assigned this Jun 25, 2023
@fapifta fapifta left a comment

Great finding @adoroszlai, thank you for the fix. I played around with it a bit locally to better understand the mechanics, and with that I think the patch is good.

+1

@adoroszlai adoroszlai merged commit ecc7d5f into apache:master Jun 27, 2023
@adoroszlai adoroszlai deleted the HDDS-8925 branch June 27, 2023 05:07
@adoroszlai

Thanks @fapifta for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Jun 30, 2023
* master:
  HDDS-8555. [Snapshot] When snapshot feature is disabled, block OM startup if there are still snapshots in the system (apache#4994)
  HDDS-8782. Improve Volume Scanner Health checks. (apache#4867)
  HDDS-8447. Datanodes should not process container deletes for failed volumes. (apache#4901)
  HDDS-5869. Added support for stream on S3Gateway write path (apache#4970)
  HDDS-8859. [Snapshot] Return failure message to client for a failed snapshot diff jobs (apache#4993)
  HDDS-8939. [Snapshot] isBlockLocationSame check should be skipped if object is not OmKeyInfo. (apache#4991)
  HDDS-8923. Expose XceiverClient cache stats as metrics (apache#4979)
  HDDS-8913. ContainerManagerImpl: reduce processing while locked (apache#4967)
  HDDS-8935. [Snapshot] Fallback to full diff if getDetlaFiles from compaction DAG fails (apache#4986)
  HDDS-8911. Update Hadoop to 3.3.6 (apache#4985)
  HDDS-8931. Allow EC PipelineChoosingPolicy to be defined separately from Ratis (apache#4983)
  HDDS-8895. Support dynamic change of ozone.readonly.administrators in SCM (apache#4977)
  HDDS-6814. Make OM service ID optional for `ozone s3` commands if only one is defined in config (apache#4953)
  HDDS-8925. BaseFreonGenerator may not complete if last attempts fail (apache#4975)
  HDDS-7100. Container scanner incorrectly marks containers unhealthy when DN is shutdown (apache#4951)
  HDDS-8919. Allow EC pipelines to be created and then added to PipelineManager in two steps (apache#4968)
  HDDS-8901. Enable mTLS for InterSCMGrpcProtocol. (apache#4964)

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/Container.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/ContainerTestUtils.java