HDDS-10414. Some acceptance tests fail with Docker Compose V2 #6269

Merged
adoroszlai merged 8 commits into apache:master from ArafatKhan2198:HDDS-10414
Mar 12, 2024

@ArafatKhan2198
Contributor

What changes were proposed in this pull request?

The Recon acceptance test is failing for the following Robot test:

hadoop-ozone/dist/src/main/smoketest/recon/recon-api.robot

The discrepancy arises because DataNode host names use hyphens in local Docker setups (for example, datanode-1) but underscores in the upstream (CI) environment (for example, datanode_1). Assertions that expect one form therefore fail in the other environment.
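For illustration, here is a minimal Python sketch of a check that accepts both naming forms quoted above. The pattern and the helper name `is_datanode_name` are assumptions made for this example, not code taken from the patch.

```python
import re

# Accept either separator between "datanode" and the index.
# Both the pattern and the helper name are illustrative assumptions.
DATANODE_NAME = re.compile(r"datanode[-_]\d+")


def is_datanode_name(name: str) -> bool:
    """Return True if `name` is 'datanode' + '-' or '_' + one or more digits."""
    return DATANODE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("datanode-1", "datanode_1", "datanode1"):
        print(candidate, is_datanode_name(candidate))
```

The character class `[-_]` is what makes the check independent of the naming convention; everything else in the name is still matched strictly.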

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10414

How was this patch tested?

The acceptance test passed locally:

------------------------------------------------------------------------------
Check if Recon picks up DN heartbeats                                 | PASS |
------------------------------------------------------------------------------
Check if Recon Web UI is up                                           | PASS |
------------------------------------------------------------------------------
Check web UI access                                                   | PASS |
------------------------------------------------------------------------------
Check admin only api access                                           | PASS |
------------------------------------------------------------------------------
Check unhealthy, (admin) api access                                   | PASS |
------------------------------------------------------------------------------
Check normal api access                                               | PASS |
------------------------------------------------------------------------------

It also passed in the forked CI: https://github.com/ArafatKhan2198/ozone/actions/runs/8030499503

@ArafatKhan2198
Contributor Author

@dombizita @devmadhuu Could you please take a look?

@adoroszlai
Contributor

@ArafatKhan2198 Please include information about the Docker (and Docker Compose) version where the test is failing.

@ArafatKhan2198
Contributor Author

My local Docker and Docker Compose versions are:

(base) mohammad.khan:ozone/ (HDDS-10324*) $ docker --version                                    
Docker version 24.0.5, build ced0996
(base) mohammad.khan:ozone/ (HDDS-10324*) $ docker-compose --version                            
Docker Compose version v2.20.2-desktop.1

@adoroszlai
Contributor

@ArafatKhan2198 Please check whether hadoop-ozone/dist/src/main/smoketest/topology/cli.robot also fails with the same Docker version. It has similar host names in its assertions:

hadoop-ozone/dist/src/main/smoketest/topology/cli.robot
29:                        Should contain   ${output}         10.5.0.7(ozone-topology_datanode_4_1.ozone-topology_net)    IN_SERVICE    /rack2
33:                        Should contain   ${output}         10.5.0.7(ozone-topology_datanode_4_1.ozone-topology_net) IN_SERVICE
36:                        Should contain   ${output}         10.5.0.7(ozone-topology_datanode_4_1.ozone-topology_net)    IN_SERVICE    /rack2
39:                        Should contain   ${output}         10.5.0.7(ozone-topology_datanode_4_1.ozone-topology_net)    IN_SERVICE    /rack2
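To make the concern concrete, the Python sketch below contrasts a literal substring check with a separator-tolerant pattern. The first sample line is copied from the assertions above; the hyphenated variant is a hypothetical example of what a different Compose naming scheme might produce, and the regular expression is an assumption for illustration, not the change made in this PR.

```python
import re

# Expected host name exactly as hard-coded in the assertions above.
EXPECTED = "10.5.0.7(ozone-topology_datanode_4_1.ozone-topology_net)"

# Output line using the underscore form (taken from the assertions above).
V1_LINE = "10.5.0.7(ozone-topology_datanode_4_1.ozone-topology_net)    IN_SERVICE    /rack2"
# Hypothetical hyphenated form; the exact name is an assumption.
V2_LINE = "10.5.0.7(ozone-topology-datanode_4-1.ozone-topology_net)    IN_SERVICE    /rack2"

# Tolerant pattern: only the two generated separators may vary.
TOLERANT = re.compile(
    r"10\.5\.0\.7\(ozone-topology[-_]datanode_4[-_]1\.ozone-topology_net\)"
    r"\s+IN_SERVICE\s+/rack2"
)


def contains_literal(output: str) -> bool:
    """Mimic a plain substring assertion on the hard-coded host name."""
    return EXPECTED in output


def matches_either(output: str) -> bool:
    """Accept both separator styles while still checking IP, state and rack."""
    return TOLERANT.search(output) is not None


if __name__ == "__main__":
    for line in (V1_LINE, V2_LINE):
        print(contains_literal(line), matches_either(line))
```

Keeping the IP address, operational state, and rack in the pattern means the tolerant check still verifies the topology, not just the presence of a DataNode.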

@adoroszlai changed the title from "HDDS-10414. ozone acceptance: Recon.Recon-Api.Check if Recon picks up DN heartbeats: failed." to "HDDS-10414. Some acceptance tests fail with Docker Compose V2" Feb 29, 2024
@adoroszlai removed the recon label Feb 29, 2024
@ArafatKhan2198 marked this pull request as draft February 29, 2024 11:27
@ArafatKhan2198 marked this pull request as ready for review March 11, 2024 09:12
@ArafatKhan2198
Contributor Author

ArafatKhan2198 commented Mar 11, 2024

Hi @adoroszlai, could you please take a look?
The forked CI ran green: https://github.com/ArafatKhan2198/ozone/actions/runs/8229346145

hadoop-ozone/dist/src/main/smoketest/topology/cli.robot also passed locally:

(base) mohammad.khan:ozone/ (HDDS-10414*) $ ../test-single.sh om topology/cli.robot
==============================================================================
Cli :: Smoketest ozone cluster startup                                        
==============================================================================
Run printTopology                                                     | PASS |
------------------------------------------------------------------------------
Run printTopology -o                                                  | PASS |
------------------------------------------------------------------------------
Run printTopology --operational-state IN_SERVICE                      | PASS |
------------------------------------------------------------------------------
Run printTopology --node-state HEALTHY                                | PASS |
------------------------------------------------------------------------------
Cli :: Smoketest ozone cluster startup                                | PASS |
4 tests, 4 passed, 0 failed
==============================================================================

@adoroszlai marked this pull request as draft March 11, 2024 13:41
@adoroszlai marked this pull request as ready for review March 11, 2024 13:41
@adoroszlai left a comment
Contributor

Thanks @ArafatKhan2198 for the patch.

Comment on lines 28 to 30
Should Contain ${output} State = HEALTHY
Should Contain ${output} IN_SERVICE
Should Match Regexp ${output} .*datanode[-_]\\d+.*IN_SERVICE.*
Contributor

State = HEALTHY was not in the original expected content, why is it added?

Why is check for rack2 removed?

Contributor Author

Each test case begins by checking for the State = HEALTHY output, ensuring that the overall state of the system or the datanodes is as expected.
All the commands start with this line.

sh-4.2$ ozone admin printTopology
State = HEALTHY
 172.19.0.8(ozone-datanode-4.ozone_default):HTTP=9882,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,RATIS_DATASTREAM=9855,STANDALONE=9859    IN_SERVICE    /default-rack
 172.19.0.10(ozone-datanode-1.ozone_default):HTTP=9882,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,RATIS_DATASTREAM=9855,STANDALONE=9859    IN_SERVICE    /default-rack
 172.19.0.7(ozone-datanode-5.ozone_default):HTTP=9882,CLIENT_RPC=19864,REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,RATIS_DATASTREAM=9855,STANDALONE=9859    IN_SERVICE    /default-rack

I have also added the rack check back.
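As a rough illustration of how the regular expression under review behaves on this output, here is a Python sketch. It assumes that Robot Framework's `Should Match Regexp` searches the string with Python's `re` module, and that the doubled backslash in the `.robot` file (`\\d`) reaches the regex engine as `\d`. The sample text is copied from the output above; the helper name `matches` is hypothetical.

```python
import re

# In the .robot file the pattern is written with "\\d" because a single
# backslash is Robot Framework's own escape character. The regex engine is
# assumed to receive "\d", which is what this raw string contains.
PATTERN = r".*datanode[-_]\d+.*IN_SERVICE.*"

# First two lines of the printTopology output quoted above.
SAMPLE_OUTPUT = (
    "State = HEALTHY\n"
    " 172.19.0.8(ozone-datanode-4.ozone_default):HTTP=9882,CLIENT_RPC=19864,"
    "REPLICATION=9886,RATIS=9858,RATIS_ADMIN=9857,RATIS_SERVER=9856,"
    "RATIS_DATASTREAM=9855,STANDALONE=9859    IN_SERVICE    /default-rack\n"
)


def matches(output: str) -> bool:
    """Search anywhere in the output; '.' does not cross line breaks."""
    return re.search(PATTERN, output) is not None


if __name__ == "__main__":
    print(matches(SAMPLE_OUTPUT))
    print(matches(SAMPLE_OUTPUT.replace("datanode-", "datanode_")))
    print(matches("State = HEALTHY\n"))
```

Because `.` does not match a newline by default, the DataNode name and `IN_SERVICE` must appear on the same output line for the pattern to match.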

Contributor Author

I think it would be better to remove it, since we want this test to work regardless of the health of the DNs.

Comment on lines 40 to 42
Should Contain ${output} State = HEALTHY
Should Contain ${output} IN_SERVICE
Should Match Regexp ${output} .*ozone.*datanode[-_]\\d+.*IN_SERVICE.*
Contributor

Let's avoid duplicating assertions. Please extract a Keyword and use it in all test cases.

Contributor Author

Done
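The refactoring requested in this thread can be sketched in Python as a single shared assertion helper that every test calls, which is the same idea as extracting a Robot Framework user keyword. All function names and the pattern below are hypothetical; the keyword actually added in the patch is not shown in this conversation.

```python
import re

# One pattern, defined once; an illustrative assumption, not the patch's regex.
IN_SERVICE_DATANODE = re.compile(r"datanode[-_]\d+.*IN_SERVICE")


def assert_in_service_datanode(output: str) -> None:
    """Shared assertion, analogous to a user keyword reused by each test case."""
    if IN_SERVICE_DATANODE.search(output) is None:
        raise AssertionError(f"no IN_SERVICE datanode found in: {output!r}")


def check_print_topology(output: str) -> None:
    # Each test delegates to the shared helper instead of repeating assertions.
    assert_in_service_datanode(output)


def check_print_topology_by_state(output: str) -> None:
    assert_in_service_datanode(output)


if __name__ == "__main__":
    check_print_topology(
        "172.19.0.8(ozone-datanode-4.ozone_default)    IN_SERVICE    /default-rack"
    )
    check_print_topology_by_state(
        "10.5.0.7(ozone-topology_datanode_4_1.ozone-topology_net) IN_SERVICE"
    )
    print("all checks passed")
```

With the assertion in one place, a future change to the naming convention needs only one edit.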

@adoroszlai requested a review from devmadhuu March 12, 2024 10:43
@adoroszlai
Contributor

Thanks @ArafatKhan2198 for updating the patch. The latest patch looks good. Let's have someone with Docker Compose V2 also check it.

@Tejaskriya
Contributor

Tejaskriya commented Mar 12, 2024

Tested this patch locally in Docker, LGTM.
Docker Compose version on my system: v2.18.1

$ ../test-single.sh om topology/cli.robot                                    
==============================================================================
Cli :: Smoketest ozone cluster startup                                        
==============================================================================
Run printTopology                                                     | PASS |
------------------------------------------------------------------------------
Run printTopology -o                                                  | PASS |
------------------------------------------------------------------------------
Run printTopology --operational-state IN_SERVICE                      | PASS |
------------------------------------------------------------------------------
Run printTopology --node-state HEALTHY                                | PASS |
------------------------------------------------------------------------------
Cli :: Smoketest ozone cluster startup                                | PASS |
4 tests, 4 passed, 0 failed
==============================================================================


$ ../test-single.sh om recon/recon-api.robot  
==============================================================================
Recon-Api :: Smoke test to start cluster with docker-compose environments.    
==============================================================================
Check if Recon picks up OM data                                       | PASS |
------------------------------------------------------------------------------
Check if Recon picks up DN heartbeats                                 | PASS |
------------------------------------------------------------------------------
Check if Recon Web UI is up                                           | FAIL |
'<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 403 Forbidden</title>
</head>
<body><h2>HTTP ERROR 403 Forbidden</h2>
<table>
<tr><th>URI:</th><td>/</td></tr>
<tr><th>STATUS:</th><td>403</td></tr>
<tr><th>MESSAGE:</th><td>Forbidden</td></tr>
<tr><th>SERVLET:</th><td>org.eclipse.jetty.servlet.DefaultServlet-5cfed0ba</td></tr>
</table>

</body>
</html>' does not contain 'Ozone Recon'
------------------------------------------------------------------------------
Check web UI access                                                   | FAIL |
'403' does not contain '200'
------------------------------------------------------------------------------
Check admin only api access                                           | PASS |
------------------------------------------------------------------------------
Check unhealthy, (admin) api access                                   | PASS |
------------------------------------------------------------------------------
Check normal api access                                               | PASS |
------------------------------------------------------------------------------
Recon-Api :: Smoke test to start cluster with docker-compose envir... | FAIL |
7 tests, 5 passed, 2 failed
==============================================================================

(The Recon UI has some issues on my system, so I think the other test failures are related to that.)

@devmadhuu left a comment
Contributor

Thanks @ArafatKhan2198 for working on the fix. The changes LGTM, +1. Tested with the following Docker and Docker Compose versions:
docker-compose --version
Docker Compose version v2.18.1
docker --version
Docker version 24.0.2, build cb74dfc

../test-single.sh om topology/cli.robot 
==============================================================================
Cli :: Smoketest ozone cluster startup                                        
==============================================================================
Run printTopology                                                     | PASS |
------------------------------------------------------------------------------
Run printTopology -o                                                  | PASS |
------------------------------------------------------------------------------
Run printTopology --operational-state IN_SERVICE                      | PASS |
------------------------------------------------------------------------------
Run printTopology --node-state HEALTHY                                | PASS |
------------------------------------------------------------------------------
Cli :: Smoketest ozone cluster startup                                | PASS |
4 tests, 4 passed, 0 failed
==============================================================================

../test-single.sh om recon/recon-api.robot
==============================================================================
Recon-Api :: Smoke test to start cluster with docker-compose environments.    
==============================================================================
Check if Recon picks up OM data                                       | PASS |
------------------------------------------------------------------------------
Check if Recon picks up DN heartbeats                                 | PASS |
------------------------------------------------------------------------------
Check if Recon Web UI is up                                           | PASS |
------------------------------------------------------------------------------
Check web UI access                                                   | PASS |
------------------------------------------------------------------------------
Check admin only api access                                           | PASS |
------------------------------------------------------------------------------
Check unhealthy, (admin) api access                                   | PASS |
------------------------------------------------------------------------------
Check normal api access                                               | PASS |
------------------------------------------------------------------------------
Recon-Api :: Smoke test to start cluster with docker-compose envir... | PASS |
7 tests, 7 passed, 0 failed
==============================================================================

@adoroszlai merged commit 91af26a into apache:master Mar 12, 2024
@adoroszlai
Contributor

Thanks @ArafatKhan2198 for the patch, @devmadhuu, @Tejaskriya for the review.

adoroszlai pushed a commit to adoroszlai/ozone that referenced this pull request Jul 12, 2024