Skip to content

Conversation

@xBis7
Copy link
Contributor

@xBis7 xBis7 commented Dec 5, 2022

What changes were proposed in this pull request?

In this patch we are exposing performance metrics in the /jmx and /prom endpoints, for using gRPC between OM and S3G clients. The metrics are similar to the ones gathered by hadoop RPC. Hadoop RPC uses a proxy server that monitors the socket and gathers the metrics based on it's traffic. We can achieve similar results using gRPC interceptors to get all the messages sent and received by the server and monitor the number of active client connections using gRPC transport filter.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7517

How was this patch tested?

We can't have unit tests without adding a pom dependency to OzoneManager or S3Gateway. Furthermore, we can't have an integration test with Grpc enabled because Mini Ozone Cluster tries to configure Grpc server ports on multiple OMs with the same port causing binding errors.

In order to test this patch, a new acceptance test was added. This acceptance test gets executed only if grpc is enabled.

This patch can also be tested manually in a docker environment after setting the configuration for grpc. We can also enable the new percentiles flag ozone.grpc.metrics.percentiles.intervals. Add in docker-config under
/hadoop-ozone/dist/target/ozone-1.3.0-SNAPSHOT/compose/ozone

OZONE-SITE.XML_ozone.om.s3.grpc.server_enabled=true
OZONE-SITE.XML_ozone.om.transport.class=org.apache.hadoop.ozone.om.protocolPB.GrpcOmTransportFactory
docker-compose up --scale datanode=3 -d
docker exec -it ozone_s3g_1 bash
bash-4.2$ aws configure
AWS Access Key ID [****************test]: test
AWS Secret Access Key [****************pass]: pass
Default region name [None]: 
Default output format [None]: 
bash-4.2$ aws s3api --endpoint http://localhost:9878 create-bucket --bucket bucket1
{
    "Location": "http://localhost:9878/bucket1"
}
bash-4.2$ aws s3api --endpoint http://localhost:9878 put-object --bucket bucket1 --key dir1/key1 --body README.md

And then check /jmx and /prom endpoints for the om. For example in /jmx we have something like

{
    "name" : "Hadoop:service=OzoneManager,name=GrpcMetrics",
    "modelerType" : "GrpcMetrics",
    "tag.LatestRequestType" : "CommitKey",
    "tag.Hostname" : "27fbfa05d584",
    "SentBytes" : 1093,
    "ReceivedBytes" : 2173,
    "UnknownMessagesSent" : 0,
    "UnknownMessagesReceived" : 0,
    "GrpcQueueTimeNumOps" : 8,
    "GrpcQueueTimeAvgTime" : 969339.5,
    "GrpcProcessingTimeNumOps" : 8,
    "GrpcProcessingTimeAvgTime" : 2.98469375E7,
    "NumOpenClientConnections" : 1
},

@kerneltime
Copy link
Contributor

@GeorgeJahad @duongkame Please take a look at this PR

Copy link
Contributor

@duongkame duongkame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a big work to enable perf metrics for GRPC. Thanks @xBis7 for the contribution.
It looks good overall, yet I think we can make the metric component reusable, e.g. for SCM, and some concerns about efficiency.

@xBis7
Copy link
Contributor Author

xBis7 commented Dec 20, 2022

@duongkame Could you take another look at this?

@kerneltime
Copy link
Contributor

@duongkame can you take a look again?

Copy link
Contributor

@duongkame duongkame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just put a few minor comments.

@xBis7
Copy link
Contributor Author

xBis7 commented Jan 11, 2023

@duongkame I made the changes you suggested and some more improvements to make everything simpler. Could you take another look?

@neils-dev Could you take a look as well and see if we can close this if there are no more comments?

@prashantpogde prashantpogde self-requested a review January 19, 2023 17:56
@neils-dev neils-dev added the gr label Jan 26, 2023
@neils-dev
Copy link
Contributor

@kerneltime can you take a look at this PR as well. Thanks.

grpcMetrics.addGrpcQueueTime(queueTime);

// set metrics processing time
grpcMetrics.addGrpcProcessingTime(processingTime);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you attach a label for which request was received?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kerneltime I added the tag. That's how it looks, "tag.LatestRequestType" : "CommitKey".

{
    "name" : "Hadoop:service=OzoneManager,name=GrpcMetrics",
    "modelerType" : "GrpcMetrics",
    "tag.LatestRequestType" : "CommitKey",
    "tag.Hostname" : "27fbfa05d584",
    "SentBytes" : 1093,
    "ReceivedBytes" : 2173,
    "UnknownMessagesSent" : 0,
    "UnknownMessagesReceived" : 0,
    "GrpcQueueTimeNumOps" : 8,
    "GrpcQueueTimeAvgTime" : 969339.5,
    "GrpcProcessingTimeNumOps" : 8,
    "GrpcProcessingTimeAvgTime" : 2.98469375E7,
    "NumOpenClientConnections" : 1
  },

Let me know what you think.

Copy link
Contributor

@tanvipenumudy tanvipenumudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me @xBis7, thanks.

@xBis7
Copy link
Contributor Author

xBis7 commented Apr 14, 2023

@tanvipenumudy Thanks for reviewing this PR.

@xBis7
Copy link
Contributor Author

xBis7 commented Apr 21, 2023

@kerneltime Can you take another look on this PR and help get it merged?

@adoroszlai
Copy link
Contributor

@kerneltime @neils-dev can you please check if all your comments are addressed, so we can merge this?

@xBis7
Copy link
Contributor Author

xBis7 commented May 23, 2023

can you please check if all your comments are addressed, so we can merge this?

@adoroszlai All comments have been addressed.

@adoroszlai adoroszlai merged commit 45223ca into apache:master May 23, 2023
@adoroszlai
Copy link
Contributor

Thanks @xBis7 for continued efforts on this. Thanks @duongkame, @kerneltime, @neils-dev, @tanvipenumudy for the reviews.

@xBis7
Copy link
Contributor Author

xBis7 commented May 23, 2023

Thanks @adoroszlai for merging it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants