@xichen01 xichen01 commented Feb 6, 2023

What changes were proposed in this pull request?

  • Add a tool to Ozone freon for measuring Ozone OM performance
  • Support running Ozone freon for a specified duration
  • Support real-time display of QPS and max QPS for Ozone metadata tests
  • Support displaying the test arguments in the test report
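
The duration-based mode in the list above can be pictured as a loop that keeps issuing operations until a time budget is spent. The sketch below is illustrative only and is not the Freon implementation: the class `TimedRunner`, the method `runFor`, and the injectable clock are hypothetical names chosen for this example.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: repeat a task until a time budget is used up. */
public class TimedRunner {

  /**
   * Runs {@code task} repeatedly until {@code durationNanos} have elapsed
   * according to {@code clock}, and returns how many times it ran.
   * The clock is a parameter so the loop can be exercised without sleeping.
   */
  public static long runFor(long durationNanos, LongSupplier clock, Runnable task) {
    long start = clock.getAsLong();
    long executions = 0;
    // Compare elapsed time rather than absolute values, which stays correct
    // even if the nanosecond counter wraps around.
    while (clock.getAsLong() - start < durationNanos) {
      task.run();
      executions++;
    }
    return executions;
  }

  public static void main(String[] args) {
    // Run a no-op "operation" for roughly 100 ms of wall-clock time.
    long n = runFor(100_000_000L, System::nanoTime, () -> { });
    System.out.println("executions in 100 ms: " + n);
  }
}
```

In a multi-threaded run, each worker could execute such a loop against the same deadline; how the real tool coordinates its threads is not shown in this thread.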

GIF:
A MIXED operation that includes CREATE_FILE, LOOKUP_FILE and LIST_STATUS, with a test duration of 10s:

ozone freon ommg --operation MIXED --ops CREATE_FILE,LOOKUP_FILE,LIST_STATUS --opsnum 5,4,1 -t 10 -n 1000 --runtime 5 --timebase

[GIF: Jan-19-2023 19-21-23]

[root@Linux /root/ozone]% ozone freon ommg
A tool to measure the Ozone om performance
support Operation:
  CREATE_FILE, LOOKUP_FILE, READ_FILE, LIST_STATUS, CREATE_KEY, LOOKUP_KEY, HEAD_KEY, READ_KEY, LIST_KEYS, INFO_BUCKET, INFO_VOLUME, MIXED

Example:
# create 25000 keys, run time 180s
$ bin/ozone freon ommg --operation CREATE_KEY -n 25000 --runtime 180 --timebase

# read 25000 keys, run time 180s
$ bin/ozone freon ommg --operation READ_KEY -n 25000 --runtime 180 --timebase

# 20 threads, list 1000 keys each request, and run time 180s
$ bin/ozone freon ommg --operation LIST_KEYS -t 20 --batch-size 1000 --runtime 180 --timebase

# 10 threads, 1 threads list keys, 5 threads create file, 4 threads list lookup file and run time 180s
$ bin/ozone freon ommg --operation MIXED --ops CREATE_FILE,LOOKUP_FILE,LIST_STATUS --opsNums 5,4,1 -t 10 -n 1000  --runtime 180 --timebase

Note that: you must create a sufficient number of objects before executing read-related tests

[root@Linux /root/ozone]%
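
The MIXED example in the help text above gives each operation a fixed number of threads (5 for CREATE_FILE, 4 for LOOKUP_FILE, 1 for LIST_STATUS, 10 in total). A minimal sketch of that mapping follows. It is not the tool's code: the class `MixedOpsPlan` and the method `assign` are hypothetical, and the rule that the per-operation counts must add up to the thread count is an assumption drawn from this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: expand --ops / --opsnum into one operation per thread. */
public class MixedOpsPlan {

  /**
   * Returns a list whose i-th entry is the operation run by thread i.
   * Assumption (not confirmed by the source): counts must sum to threads.
   */
  public static List<String> assign(String ops, String opsnum, int threads) {
    String[] names = ops.split(",");
    String[] counts = opsnum.split(",");
    if (names.length != counts.length) {
      throw new IllegalArgumentException("ops and opsnum must have the same length");
    }
    List<String> plan = new ArrayList<>();
    for (int i = 0; i < names.length; i++) {
      int count = Integer.parseInt(counts[i].trim());
      for (int j = 0; j < count; j++) {
        plan.add(names[i].trim());
      }
    }
    if (plan.size() != threads) {
      throw new IllegalArgumentException(
          "opsnum adds up to " + plan.size() + " but threads is " + threads);
    }
    return plan;
  }

  public static void main(String[] args) {
    List<String> plan = assign("CREATE_FILE,LOOKUP_FILE,LIST_STATUS", "5,4,1", 10);
    for (int thread = 0; thread < plan.size(); thread++) {
      System.out.println("thread " + thread + " -> " + plan.get(thread));
    }
  }
}
```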

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7908

How was this patch tested?

@xichen01
Contributor Author

xichen01 commented Feb 7, 2023

The command output example

[root@centos root]$ ozone freon ommg --operation MIXED --ops CREATE_FILE,LOOKUP_FILE,LIST_STATUS --opsnum 5,4,1 -t 10 -n 1000 --runtime 5 --timebase --verbose

 100.00% |██████████████████████████████████|  5/5 Time: 0:00:05|   CREATE_FILE: rate 76 max 88 LIST_STATUS: rate 3 max 12 LOOKUP_FILE: rate 453 max 843
2/7/23 1:15:14 PM ==============================================================

-- Timers ----------------------------------------------------------------------
CREATE_FILE
             count = 265
         mean rate = 57.17 calls/second
     1-minute rate = 0.00 calls/second
     5-minute rate = 0.00 calls/second
    15-minute rate = 0.00 calls/second
               min = 1.16 milliseconds
               max = 710.05 milliseconds
              mean = 73.13 milliseconds
            stddev = 138.07 milliseconds
            median = 5.29 milliseconds
              75% <= 9.02 milliseconds
              95% <= 299.71 milliseconds
              98% <= 307.26 milliseconds
              99% <= 709.11 milliseconds
            99.9% <= 710.05 milliseconds
LIST_STATUS
             count = 22
         mean rate = 4.29 calls/second
     1-minute rate = 4.40 calls/second
     5-minute rate = 4.40 calls/second
    15-minute rate = 4.40 calls/second
               min = 270.81 milliseconds
               max = 487.88 milliseconds
              mean = 383.47 milliseconds
            stddev = 94.26 milliseconds
            median = 327.36 milliseconds
              75% <= 484.02 milliseconds
              95% <= 487.43 milliseconds
              98% <= 487.88 milliseconds
              99% <= 487.88 milliseconds
            99.9% <= 487.88 milliseconds
LOOKUP_FILE
             count = 2308
         mean rate = 497.17 calls/second
     1-minute rate = 0.00 calls/second
     5-minute rate = 0.00 calls/second
    15-minute rate = 0.00 calls/second
               min = 0.33 milliseconds
               max = 298.57 milliseconds
              mean = 8.25 milliseconds
            stddev = 44.11 milliseconds
            median = 0.66 milliseconds
              75% <= 0.85 milliseconds
              95% <= 1.70 milliseconds
              98% <= 252.02 milliseconds
              99% <= 266.08 milliseconds
            99.9% <= 298.50 milliseconds


Total execution time (sec): 6
Failures: 0
Successful executions: 2595

Option:
--number-of-tests=1000
--threads=10
--timebase=true
--runtime=5
--fail-at-end=false
--prefix=tfae0qxzwe
--verbose=true
--volume=vol1
--bucket=bucket1
--size=0
--buffer=4096
--batch-size=1000
--random=false
--operation=LOOKUP_FILE
--ops=CREATE_FILE,LOOKUP_FILE,LIST_STATUS
--opsnum=5,4,1
--ophelp=false
--om-service-id=null
--help=false
--version=false
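
In the report above, the per-operation counts add up to the reported total: 265 + 22 + 2308 = 2595 successful executions. The `rate` and `max` values on the progress line can be pictured as per-second deltas of a running counter, with the largest delta kept as the maximum. The sketch below shows one way to compute them. It is an assumption about the general technique, not the PR's code; `RateTracker` and its methods are hypothetical names, and the sample values are made up.

```java
/** Hypothetical sketch: derive per-second and peak rates from a running counter. */
public class RateTracker {
  private long lastCount;
  private long currentRate;
  private long maxRate;

  /**
   * Records the cumulative number of completed calls observed at the end
   * of a one-second interval and updates the current and maximum rates.
   */
  public void sample(long cumulativeCount) {
    currentRate = cumulativeCount - lastCount;
    lastCount = cumulativeCount;
    maxRate = Math.max(maxRate, currentRate);
  }

  public long getCurrentRate() {
    return currentRate;
  }

  public long getMaxRate() {
    return maxRate;
  }

  public static void main(String[] args) {
    RateTracker createFile = new RateTracker();
    // Made-up cumulative counts, one per second (not taken from the run above).
    long[] cumulative = {80, 168, 244};
    for (long count : cumulative) {
      createFile.sample(count);
    }
    System.out.println("CREATE_FILE: rate " + createFile.getCurrentRate()
        + " max " + createFile.getMaxRate());
  }
}
```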

@xichen01
Contributor Author

xichen01 commented Feb 7, 2023

This PR contains several features. Should I split it into multiple PRs to make the review easier?
For example:
This PR:

  • Add a tool to Ozone freon for measuring Ozone OM performance
  • Support real-time display of QPS and max QPS for Ozone metadata tests

Another PR:

  • Support running Ozone freon for a specified duration
  • Support displaying the test arguments in the test report

@jojochuang
Contributor

No, this is okay. I'll review. Thanks.
One suggestion though: some of the operations under test are not pure metadata operations, such as READ_FILE and READ_KEY. Some of the operations may duplicate existing freon tools.

In any case, I think this is nice to have too. Being able to generate random-like workloads is great.

cc: @DaveTeng0

@xichen01
Contributor Author

xichen01 commented Feb 7, 2023

> No, this is okay. I'll review. Thanks. One suggestion though: some of the operations under test are not pure metadata operations, such as READ_FILE and READ_KEY. Some of the operations may duplicate existing freon tools.
>
> In any case, I think this is nice to have too. Being able to generate random-like workloads is great.
>
> cc: @DaveTeng0

By default, these read and write operations are pure metadata operations.
By default, the data size for READ_FILE, READ_KEY, WRITE_FILE and WRITE_KEY is 0, so they do not read or write data on the DNs and only talk to the OM directly (the OM accesses SCM but does not allocate blocks). This can be understood as the maximum read/write QPS that the OM can achieve with an infinite number of DNs.

This freon subcommand also supports the MIXED operation, so we can test different read/write ratios, for example:
Read:Write 7:3
ozone freon ommg --operation MIXED --ops CREATE_FILE,READ_FILE --opsnum 3,7 -t 10 -n 1000 --runtime 180 --timebase
Read:Write 5:5
ozone freon ommg --operation MIXED --ops CREATE_FILE,READ_FILE --opsnum 5,5 -t 10 -n 1000 --runtime 180 --timebase
....
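
For other thread counts, the `--opsnum` value for a given read/write ratio can be worked out by scaling the ratio to the number of threads. The helper below is a sketch of that arithmetic only. `RatioSplit` and `opsnum` are hypothetical names, the write count comes first because CREATE_FILE is listed first in `--ops`, and the requirement that the values add up to `-t` is an assumption based on the examples in this thread.

```java
/** Hypothetical sketch: turn a write:read ratio into an --opsnum value. */
public class RatioSplit {

  /**
   * Scales writeParts:readParts to the given thread count and returns
   * "writeThreads,readThreads". Rejects thread counts that the ratio
   * does not divide evenly, to avoid silently rounding the mix.
   */
  public static String opsnum(int writeParts, int readParts, int threads) {
    int total = writeParts + readParts;
    if (writeParts <= 0 || readParts <= 0 || threads <= 0 || threads % total != 0) {
      throw new IllegalArgumentException(
          "threads must be a positive multiple of " + total);
    }
    int unit = threads / total;
    return (writeParts * unit) + "," + (readParts * unit);
  }

  public static void main(String[] args) {
    // Read:Write 7:3 on 20 threads, in the order CREATE_FILE,READ_FILE.
    System.out.println("--opsnum " + opsnum(3, 7, 20));
  }
}
```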

Contributor

@adoroszlai adoroszlai left a comment

Thanks @xichen01 for the patch. I'll need to take another look at OmMetadataGenerator, but here is my first round of review for the existing Freon parts.

Comment on lines +551 to +552
public long getThreadSequenceId() {
return threadSequenceId.get();
Contributor

Please add javadoc comment explaining purpose of this new sequence ID.

Comment on lines 573 to 574
* Get current Thread sequence ID.
* @return Current Thread sequence ID
Contributor

Thanks for adding the comment, but I still don't think it's clear right away what it is used for. The doc just repeats the method name.

Contributor Author

This has been updated.
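
To illustrate the kind of Javadoc the review asks for, here is a sketch of a per-thread sequence ID whose comment states a purpose rather than repeating the method name. The PR's actual implementation and the real purpose of its sequence ID are not shown in this thread. The class `ThreadSequence`, the `ThreadLocal` design, and the described use are all assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a per-thread sequence ID with a descriptive Javadoc. */
public class ThreadSequence {
  // Hands out 0, 1, 2, ... in the order threads first ask for an ID.
  private static final AtomicLong NEXT = new AtomicLong();

  // Each thread lazily receives its own ID on first access and keeps it.
  private static final ThreadLocal<Long> threadSequenceId =
      ThreadLocal.withInitial(NEXT::getAndIncrement);

  /**
   * Returns a small, dense number (0, 1, 2, ...) that identifies the calling
   * worker thread for the lifetime of that thread. One plausible use
   * (an assumption, not taken from the PR) is picking which operation a
   * thread runs in a mixed workload, for example by indexing into a
   * per-thread operation plan.
   *
   * @return the sequence ID assigned to the calling thread
   */
  public static long getThreadSequenceId() {
    return threadSequenceId.get();
  }

  public static void main(String[] args) {
    System.out.println("main thread sequence ID: " + getThreadSequenceId());
  }
}
```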

@adoroszlai adoroszlai dismissed their stale review February 28, 2023 18:00

comments addressed

@adoroszlai
Contributor

Got this for MIXED workload from command help:

IllegalArgumentException
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:131)
	at org.apache.hadoop.ozone.om.KeyManagerImpl.listStatus(KeyManagerImpl.java:1453)

@xichen01
Contributor Author

xichen01 commented Mar 1, 2023

> Got this for MIXED workload from command help:
>
> IllegalArgumentException
> 	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:131)
> 	at org.apache.hadoop.ozone.om.KeyManagerImpl.listStatus(KeyManagerImpl.java:1453)

This has been fixed.

These commands now execute normally:

ozone freon ommg --operation CREATE_FILE -n 25000 --duration  180s
ozone freon ommg --operation MIXED --ops CREATE_FILE,LOOKUP_FILE,LIST_STATUS --opsnum 5,4,1 -t 10 -n 1000 --duration 180s

@adoroszlai adoroszlai requested a review from jojochuang March 12, 2023 17:30
@adoroszlai adoroszlai requested a review from duongkame March 20, 2023 21:12
@adoroszlai
Contributor

@duongkame @jojochuang please review

@adoroszlai adoroszlai merged commit cffa386 into apache:master Apr 17, 2023
@adoroszlai
Contributor

Thanks @xichen01 for the patch, @jojochuang for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Apr 20, 2023
* master: (440 commits)
  HDDS-8445. Move PlacementPolicy back to SCM (apache#4588)
  HDDS-8335. ReplicationManager: EC Mis and Under replication handlers should handle overloaded exceptions (apache#4593)
  HDDS-8355. Intermittent failure in TestOMRatisSnapshots#testInstallSnapshot (apache#4592)
  HDDS-8444. Increase timeout of CI build (apache#4586)
  HDDS-8446. Selective checks: handle change in ci.yaml (apache#4587)
  HDDS-8440. Ozone Manager crashed with ClassCastException when deleting FSO bucket. (apache#4582)
  HDDS-7309. Enable by default GRPC between S3G and OM (apache#3820)
  HDDS-8458. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8385. Ozone can't process snapshot when service UID > 2097151 (apache#4580)
  HDDS-8424: Preserve legacy bucket getKeyInfo behavior (apache#4576)
  HDDS-8453. Mark TestDirectoryDeletingServiceWithFSO#testDirDeletedTableCleanUpForSnapshot as flaky
  HDDS-8137. [Snapshot] SnapDiff to use tombstone entries in SST files (apache#4376)
  HDDS-8270. Measure checkAccess latency for Ozone objects (apache#4467)
  HDDS-8109. Seperate Ratis and EC MisReplication Handling (apache#4577)
  HDDS-8429. Checkpoint is not closed properly in OMDBCheckpointServlet (apache#4575)
  HDDS-8253. Set ozone.metadata.dirs to temporary dir if not defined in S3 Gateway (apache#4455)
  HDDS-8400. Expose rocksdb last sequence number through metrics (apache#4557)
  HDDS-8333. ReplicationManager: Allow partial EC reconstruction if insufficient nodes available (apache#4579)
  HDDS-8147. Introduce latency metrics for S3 Gateway operations (apache#4383)
  HDDS-7908. Support OM Metadata operation Generator in `Ozone freon` (apache#4251)
  ...