Speed up FrameFileTest, SuperSorterTest. #17068

Merged
gianm merged 3 commits into apache:master from gianm:test-perf-ff-ss
Sep 16, 2024

gianm (Contributor) commented Sep 15, 2024

These are two heavily parameterized tests that, together, account for about 60% of runtime in the druid-processing test suite.

FrameFileTest changes:

  1. Cache frame files in a static field, rather than building the frame
    file for each parameterization of the test.

  2. Adjust TestArrayCursorFactory to cache the signature, rather than
    re-creating it on each call to getColumnCapabilities.
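Both FrameFileTest changes follow a common test-performance pattern: build an expensive fixture once per JVM and share it across parameterizations, and compute derived metadata once rather than on every lookup. The sketch below illustrates that pattern only; it assumes nothing about Druid's actual test classes. `frameFile`, `buildFrameFile` and `CachingFactory` are hypothetical stand-ins, and a byte array stands in for a real frame file.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class Main
{
  // Counts how many times the expensive fixture build actually runs.
  static final AtomicInteger BUILDS = new AtomicInteger();

  // Hypothetical static cache shared by every parameterization in the JVM.
  private static final Map<String, byte[]> FRAME_FILES = new ConcurrentHashMap<>();

  // Hypothetical stand-in for an expensive build, such as writing a frame file.
  static byte[] buildFrameFile(String key)
  {
    BUILDS.incrementAndGet();
    return key.getBytes(StandardCharsets.UTF_8);
  }

  // Returns the cached fixture; builds it only on the first request for this key.
  static byte[] frameFile(String key)
  {
    return FRAME_FILES.computeIfAbsent(key, Main::buildFrameFile);
  }

  // Hypothetical factory that computes its signature once, in the constructor.
  static final class CachingFactory
  {
    private int signatureBuilds = 0;
    private final List<String> signature;

    CachingFactory(List<String> columns)
    {
      this.signature = computeSignature(columns);
    }

    private List<String> computeSignature(List<String> columns)
    {
      signatureBuilds++;
      return List.copyOf(columns);
    }

    // Reads the cached signature instead of rebuilding it on each call.
    String getColumnCapabilities(String column)
    {
      return signature.contains(column) ? "present" : null;
    }

    int signatureBuilds()
    {
      return signatureBuilds;
    }
  }

  public static void main(String[] args)
  {
    // Ten "parameterizations" request the same two fixtures.
    for (int i = 0; i < 10; i++) {
      frameFile("rows-a");
      frameFile("rows-b");
    }
    System.out.println("fixture builds: " + BUILDS.get()); // prints "fixture builds: 2"

    CachingFactory factory = new CachingFactory(List.of("dim", "metric"));
    for (int i = 0; i < 5; i++) {
      factory.getColumnCapabilities("dim");
    }
    System.out.println("signature builds: " + factory.signatureBuilds()); // prints "signature builds: 1"
  }
}
```

`ConcurrentHashMap.computeIfAbsent` keeps the cache safe if test classes run in parallel within one JVM. The trade-off of a static cache is that fixtures live as long as the JVM does, so tests sharing them should treat them as read-only.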

SuperSorterTest changes:

  1. Dramatically reduce the number of tests that run with
    "maxRowsPerFrame" = 1. These are particularly slow because they write
    so many small files. Some still run, since it's useful to test edge
    cases, but far fewer than before.

  2. Reduce the "maxActiveProcessors" axis of the test from [1, 2, 4] to
    [1, 3]. The aim is to reduce the number of cases while still getting
    good coverage of the feature.

  3. Reduce the "maxChannelsPerProcessor" axis of the test from [2, 3, 8]
    to [2, 7]. The aim is to reduce the number of cases while still getting
    good coverage of the feature.

  4. Use in-memory input channels rather than file channels.

  5. Defer formatting of assertion failure messages until they are needed.

  6. Cache the cursor factory and its signature in a static field.

  7. Cache sorted test rows (used for verification) in a static field.
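Item 5 is easy to illustrate in isolation. The sketch below is not the code from this PR: it assumes a hypothetical `checkEquals` helper that takes the failure message as a `Supplier<String>`, so the `String.format` call in the hypothetical `describe` method runs only when a check actually fails. The row data is illustrative.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

public class Main
{
  // Counts how many failure messages were actually formatted.
  static int messagesFormatted = 0;

  // Hypothetical helper: the message supplier is invoked only on failure.
  static void checkEquals(Object expected, Object actual, Supplier<String> message)
  {
    if (!Objects.equals(expected, actual)) {
      throw new AssertionError(message.get());
    }
  }

  // Hypothetical message builder; formatting is the cost being deferred.
  static String describe(int rowNumber, List<String> row)
  {
    messagesFormatted++;
    return String.format("Row #%d mismatch: %s", rowNumber, row);
  }

  public static void main(String[] args)
  {
    List<List<String>> rows = List.of(List.of("a", "1"), List.of("b", "2"), List.of("c", "3"));
    for (int i = 0; i < rows.size(); i++) {
      final int rowNumber = i;
      final List<String> row = rows.get(i);
      // An eager version would call describe(...) on every row, even when it passes.
      checkEquals(row, List.copyOf(row), () -> describe(rowNumber, row));
    }
    System.out.println("messages formatted: " + messagesFormatted); // prints "messages formatted: 0"
  }
}
```

JUnit 5 supports the same idea directly: `Assertions.assertEquals(Object, Object, Supplier<String>)` evaluates the supplier only when the assertion fails. In a verification loop that runs over many rows and many parameterizations, skipping the eager format call on every passing check avoids a lot of throwaway string building.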

gianm (Contributor, Author) commented Sep 15, 2024

CI just finished. The change appears to chop an hour off the processing unit test time.

gianm merged commit 5b7fb5f into apache:master Sep 16, 2024
gianm deleted the test-perf-ff-ss branch September 16, 2024 00:03
pranavbhole pushed a commit to pranavbhole/druid that referenced this pull request Sep 17, 2024
* Speed up FrameFileTest, SuperSorterTest.

* It helps to include the file.

* Style.
kfaraz (Contributor) commented Sep 30, 2024

This needs to be backported so that #17088 can be backported;
otherwise, there are merge conflicts.

kfaraz added this to the 31.0.0 milestone Sep 30, 2024
kfaraz pushed a commit to kfaraz/druid that referenced this pull request Sep 30, 2024
* Speed up FrameFileTest, SuperSorterTest.

* It helps to include the file.

* Style.
kfaraz added a commit that referenced this pull request Sep 30, 2024

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>