Skip to content

Optimize CachingLocalSegmentAllocator#getSequenceName#8909

Merged
jon-wei merged 8 commits intoapache:masterfrom
suneet-s:sequence-name-opt
Dec 24, 2019
Merged

Optimize CachingLocalSegmentAllocator#getSequenceName#8909
jon-wei merged 8 commits intoapache:masterfrom
suneet-s:sequence-name-opt

Conversation

@suneet-s
Copy link
Copy Markdown
Contributor

Fixes #8904

Description

Replace StringUtils#format with string addition to generate the sequence
name for an interval and partition. This is faster because format uses a
Matcher under the covers to replace the string format with the variables.

There is likely a performance gain here because this function is called for
every row in InputSourceProcessor#process


This PR has:

  • been self-reviewed.

@suneet-s
Copy link
Copy Markdown
Contributor Author

I didn't find any tests for this class. Since the logic is very trivial I decided to skip adding a unit test for this.

cc @jihoonson

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 after CI. Thanks!

Copy link
Copy Markdown
Contributor

@ccaominh ccaominh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interval.toString() generates an ISO string, so I don't think it's affected by the format locale.

LGTM 👍

@jihoonson
Copy link
Copy Markdown
Contributor

Would you fix the style issue?

[ERROR] /home/travis/build/apache/incubator-druid/indexing-service/src/main/java/org/apache/druid/indexing/common/task/CachingLocalSegmentAllocator.java:30:8: Unused import - org.apache.druid.java.util.common.StringUtils. [UnusedImports]

Also, would you mind adding a comment why it's not using StringUtils.format()?

@suneet-s
Copy link
Copy Markdown
Contributor Author

Would you fix the style issue?

[ERROR] /home/travis/build/apache/incubator-druid/indexing-service/src/main/java/org/apache/druid/indexing/common/task/CachingLocalSegmentAllocator.java:30:8: Unused import - org.apache.druid.java.util.common.StringUtils. [UnusedImports]

Also, would you mind adding a comment why it's not using StringUtils.format()?

Will do! Chi's comment about toString() generating the correct format had me a little worried so I'm adding a simple unit test. Fighting with the styling guidelines right now. Should have another patch out within a day

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ccaominh I copied most these helper initialization functions from your DefaultCaching...AllocatorTest in the refactoring PR

If you want I can consolidate them after your change is merged

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably good to consolidate.

@suneet-s
Copy link
Copy Markdown
Contributor Author

It appears that the EmitterTest is stuck

[INFO] Running org.apache.druid.java.util.emitter.core.EmitterTest
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.

@ccaominh
Copy link
Copy Markdown
Contributor

I believe EmitterTest is a flaky test.

Suneet Saldanha and others added 5 commits December 16, 2019 17:05
Replace StringUtils#format with string addition to generate the sequence
name for an interval and partition. This is faster because format uses a
Matcher under the covers to replace the string format with the variables.
@suneet-s
Copy link
Copy Markdown
Contributor Author

It doesn't look like the failures are related to this change 🤔 @ccaominh @jihoonson could you take a look?

@ccaominh
Copy link
Copy Markdown
Contributor

The "license check" job needs to be retriggered (there was a failure when generating the license report).

I'm not sure why the "security vulnerabilities" job failed. It's flagging CVE-2018-10237, which has a CVSS of 5.9, but the job is configured to only fail on CVSS 7 or higher. The job may need to be retriggered as well.

}

@Test
public void test_getSequenceName_forIntervalAndRow_shouldUseISOFormatAndPartitionNumForRow()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding tests. I think we prefer to use a short but intuitive name for tests and add comments if necessary.

Copy link
Copy Markdown
Contributor Author

@suneet-s suneet-s Dec 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you recommend a name? I like this format because it's explicit about what the test is doing

test_<functionUnderTest>_<condition>_<expectedResult>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I would just use getSequenceName() for these tests with a comment for the expected value because they look simple enough to understand.

can you recommend a name? I like this format because it's explicit about what the test is doing

test_<functionUnderTest>_<condition>_<expectedResult>

I agree on that we could have a better format or a better practice for test names (we've been talking about a better practice for testing here). Unfortunately, we don't have such a format or practice and I think it could be better to make them consistent for now because we don't know what kind of policy we would adopt yet. I think it would be worth to start a discussion on it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I've renamed the tests to the less explicit format and left a comment for how I'd rename them in the future. When the tests pass, can you merge?

How do I start this discussion? The linked discussion is about a lot of different testing practices and I think driving consensus on such a large topic will take a long time. How do I start a discussion thread about naming practices. Who needs to approve the change to the naming format? And who has veto rights?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late reply. You can start a discussion by starting a new thread on dev which is you already did (thanks!). There's no approval or veto in this kind of discussion. Anyone can participate in the discussion and express their opinion. The discussion could be concluded when the community accepts it.

@jnaous
Copy link
Copy Markdown
Contributor

jnaous commented Dec 18, 2019 via email

@jon-wei jon-wei merged commit dec619e into apache:master Dec 24, 2019
@jihoonson jihoonson added this to the 0.18.0 milestone Mar 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace StringUtils#format in CachingLocalSegmentAllocator#getSequenceName

5 participants