
Fixed the non deterministic test testSerdeWithInputFormat #17325

Closed
yugvajani wants to merge 1 commit into apache:master from yugvajani:fix-flaky-test

Conversation

@yugvajani

@yugvajani yugvajani commented Oct 10, 2024

Fixed a non-deterministic test in org.apache.druid.indexing.kafka.supervisor.KafkaSupervisorSpecTest.testSerdeWithInputFormat.

Steps to reproduce

To reproduce the problem, first build the module kafka-indexing-service:

mvn install -pl extensions-core/kafka-indexing-service -am -DskipTests

Then, run the regular test:

mvn -pl extensions-core/kafka-indexing-service test -Dtest=org.apache.druid.indexing.kafka.supervisor.KafkaSupervisorSpecTest#testSerdeWithInputFormat

To confirm the flakiness, execute the following NonDex command:

mvn -pl extensions-core/kafka-indexing-service edu.illinois:nondex-maven-plugin:2.1.7:nondex -Dtest=org.apache.druid.indexing.kafka.supervisor.KafkaSupervisorSpecTest#testSerdeWithInputFormat

Description

testSerdeWithInputFormat produced non-deterministic output because, unlike the other tests in the file, the JSON object it used did not include a parser that enforces a specific order. Since nothing fixed the element order during serialization, the outputs were inconsistent: the dimension exclusions array came out in a different order on different runs, for example
expected: "dimensionExclusions":["__time","value_max","count","value_min","value_sum","value","timestamp"]
actual:   "dimensionExclusions":["value_sum","value","count","__time","value_min","value_max","timestamp"]

The fix adds a parser to the JSON, as the other tests do, to ensure a consistent order.



This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

KafkaSupervisorSpec spec2 = mapper.readValue(serialized, KafkaSupervisorSpec.class);

String stable = mapper.writeValueAsString(spec2);
String sortedStableJson = sortJsonString(stable);
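The PR's actual sortJsonString helper is not shown in this excerpt. As a rough, hypothetical illustration of the idea it implements (recursively sorting array elements so a subsequent string comparison becomes order-insensitive), here is a simplified stdlib-only sketch that operates on a Map/List tree rather than on JSON text:

```java
import java.util.*;

// Hypothetical, simplified model of a "sort all JSON arrays" helper.
// The real helper in the PR works on JSON text; this version operates on
// the equivalent Map/List tree purely for illustration.
public class JsonSorter {
    @SuppressWarnings("unchecked")
    public static Object sortArrays(Object node) {
        if (node instanceof Map) {
            // Recurse into object values, keeping key order untouched.
            Map<String, Object> sorted = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : ((Map<String, Object>) node).entrySet()) {
                sorted.put(e.getKey(), sortArrays(e.getValue()));
            }
            return sorted;
        }
        if (node instanceof List) {
            // Recurse into array elements, then sort them lexicographically.
            List<Object> out = new ArrayList<>();
            for (Object child : (List<Object>) node) {
                out.add(sortArrays(child));
            }
            out.sort(Comparator.comparing(String::valueOf));
            return out;
        }
        return node; // scalar: leave as-is
    }

    public static void main(String[] args) {
        Map<String, Object> spec = new LinkedHashMap<>();
        spec.put("dimensionExclusions", new ArrayList<>(List.of("value_sum", "__time", "count")));
        System.out.println(sortArrays(spec));
        // prints {dimensionExclusions=[__time, count, value_sum]}
    }
}
```

Note that, as the review below points out, sorting every array unconditionally makes the comparison lenient: it would also hide a real bug in a field where element order is semantically significant.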
Member

note: this seems like an unconditional re-sort at the JSON level for only this test; but I wonder if it only affects this one.
From the repro steps: NonDex has detected that the output here is unstable.

I don't know the Kafka part, but there might be lists/arrays for which order does matter; sorting all parts of a JSON unconditionally may possibly hide issues.

I wonder whether what causes these mismatches was investigated. I suspect that some underlying class may use some unstable constructs; in most cases these used to boil down to a HashSet -> LinkedHashSet change...

Another possible way to fix this in a wider scope is to make KafkaSupervisorSpec comparable and do the validation at that level, so that the real classes can decide whether two maps/sets are equal or not, and lists/arrays will still be validated correctly.

What do you think about these approaches?

Author

If an underlying class were using HashSet, other tests in the same file would likely have failed as well, since they all use the same underlying class. The key difference with testSerdeWithInputFormat is that it wasn’t using a parser, which is probably what makes the list order deterministic in the other tests.

I tried adding the parser to this test, and while it did fix the order, I ran into an issue where Assert.assertEquals(4, spec.getDataSchema().getAggregators().length); returned a length of 0. I'm not entirely sure why, as I’m not very familiar with the Kafka-specific logic.

The sort function I created could be used as a general-purpose tool to ensure deterministic ordering across tests, if needed. In fact, most of the tests only rely on contains-style assertions, except for one assertEquals. Since the sort function only rearranges the list elements and doesn't alter their content, it shouldn't cause problems, especially since the parser itself was handling ordering in the other cases.

Member

Could you point to a test failure where this happened without instrumentation?

The sort function I created could be used as a general-purpose tool to ensure deterministic ordering across tests

I don't think that would be a good path forward: it reorders arrays without considering that order may be semantically significant. It's OK to reorder if the array is, say, the argument to an IN; but if the elements are arguments to a function, they may mean a different thing in a different order.

I think to fix these things for sure, the source should be addressed, rather than adding extra lenient comparison logic to the tests.

I'm not sure I was able to find the answers to my previous questions above:

  • could you point to the field whose comparison is the root cause of these failures? are there any options there to alter the production code to be more stable?
  • have you considered/evaluated making KafkaSupervisorSpec etc. comparable, so that instead of resorting to string comparison an application-level comparison can be made?
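The distinction behind the second suggestion can be sketched with plain java.util (a toy illustration, not the actual Druid classes): Map.equals ignores entry order while List.equals does not, which is exactly the behavior an application-level comparison of the spec could encode, where a string comparison of the serialized form cannot.

```java
import java.util.*;

public class StructuralComparison {
    public static void main(String[] args) {
        // Two "serializations" of the same data that differ only in map-entry order.
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("topic", "metrics");
        a.put("useEarliestOffset", false);

        Map<String, Object> b = new LinkedHashMap<>();
        b.put("useEarliestOffset", false);
        b.put("topic", "metrics");

        // String comparison is order-sensitive and flags these as different...
        System.out.println(a.toString().equals(b.toString())); // false
        // ...while structural (Map) comparison is not.
        System.out.println(a.equals(b)); // true

        // Lists, by contrast, stay order-sensitive, so genuinely ordered
        // fields would still be validated correctly.
        System.out.println(List.of("x", "y").equals(List.of("y", "x"))); // false
    }
}
```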

Author

could you point to the field whose comparison is the root cause of these failures? are there any options there to alter the production code to be more stable? - The output of dimensionSpec was non-deterministic because no parser was attached to it, which led to ordering issues. I've implemented a fix by adding a parser, similar to how the other tests are set up, which resolves the non-deterministic ordering problem. I've updated the PR with this fix; please review it and let me know if it looks good. The array's ordering itself remains unchanged, but the dimension order is now deterministic.

Author

@kgyrtkirk Have you made any progress inspecting this? I have a similar change for org.apache.druid.indexing.kafka.supervisor.KafkaSupervisorSpecTest.testSerdeWithSpecAndInputFormat and don't know whether to add it to this PR or open another one. Thanks.

@yugvajani changed the title from "Fixed the flaky test testSerdeWithInputFormat by sorting the result" to "Fixed the non deterministic test testSerdeWithInputFormat" on Oct 28, 2024
@kgyrtkirk
Member

I'm a little puzzled seeing that the parser seems to have been deprecated; and the PR which introduced timestampSpec was the one deprecating it, in #8823.
If the fix were to move from a deprecated construct to a non-deprecated one I wouldn't have had any questions... but it's the other way around.

Do you know how the inconsistency could happen with the non-deprecated codepath? The comparison of which field leads to the issue?
I still feel like this just masks the issue rather than stabilizing the non-deprecated system to be "stable", so that it produces the same results with different Java versions etc.

@yugvajani
Author

yugvajani commented Dec 16, 2024

@kgyrtkirk Initially, I had resolved the issue by sorting the contents of the JSON. However, since the ordering of elements is important, I later updated the solution to align with how other tests handled ordering. The parser we were using provided a deterministic order, which ensured the test passed.

Now that the parser is deprecated, one possible solution could be to avoid relying on string equality for comparisons and instead perform a direct JSON comparison. What is your opinion on this one?

@kgyrtkirk
Member

I think this is not a test-only issue; it's more that the system fails to return stable results.

I think it should be tracked down how the inconsistency can happen with the non-deprecated codepath.

It seems like the comparison of dimensionExclusions fails? Where is that field declared? What typed object does it get?

You will most likely be able to stabilize the test by making changes like HashSet to LinkedHashSet or something similar; you may also need to update the test, but after that it will not happen anymore.
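The HashSet -> LinkedHashSet change suggested here can be demonstrated with plain java.util (a generic sketch, not the actual Druid field): LinkedHashSet guarantees insertion-order iteration, so serializing it yields the same element order on every run, while HashSet makes no ordering guarantee at all (and NonDex deliberately shuffles it to expose tests that depend on one).

```java
import java.util.*;

public class SetOrderDemo {
    public static void main(String[] args) {
        List<String> exclusions = List.of(
            "__time", "value_max", "count", "value_min", "value_sum", "value", "timestamp");

        // HashSet: iteration order depends on hash codes and capacity;
        // nothing in the contract pins it down.
        Set<String> unstable = new HashSet<>(exclusions);

        // LinkedHashSet: iterates in insertion order, so any serialization
        // of it is reproducible across runs and JVM versions.
        Set<String> stable = new LinkedHashSet<>(exclusions);

        System.out.println(new ArrayList<>(stable).equals(exclusions)); // true
        System.out.println(unstable); // some permutation of the same elements
    }
}
```

Both sets are equal as sets; only the iteration order differs, which is why the change is invisible to callers but stabilizes string-level comparisons of the serialized form.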

@github-actions

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions Bot added the stale label Feb 16, 2025
@github-actions

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions Bot closed this Mar 16, 2025
