Skip to content

MLE-28401: Add missing guard for logging retry message#591

Closed
yunzvanessa wants to merge 294 commits intomainfrom
MLE-28401-fix-retry-logging
Closed

MLE-28401: Add missing guard for logging retry message#591
yunzvanessa wants to merge 294 commits intomainfrom
MLE-28401-fix-retry-logging

Conversation

@yunzvanessa
Copy link
Copy Markdown
Contributor

The "Retrying committing batch" debug message was logged on every batch, even the first attempt, producing misleading output that suggested retries were happening when they weren't. Added a commitRetry > 0 guard to match the existing pattern in the parent class TransformWriter.

This issue surfaced when investigating and mlcp regression failure: https://progresssoftware.atlassian.net/browse/MLE-28401

builder and others added 26 commits March 29, 2026 00:08
…rnal Entity Polaris medium severity issues (#577)

* MLE-27554 MLE-27556: Fix XXE Injection - XML External Entity related to XMLInputFactory

* MLE-27553 MLE-27555: Fix XXE Injection - XML External Entity related to TransformerFactory

* MLE-27553 MLE-27554 MLE-27555 MLE-27556: Split XXE-prevention try/catch blocks and improve warning messages
MLE-27557: Fix weak crypto - replace insecure TLSv1 with TLSv1.3 in ContentReader.java
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ss spurious wakeup vulnerability

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MLE-26865, MLE-26867: Fixes for issues identified by Polaris
…om-develop

PDP-1182: Remove per-repo pr-workflow.yaml
Copilot AI review requested due to automatic review settings April 14, 2026 20:27
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses misleading “Retrying committing batch” debug logging by adding a commitRetry > 0 guard, and it also includes a broad set of additional robustness/security/concurrency updates across mlcp and the MapReduce connector.

Changes:

  • Guard retry-specific debug logging so it only appears on actual retry attempts.
  • Add defensive checks (null/empty config values, filesystem edge cases, safer iteration over server responses) and harden XML processing against XXE.
  • Refactor local multithreaded execution coordination (wait/notify → CountDownLatch) and bump project/dependency versions (plus CI/workflow/config updates).

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/main/java/com/marklogic/tree/ExpandedTree.java Avoid repeated string computation in trace logging path (minor refactor).
src/main/java/com/marklogic/mapreduce/utilities/InternalUtilities.java More defensive handling of INPUT_HOST configuration.
src/main/java/com/marklogic/mapreduce/test/FCheck.java Guard against File.listFiles() returning null.
src/main/java/com/marklogic/mapreduce/examples/HelloWorld.java Null-safe handling of document element when extracting text.
src/main/java/com/marklogic/mapreduce/examples/ContentReader.java Update SSL example defaults (protocols/ciphers) and validate output path.
src/main/java/com/marklogic/mapreduce/JSONDocument.java Configure TransformerFactory to restrict external DTD/stylesheet access (XXE hardening).
src/main/java/com/marklogic/mapreduce/ForestReader.java Null-check parent directory path before building related file paths.
src/main/java/com/marklogic/mapreduce/DOMDocument.java Configure TransformerFactory to restrict external DTD/stylesheet access (XXE hardening) and add null guards.
src/main/java/com/marklogic/mapreduce/ContentOutputFormat.java Validate output hosts and add safer result iteration (hasNext() checks).
src/main/java/com/marklogic/dom/NodeImpl.java Prevent NPE when parent node is null; make isEqualNode local-name compare null-safe.
src/main/java/com/marklogic/dom/DocumentImpl.java Null-safe namespace lookup/prefix lookup when document element is missing.
src/main/java/com/marklogic/dom/CommentImpl.java Null-safe namespace lookup delegation when parent is missing.
src/main/java/com/marklogic/dom/CharacterDataImpl.java Null-safe length/substring handling for character data.
src/main/java/com/marklogic/dom/AttrImpl.java Null-safe namespace lookup delegation when owner element is missing.
src/main/java/com/marklogic/contentpump/utilities/PermissionUtil.java Validate OUTPUT_HOST presence before creating output ContentSource.
src/main/java/com/marklogic/contentpump/utilities/JSONDocBuilder.java Guard against null exception messages when filtering “missing value”.
src/main/java/com/marklogic/contentpump/TransformWriter.java Null-safe handling of exception message when detecting “Module Not Found”.
src/main/java/com/marklogic/contentpump/TransformOutputFormat.java Validate OUTPUT_HOST presence before querying mimetypes.
src/main/java/com/marklogic/contentpump/ThreadManager.java Replace wait/notify coordination with latch; validate output hosts; minor control-flow guard.
src/main/java/com/marklogic/contentpump/SingleDocumentWriter.java Null-check parent directory before mkdirs().
src/main/java/com/marklogic/contentpump/RDFReader.java Safer permissions initialization; validate hosts; add result iteration guards.
src/main/java/com/marklogic/contentpump/OutputArchive.java Null-check parent directory before mkdirs().
src/main/java/com/marklogic/contentpump/MultithreadedMapper.java Introduce latch-based signaling when runners are created.
src/main/java/com/marklogic/contentpump/LocalJobRunner.java Wire latch into MultithreadedMapper tasks; ensure latch is released on failure.
src/main/java/com/marklogic/contentpump/InputType.java Validate output hosts; add ResultSequence hasNext() guards for role map query.
src/main/java/com/marklogic/contentpump/DatabaseTransformWriter.java Add commitRetry > 0 guard to retry log message.
src/main/java/com/marklogic/contentpump/DatabaseContentReader.java Add ResultSequence hasNext() guards; null-check W3C permission element conversion.
src/main/java/com/marklogic/contentpump/ContentPump.java Guard against listFiles() returning null when building lib jar list.
src/main/java/com/marklogic/contentpump/CompressedAggXMLReader.java Disable external entities/DTDs on XMLInputFactory (XXE hardening).
src/main/java/com/marklogic/contentpump/AggregateXMLReader.java Disable external entities/DTDs on XMLInputFactory (XXE hardening).
pom.xml Bump project version and Avro dependency version.
.github/workflows/pr-workflow.yaml Remove JIRA ID validator workflow.
.copyrightconfig Add copyright validation config and exclusions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +179 to 185
String atomStr = atomString(textReps[index]);
LOG.trace("atom " + textReps[index] + " [" +
atomString(textReps[index]) + "] length " +
atomString(textReps[index]).length());
atomStr + "] length " +
(atomStr != null ? atomStr.length() : "0"));
}
buf.append(atomString(textReps[index++]));
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atomString(textReps[index]) is now computed into atomStr for logging, but buf.append(atomString(...)) still recomputes it. Reuse atomStr for the append too to avoid duplicate work and to ensure the trace output matches the actual appended value if atomString is non-trivial/non-deterministic.

Copilot uses AI. Check for mistakes.
Comment on lines +107 to 113
String[] hosts = conf.getStrings(INPUT_HOST);
if (hosts == null || hosts.length == 0) {
throw new IllegalArgumentException(INPUT_HOST +
" is not specified.");
}

return getInputContentSource(conf, host);
return getInputContentSource(conf, hosts[0]);
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change now checks for hosts == null || hosts.length == 0, but it no longer validates that hosts[0] is non-empty/blank. If INPUT_HOST is set to an empty string, getInputContentSource(conf, hosts[0]) will still be called and will likely fail later with a less clear error; consider restoring an explicit empty/blank check for hosts[0] here.

Copilot uses AI. Check for mistakes.
}
String data = getData();
if (data == null) {
throw new DOMException(DOMException.DOMSTRING_SIZE_ERR, null);
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

substringData now throws DOMSTRING_SIZE_ERR when getData() returns null, but getLength() treats null data as length 0. This makes behavior inconsistent and introduces a new exception path for callers; consider treating null data as an empty string here (or throwing a more appropriate DOMException like INVALID_STATE_ERR), consistent with getLength().

Suggested change
throw new DOMException(DOMException.DOMSTRING_SIZE_ERR, null);
data = "";

Copilot uses AI. Check for mistakes.
if (LOG.isDebugEnabled()) {
LOG.debug(getFormattedBatchId() +
"Retrying committing batch, attempts: " +
commitRetry + "/" + MAX_RETRIES);
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry log message prints commitRetry + "/" + MAX_RETRIES, but the loop condition is commitRetry < commitRetryLimit. If commitRetryLimit is configurable and diverges from MAX_RETRIES, this debug message becomes misleading; consider logging against commitRetryLimit here to match the actual retry limit being enforced.

Suggested change
commitRetry + "/" + MAX_RETRIES);
commitRetry + "/" + commitRetryLimit);

Copilot uses AI. Check for mistakes.
Comment thread pom.xml
Comment on lines 6 to +20
@@ -17,7 +17,7 @@
</licenses>

<properties>
<versionString>12.0</versionString>
<versionString>12.1</versionString>
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description/title focuses on adding a logging guard for retry messages, but this PR also bumps the project version (and includes many other changes across the codebase). If these additional changes are intentional, please reflect them in the PR description or consider splitting into separate PRs to keep scope focused and simplify review/release notes.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants