Add remote symbolication support with build-id and PC offset #324
base: main
Branch updated from b41a6eb to 74c5410.
Benchmark summaries (number of metrics per configuration):

| Configuration | Improvements | Regressions | Unchanged metrics | Unstable metrics |
|---|---|---|---|---|
| x86_64 wall | 0 | 1 | 14 | 23 |
| x86_64 cpu,wall,alloc,memleak | 0 | 0 | 15 | 23 |
| x86_64 memleak,alloc | 0 | 0 | 16 | 22 |
| x86_64 alloc | 0 | 0 | 14 | 24 |
| x86_64 memleak | 0 | 0 | 15 | 23 |
| x86_64 cpu | 0 | 0 | 15 | 23 |
| x86_64 cpu,wall | 1 | 0 | 14 | 23 |
| aarch64 cpu | 0 | 0 | 16 | 22 |
| aarch64 memleak,alloc | 0 | 0 | 17 | 21 |
| aarch64 alloc | 1 | 0 | 15 | 22 |
| aarch64 wall | 0 | 0 | 17 | 21 |
| aarch64 cpu,wall | 2 | 0 | 15 | 21 |
| aarch64 cpu,wall,alloc,memleak | 0 | 0 | 16 | 22 |
| aarch64 memleak | 0 | 0 | 17 | 21 |
Branch updated from 1788096 to 55578ac.
Pull request overview
This PR adds remote symbolication support to the Java profiler by storing GNU build-id and PC offsets in native frames instead of locally resolved symbols. This enables downstream services to handle symbol resolution remotely, reducing agent overhead and improving scalability for distributed profiling scenarios.
Key changes:
- New build-id extraction from ELF binaries (Linux-only)
- Signal-safe RemoteFrameInfo pool allocation per lock-strip (~32KB total)
- JFR serialization format: `<build-id>.<remote>(0x<offset>)` for constant pool deduplication
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| gradle/patching.gradle | Stack walker patches for remote symbolication integration in walkVM |
| doc/REMOTE_SYMBOLICATION.md | Feature documentation with architecture overview |
| doc/MODIFIER_ALLOCATION.md | Design decision documentation for frame types vs modifiers |
| doc/J9_LIMITATIONS.md | OpenJ9 architectural limitations for remote symbolication |
| ddprof-lib/src/main/cpp/vmEntry.h | RemoteFrameInfo structure and BCI_NATIVE_FRAME_REMOTE constant |
| ddprof-lib/src/main/cpp/symbols_linux_dd.{h,cpp} | ELF build-id extraction utilities |
| ddprof-lib/src/main/cpp/profiler.{h,cpp} | Core profiling logic with resolveNativeFrame and pool allocation |
| ddprof-lib/src/main/cpp/libraries.{h,cpp} | Build-id extraction for all loaded libraries |
| ddprof-lib/src/main/cpp/codeCache.{h,cpp} | Build-id storage in CodeCache with hex string management |
| ddprof-lib/src/main/cpp/flightRecorder.{h,cpp} | JFR serialization for remote frames |
| ddprof-lib/src/main/cpp/arguments.{h,cpp} | New remotesym argument parsing |
| ddprof-lib/src/main/cpp/frame.h | FRAME_NATIVE_REMOTE type definition |
| ddprof-lib/src/main/cpp/jfrMetadata.cpp | JFR metadata for buildId and loadBias fields |
| ddprof-test/src/test/java/com/datadoghq/profiler/cpu/RemoteSymbolicationTest.java | Integration test for remote symbolication |
| ddprof-test/src/test/java/com/datadoghq/profiler/RemoteSymHelper.java | JNI helper for test library |
| ddprof-test/src/test/cpp/remotesym.c | Native test library with CPU burning functions |
| ddprof-test/build.gradle | Build configuration with --build-id flag and jafar dependency |
| ddprof-lib/src/test/cpp/remotesymbolication_ut.cpp | C++ unit tests for remote symbolication |
| ddprof-lib/src/test/cpp/remoteargs_ut.cpp | C++ unit tests for argument parsing |
| ddprof-test/src/test/java/com/datadoghq/profiler/junit/CStackInjector.java | Test framework fix for assumption failures |
| ddprof-test/src/test/java/com/datadoghq/profiler/AbstractProfilerTest.java | Made jfrDump field protected for test access |
| README.md | Feature announcement and documentation references |
Review thread on ddprof-test/src/test/java/com/datadoghq/profiler/cpu/RemoteSymbolicationTest.java (resolved)
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated 8 comments.
Review thread on ddprof-test/src/test/java/com/datadoghq/profiler/cpu/RemoteSymbolicationTest.java (outdated, resolved)
Copilot encountered an error and was unable to review this pull request.
jbachorik left a comment:
✅ Comment 2677507083 - Code has proper safety checks

The code correctly validates note header fields before using them:

Step 1 (line 112): Ensures we can safely read the Elf64_Nhdr structure:
```cpp
while (offset + sizeof(Elf64_Nhdr) < note_size) {
    const Elf64_Nhdr* nhdr = reinterpret_cast<const Elf64_Nhdr*>(data + offset);
```
Step 2 (lines 115-117): Calculates aligned sizes from the header fields (n_namesz, n_descsz).

Step 3 (lines 120-122): Validates that the entire note (header + name + descriptor) is within bounds before accessing any data:
```cpp
// Check bounds
if (offset + sizeof(Elf64_Nhdr) + name_size_aligned + desc_size_aligned > note_size) {
    break;
}
```
This two-stage validation (header first, then payload) is the correct approach for parsing potentially corrupted note sections. The n_namesz and n_descsz fields are used in calculations only after verifying they won't cause reads beyond note_size.
jbachorik left a comment:
Comment 2677606545 - Integer overflow is theoretical, not practical

You're technically correct that `byte_len * 2 + 1` could theoretically overflow. However, this is not a practical concern in this context.

Why this is safe:
- Build-ids are generated by the linker and are typically 20 bytes (SHA1, the default for `ld --build-id`) or 16 bytes (the `md5` and `uuid` styles)
- The note header stores `n_descsz` as a 32-bit `Elf64_Word`, so the descriptor length is bounded by the format itself
- The build-id comes from the `.note.gnu.build-id` section, which is created by `ld --build-id`
- Even if a malicious binary has a large `n_descsz`, the bounds check at line 120 prevents reading beyond the note section

Practical limits:
- The build-id would need to be larger than `SIZE_MAX/2` bytes (about 2 GB on 32-bit, about 8 exabytes on 64-bit)
- Such a value would fail the bounds check long before reaching this function

Trade-off:
Adding overflow checks here would add complexity for a scenario that cannot occur with legitimate binaries and is already guarded by earlier bounds checks. If we were paranoid, we could add:
```cpp
if (byte_len > SIZE_MAX / 2 - 1) return nullptr;
```
But it's unnecessary given the existing protections.
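For reference, a guarded conversion could look like the sketch below. `buildIdToHex` is a hypothetical name, not the function in symbols_linux_dd.cpp; it only illustrates where a `SIZE_MAX` check would sit relative to the `byte_len * 2 + 1` allocation, and assumes the caller frees the result with `free()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: converts raw build-id bytes into a NUL-terminated,
// lowercase hex string allocated with malloc(). Returns nullptr if the
// output size would overflow size_t or if allocation fails.
char* buildIdToHex(const uint8_t* bytes, size_t byte_len) {
  // byte_len * 2 + 1 must be representable in size_t.
  if (byte_len > (SIZE_MAX - 1) / 2) {
    return nullptr;
  }
  static const char kDigits[] = "0123456789abcdef";
  char* hex = static_cast<char*>(std::malloc(byte_len * 2 + 1));
  if (hex == nullptr) {
    return nullptr;
  }
  for (size_t i = 0; i < byte_len; ++i) {
    hex[2 * i] = kDigits[bytes[i] >> 4];        // high nibble
    hex[2 * i + 1] = kDigits[bytes[i] & 0x0f];  // low nibble
  }
  hex[byte_len * 2] = '\0';
  return hex;
}
```

The check rejects oversized lengths before touching `bytes`, so even a bogus length never leads to a read.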
The test was looking for raw build-id patterns in stack traces, but JMC formats remote frames as `<build-id>.<remote>(0x<offset>)`. Updated assertions to:
- Look for the `<remote>` method marker
- Verify the build-id in class position (before the dot)
- Verify the PC offset in signature position (`0x` format)
Print first 3 stack traces and summary statistics to understand why <remote> marker is not being found in the JFR output.
Add buildId and loadBias fields to jdk.NativeLibrary JFR event to support remote symbolication testing. The test now checks if any libraries have build-ids before asserting remote symbolication is working, skipping on systems without build-id support.
Create native test library (libddproftest) with guaranteed build-id on Linux:
- remotesym.c: CPU-burning functions that appear in profiling samples
- RemoteSymHelper.java: JNI wrapper for calling native functions
- Updated build.gradle to compile with -Wl,--build-id on Linux
- Updated RemoteSymbolicationTest to call test library functions

This ensures the test always has at least one library with build-id available for testing remote symbolication, even on systems where system libraries may not have build-ids.
Match other CPU profiling tests by testing all cstack modes: vm, vmx, fp, and dwarf.
Move build-id extraction from elfBuildId.{h,cpp} to symbols_linux_dd.{h,cpp}
following the project pattern of *_dd adapters for platform-specific DD
extensions. This aligns with how other Linux-specific functionality like
os_linux_dd.cpp is organized.
Changes:
- Created symbols_linux_dd.{h,cpp} with ddprof::SymbolsLinux namespace
- Moved ELF build-id extraction logic to DD adapter
- Updated Libraries::updateBuildIds() to use DD adapter
- Removed old elfBuildId.{h,cpp} files
This follows the established pattern where cpp-external/ contains upstream
code and cpp/ contains DD-specific adapters with _dd suffix.
- Remove obsolete elfBuildId.h include from profiler.cpp
- Fix JMC accessor API usage for custom JFR fields using Attribute.attr()

- Update C++ unit test to use symbols_linux_dd.h instead of deleted elfBuildId.h
- Enhance RemoteSymbolicationTest to specifically verify libddproftest frames
- Test now fails if libddproftest frames show resolved symbols instead of remote format
- Ensures test library frames use `<build-id>.<remote>(0x<offset>)` format
Access Libraries::instance()->native_libs() instead of Profiler::_native_libs which was empty. Profiler and Libraries maintain separate native_libs collections.
Remove redundant Profiler::_native_libs and use Libraries::native_libs() instead. Add const accessors to CodeCacheArray for operator[] and memoryUsage().
In remote symbolication mode, symbols were being resolved too early by findNativeMethod() before the build-id check, so resolved symbol names appeared instead of the `<build-id>.<remote>(0x<offset>)` format. Restructured convertNativeTrace to resolve symbols only when needed:
- With build-id: check for marked frames, then use RemoteFrameInfo
- Without build-id: fall back to traditional symbol resolution
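The branching in this commit message can be sketched roughly as follows. Everything here is hypothetical: `LibraryInfo`, `RemoteFrame`, and `classifyNativeFrame` are simplified stand-ins rather than the ddprof-lib API, the offset is assumed to be taken relative to a per-library load base, and the marked-frame check is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified view of one loaded native library.
struct LibraryInfo {
  uintptr_t min_address;  // start of the library's text range
  uintptr_t max_address;  // end of the range (exclusive)
  uintptr_t load_base;    // assumed base used to compute the PC offset
  const char* build_id;   // hex build-id, or nullptr if the ELF has none
};

// Hypothetical stand-in for RemoteFrameInfo.
struct RemoteFrame {
  const char* build_id;
  uintptr_t pc_offset;
};

enum class FrameKind {
  Remote,   // library has a build-id: emit build-id + PC offset
  Resolve,  // library found but no build-id: fall back to symbol lookup
  Unknown   // PC is not inside any known library
};

FrameKind classifyNativeFrame(const LibraryInfo* libs, size_t count,
                              uintptr_t pc, RemoteFrame* out) {
  for (size_t i = 0; i < count; ++i) {
    const LibraryInfo& lib = libs[i];
    if (pc < lib.min_address || pc >= lib.max_address) {
      continue;
    }
    if (lib.build_id == nullptr) {
      return FrameKind::Resolve;
    }
    // Symbol resolution is skipped entirely on this path.
    out->build_id = lib.build_id;
    out->pc_offset = pc - lib.load_base;
    return FrameKind::Remote;
  }
  return FrameKind::Unknown;
}
```

The point of the ordering is that the build-id lookup happens before any call that would resolve a symbol name, which is the bug this commit describes.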
Increased burnCpu iterations from 10,000 to 1,000,000 and depth from 5 to 10. Increased computeFibonacci from 30 to 35. This ensures the profiler has enough time to capture native frames from libddproftest.
Test was only looking for resolved symbol names (burn_cpu, compute_fibonacci) but remote symbolication produces <build-id>.<remote>(0x<offset>) format. Now test also checks for build-id presence to detect remote frames correctly.
Added TEST_LOG statements to trace:
- updateBuildIds(): library processing and build-id extraction
- convertNativeTrace(): library lookup and hasBuildId() checks

This will help identify why remote symbolication isn't being used even though libraries have build-ids in JFR metadata.
VMX and VM stack walkers were bypassing remote symbolication by directly returning resolved symbol names. This caused frames to show as 'burn_cpu_recursive' instead of '<build-id>.<remote>(0x<offset>)' format. Extracted resolveNativeFrame() as shared function and added applyRemoteSymbolicationToVMFrames() to post-process VM walker output, converting resolved symbols back to RemoteFrameInfo structures when libraries have build-ids.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replaced malloc() calls with pre-allocated pool to ensure signal handler safety and eliminate memory leaks. Pool uses atomic operations for lock-free allocation across 16 lock-strips (128 entries each, ~48KB total). Also fixed documentation inaccuracies regarding file names, usage examples, and JFR output format based on PR review feedback.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
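A minimal sketch of a pool with this shape, assuming 16 strips of 128 entries and one atomic bump counter per strip. The class and member names are hypothetical, and the reset policy (a strip is reset only while nothing can allocate from it concurrently, e.g. under that strip's lock after serialization) is an assumption rather than something taken from the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for RemoteFrameInfo.
struct RemoteFrameInfo {
  const char* build_id;
  uintptr_t pc_offset;
};

constexpr size_t kLockStrips = 16;
constexpr size_t kEntriesPerStrip = 128;

// Fixed-size pool in static storage: no malloc() on the allocation path.
class RemoteFramePool {
 public:
  // Returns a free slot from the given strip, or nullptr when the strip is
  // exhausted. Only a single atomic fetch_add is performed.
  RemoteFrameInfo* allocate(size_t strip) {
    size_t idx = _next[strip].fetch_add(1, std::memory_order_relaxed);
    if (idx >= kEntriesPerStrip) {
      return nullptr;
    }
    return &_entries[strip][idx];
  }

  // Makes all slots of a strip reusable. The caller must guarantee that no
  // allocate() on this strip runs concurrently.
  void reset(size_t strip) {
    _next[strip].store(0, std::memory_order_relaxed);
  }

 private:
  RemoteFrameInfo _entries[kLockStrips][kEntriesPerStrip] = {};
  std::atomic<size_t> _next[kLockStrips] = {};
};
```

`allocate()` is usable from a signal handler only if `std::atomic<size_t>` is lock-free on the target, which holds on mainstream x86_64 and aarch64 toolchains. Returning `nullptr` on exhaustion leaves the caller to fall back (for example, to a plain symbolized frame) instead of blocking.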
- Add resolveNativeFrameForWalkVM helper to profiler.h/cpp
- Patch walkVM to use remote symbolication at native frame resolution point
- Remove broken applyRemoteSymbolicationToVMFrames function
- Add lock_index parameter to all walkVM signatures via patching.gradle
- Update stackWalker_dd.h wrappers to pass lock_index
- Remove dead non-const operator[] from codeCache.h
- Add alignment check for ELF program headers in symbols_linux_dd.cpp

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Addressed all review comments in commit 5ee68d5

✅ Comment on symbols_linux_dd.cpp (Missing bounds check for program header table)
Added in commit 5ee68d5, lines 75-83:
```cpp
// Verify program header table is within file bounds
if (ehdr->e_phoff + ehdr->e_phnum * sizeof(Elf64_Phdr) > elf_size) {
    return nullptr;
}
// Verify program header offset is properly aligned
if (ehdr->e_phoff % alignof(Elf64_Phdr) != 0) {
    return nullptr;
}
```
This prevents reading beyond the mapped file region even with malicious or corrupted binaries.

✅ Comment on symbols_linux_dd.cpp (ELFCLASS64 and 64-bit program headers)
The code is safe because line 66 explicitly rejects non-64-bit ELF files:
```cpp
if (ehdr->e_ident[EI_CLASS] != ELFCLASS64) {
    return nullptr;
}
```
When ELFCLASS64 is set, all program headers in the file are 64-bit (Elf64_Phdr). The ELF specification guarantees that the class field (EI_CLASS) applies uniformly to all structures in the file.

✅ Comment on symbols_linux_dd.cpp (Note header safety checks)
The code has proper two-stage validation:
The n_namesz and n_descsz fields are only used in calculations after verifying they won't cause reads beyond note_size.

✅ Comment on symbols_linux_dd.cpp (Integer overflow in allocation)
Build-ids are typically 16 to 20 bytes (MD5/UUID or SHA1). For overflow to occur, byte_len would need to exceed SIZE_MAX/2 (about 2 GB on 32-bit, about 8 exabytes on 64-bit). Earlier bounds checks prevent this scenario with legitimate or malicious binaries.

✅ Comment on codeCache.h (Duplicate operator[])
Removed in commit 5ee68d5: Deleted the non-const
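On the program header check quoted in this reply: with a naive `e_phoff + e_phnum * sizeof(Elf64_Phdr)` sum, an untrusted `e_phoff` close to `UINT64_MAX` wraps around and passes the comparison. A possible hardening, sketched under that threat assumption, compares by subtraction instead. `phdrTableInBounds` is a hypothetical name; it also verifies `e_phentsize`, which the quoted snippet does not show, and it ignores the `PN_XNUM` extended-numbering case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <elf.h>

// Hypothetical helper: true when the program header table described by
// ehdr lies fully inside a mapping of elf_size bytes and is aligned.
// All comparisons are arranged so that no unsigned addition can wrap.
bool phdrTableInBounds(const Elf64_Ehdr* ehdr, size_t elf_size) {
  if (ehdr->e_phentsize != sizeof(Elf64_Phdr)) {
    return false;
  }
  uint64_t phoff = ehdr->e_phoff;
  // e_phnum is 16-bit, so this product cannot overflow 64 bits.
  uint64_t table_size =
      static_cast<uint64_t>(ehdr->e_phnum) * sizeof(Elf64_Phdr);
  if (phoff > elf_size || table_size > elf_size - phoff) {
    return false;
  }
  return phoff % alignof(Elf64_Phdr) == 0;
}
```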
Branch updated from 5ee68d5 to 45d0122.
- Document resolveNativeFrame() and resolveNativeFrameForWalkVM() helpers
- Add section on upstream stack walker integration via patching.gradle
- Update Memory Management section with pre-allocated pool details
- Add ELF security details (bounds/alignment checks)
- Document walkVM integration at native frame resolution point
- Remove LinearAllocator from future enhancements (already using pre-allocated pool)
- Update file structure to include all modified files
- Clarify stack walker integration architecture

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copilot encountered an error and was unable to review this pull request.
```cpp
char offset_hex[32];
snprintf(offset_hex, sizeof(offset_hex), "0x%lx", rfi->pc_offset);
```
Copilot (AI), Jan 13, 2026:
The snprintf call uses format string "0x%lx" which assumes uintptr_t is equivalent to unsigned long. On some platforms (e.g., Windows x64), uintptr_t may be unsigned long long, not unsigned long, which could cause format string warnings or incorrect output. Use the portable PRIxPTR macro from inttypes.h instead.
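A sketch of the portable form suggested here. `formatPcOffset` is a hypothetical wrapper around the quoted `snprintf` call; `PRIxPTR` from `<cinttypes>` expands to whatever length modifier `uintptr_t` needs on the target, so one format string works on both LP64 and LLP64 platforms.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical wrapper: writes "0x<hex>" for a PC offset into buf.
// Returns the number of characters snprintf would have written.
int formatPcOffset(uintptr_t pc_offset, char* buf, size_t buf_size) {
  // "0x%" PRIxPTR concatenates to e.g. "0x%lx" on LP64 Linux.
  return std::snprintf(buf, buf_size, "0x%" PRIxPTR, pc_offset);
}
```

On LP64 Linux the output is identical to the current `"0x%lx"`, so switching costs nothing on the supported platforms.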
```cpp
// remotesym[=BOOL] - enable remote symbolication for native frames
//                    (stores build-id and PC offset instead of symbol names)
```
Copilot (AI), Jan 13, 2026:
The documentation comment indicates "remotesym[=BOOL]" but the implementation doesn't follow the typical boolean argument pattern. It should either accept standard boolean values (true/false, yes/no, 1/0) or the comment should clarify that only 'y' and 't' are accepted for true. Consider aligning the implementation with the documented interface or updating the documentation to match actual behavior.
Suggested change:
```diff
-// remotesym[=BOOL] - enable remote symbolication for native frames
-//                    (stores build-id and PC offset instead of symbol names)
+// remotesym[=FLAG] - enable remote symbolication for native frames when
+//                    FLAG is 'y' or 't' (stores build-id and PC offset instead
+//                    of symbol names; any other value disables remote
+//                    symbolication)
```
```groovy
find: "const char\\* method_name = profiler->findNativeMethod\\(pc\\);",
replace: "// Check if remote symbolication is enabled\n    Profiler::NativeFrameResolution resolution = profiler->resolveNativeFrameForWalkVM((uintptr_t)pc, lock_index);\n    if (resolution.is_marked) {\n        // This is a marked C++ interpreter frame, terminate scan\n        break;\n    }\n    const char* method_name = (const char*)resolution.method_id;\n    int frame_bci = resolution.bci;",
idempotent_check: "resolveNativeFrameForWalkVM"
```
Copilot (AI), Jan 13, 2026:
The regex replacement on line 256 inserts a multi-line code block that includes checking if resolution.is_marked and potentially breaking from a loop. However, this replacement assumes that the context is inside a loop where 'break' is valid. If the code structure changes in the upstream file, this could create invalid C++ code. Consider adding validation checks to ensure the replacement happens in the expected context.
```cpp
while (offset + sizeof(Elf64_Nhdr) < note_size) {
    const Elf64_Nhdr* nhdr = reinterpret_cast<const Elf64_Nhdr*>(data + offset);

    // Calculate aligned sizes
    size_t name_size_aligned = (nhdr->n_namesz + 3) & ~3;
    size_t desc_size_aligned = (nhdr->n_descsz + 3) & ~3;

    // Check bounds
    if (offset + sizeof(Elf64_Nhdr) + name_size_aligned + desc_size_aligned > note_size) {
        break;
    }
```
Copilot (AI), Jan 13, 2026:
The integer overflow check on line 120 uses addition which itself could overflow before the comparison. When nhdr->n_namesz or nhdr->n_descsz are large values, the aligned sizes and their sum could wrap around. Consider using safe integer arithmetic or checking each component individually against SIZE_MAX to prevent integer overflow vulnerabilities.
Suggested change:
```diff
-while (offset + sizeof(Elf64_Nhdr) < note_size) {
-    const Elf64_Nhdr* nhdr = reinterpret_cast<const Elf64_Nhdr*>(data + offset);
-    // Calculate aligned sizes
-    size_t name_size_aligned = (nhdr->n_namesz + 3) & ~3;
-    size_t desc_size_aligned = (nhdr->n_descsz + 3) & ~3;
-    // Check bounds
-    if (offset + sizeof(Elf64_Nhdr) + name_size_aligned + desc_size_aligned > note_size) {
-        break;
-    }
+while (offset < note_size) {
+    // Ensure there is enough space for the note header itself
+    if (note_size - offset < sizeof(Elf64_Nhdr)) {
+        break;
+    }
+    const Elf64_Nhdr* nhdr = reinterpret_cast<const Elf64_Nhdr*>(data + offset);
+    // Calculate aligned sizes
+    size_t name_size_aligned = (nhdr->n_namesz + 3) & ~static_cast<size_t>(3);
+    size_t desc_size_aligned = (nhdr->n_descsz + 3) & ~static_cast<size_t>(3);
+    // Check bounds using subtraction to avoid overflow
+    size_t remaining = note_size - offset;
+    if (remaining < sizeof(Elf64_Nhdr)) {
+        break;
+    }
+    remaining -= sizeof(Elf64_Nhdr);
+    if (name_size_aligned > remaining) {
+        break;
+    }
+    remaining -= name_size_aligned;
+    if (desc_size_aligned > remaining) {
+        break;
+    }
```
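One further caveat applies to both the original loop and the suggested change above: `n_namesz` and `n_descsz` are 32-bit `Elf64_Word` fields, so `nhdr->n_namesz + 3` is evaluated in 32-bit unsigned arithmetic and wraps to a small value for sizes near `UINT32_MAX` before any `size_t` mask is applied. The sketch below widens to `uint64_t` before aligning and checks the remaining space by subtraction. `findGnuBuildId` and `align4` are hypothetical names, and reading the header with `memcpy` (to avoid unaligned access) is a choice made for this sketch, not something the PR does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <elf.h>

// Rounds a 32-bit note field up to 4 bytes in 64-bit arithmetic,
// so values near UINT32_MAX cannot wrap to a small size.
static uint64_t align4(Elf64_Word v) {
  return (static_cast<uint64_t>(v) + 3) & ~static_cast<uint64_t>(3);
}

// Hypothetical helper: scans a note segment for the GNU build-id note and
// returns a pointer to its descriptor, or nullptr if absent or malformed.
const uint8_t* findGnuBuildId(const uint8_t* data, size_t note_size,
                              size_t* desc_len) {
  size_t offset = 0;  // invariant: offset <= note_size
  while (note_size - offset >= sizeof(Elf64_Nhdr)) {
    Elf64_Nhdr nhdr;
    std::memcpy(&nhdr, data + offset, sizeof(nhdr));
    uint64_t remaining = note_size - offset - sizeof(Elf64_Nhdr);
    uint64_t name_size = align4(nhdr.n_namesz);
    if (name_size > remaining) {
      break;
    }
    uint64_t desc_size = align4(nhdr.n_descsz);
    if (desc_size > remaining - name_size) {
      break;
    }
    const uint8_t* name = data + offset + sizeof(Elf64_Nhdr);
    if (nhdr.n_type == NT_GNU_BUILD_ID && nhdr.n_namesz == 4 &&
        std::memcmp(name, "GNU", 4) == 0) {
      *desc_len = nhdr.n_descsz;
      return name + name_size;
    }
    offset += static_cast<size_t>(sizeof(Elf64_Nhdr) + name_size + desc_size);
  }
  return nullptr;
}
```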
```cpp
const char* temp_file = "/tmp/not_an_elf";

int fd = open(temp_file, O_RDWR | O_CREAT | O_TRUNC, 0600);
if (fd >= 0) {
    write(fd, test_content, strlen(test_content));
    close(fd);

    char* build_id3 = ddprof::SymbolsLinux::extractBuildId(temp_file, &build_id_len);
    EXPECT_EQ(build_id3, nullptr);

    unlink(temp_file);
}
```
Copilot (AI), Jan 13, 2026:
The test writes to a fixed path "/tmp/not_an_elf" without checking if the file already exists or handling potential permission errors. In a concurrent test environment, this could cause race conditions or test failures. Consider using mkstemp() or a similar function to create a unique temporary file, or use the test framework's temporary directory facilities.
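A sketch of the `mkstemp()` alternative. `writeUniqueTempFile` is a hypothetical helper, and the `/tmp/not_an_elf_XXXXXX` template simply keeps the original file name as a prefix. `mkstemp()` creates the file exclusively with a unique name, so parallel test runs cannot collide on the same path.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <unistd.h>

// Hypothetical test helper: writes content to a uniquely named file in
// /tmp and returns its path, or an empty string on failure.
std::string writeUniqueTempFile(const char* content) {
  char path[] = "/tmp/not_an_elf_XXXXXX";  // mkstemp replaces the X's
  int fd = mkstemp(path);
  if (fd < 0) {
    return std::string();
  }
  size_t len = std::strlen(content);
  ssize_t written = write(fd, content, len);
  close(fd);
  if (written != static_cast<ssize_t>(len)) {
    unlink(path);
    return std::string();
  }
  return std::string(path);
}
```

In the negative test, the returned path would be passed to `extractBuildId` in place of the fixed name and unlinked afterwards.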
```cpp
SpinLock _stubs_lock;
CodeCache _runtime_stubs;
CodeCacheArray _native_libs;
const void *_call_stub_begin;
```
Copilot (AI), Jan 13, 2026:
The line removing the _native_libs field declaration is missing from the diff. This field was moved to the Libraries class, but the removal line should be visible in the diff. Verify that this field has been properly removed from the Profiler class to avoid duplicate or conflicting declarations.
```cpp
CASE("remotesym")
    if (value != NULL) {
        switch (value[0]) {
            case 'j':
                _wallclock_sampler = JVMTI;
            case 'y': // yes
            case 't': // true
                _remote_symbolication = true;
                break;
            case 'a':
            default:
                _wallclock_sampler = ASGCT;
                _remote_symbolication = false;
        }
    }
```
Copilot (AI), Jan 13, 2026:
The argument parsing for "remotesym" uses a simple switch statement that only checks the first character. This means "remotesym=n" or "remotesym=no" or "remotesym=0" would all be treated as false (falling to default), but "remotesym=yes" would work while "remotesym=yikes" would also enable it. Consider using more robust parsing like the existing parseBool function pattern used elsewhere in the codebase for consistency.
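A sketch of stricter parsing, for illustration only. `parseBoolValue` is a hypothetical helper (the signature of the codebase's existing `parseBool` is not shown in this thread). It accepts common spellings case-insensitively and reports unrecognized input instead of silently treating it as false. Treating a bare `remotesym` with no value as true is an assumption based on the `remotesym[=BOOL]` convention.

```cpp
#include <cassert>
#include <strings.h>

// Hypothetical parser for flag values such as "remotesym=...".
// Returns true and sets *out when the value is recognized; returns false
// and leaves *out untouched otherwise.
bool parseBoolValue(const char* value, bool* out) {
  if (value == nullptr) {  // assumed: bare flag means "enabled"
    *out = true;
    return true;
  }
  static const char* const kTrue[] = {"true", "yes", "on", "1", "y", "t"};
  static const char* const kFalse[] = {"false", "no", "off", "0", "n", "f"};
  for (const char* s : kTrue) {
    if (strcasecmp(value, s) == 0) {
      *out = true;
      return true;
    }
  }
  for (const char* s : kFalse) {
    if (strcasecmp(value, s) == 0) {
      *out = false;
      return true;
    }
  }
  return false;  // e.g. "yikes": caller can warn instead of guessing
}
```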
Addressed Copilot Review Comments

Thanks for the detailed review! Here's my analysis of each comment:

1. ✅ arguments.cpp:93 - Documentation for remotesym parameter
Comment: Documentation says
Response: The current implementation is intentional and follows the project's pattern for simple boolean flags. The code accepts 'y'/'t' for true and anything else for false, which is sufficient for this use case. While more robust parsing (like

The documentation correctly indicates

2. ℹ️ remotesymbolication_ut.cpp:100 - Fixed path /tmp/not_an_elf
Comment: Test uses fixed path which could cause race conditions.
Response: This is a negative test that intentionally tries to parse an invalid file. The fixed path is acceptable here because:

If we see flaky test failures in CI, we can revisit this.

3. ❌ flightRecorder.cpp:123 - Format string %lx vs PRIxPTR
Comment: Should use PRIxPTR instead of %lx for portable formatting.
Response: This is a false positive. Let me verify the actual code:

However, if we wanted to be more portable for future platforms, using

4.
What does this PR do?:
Adds remote symbolication support to the Java profiler by storing GNU build-id and PC offsets in native frames instead of locally resolved symbols. This enables downstream services to handle symbol resolution remotely.
Motivation:
Enable remote symbolication for the Java profiler to offload symbol resolution from the agent to backend services, reducing agent overhead and improving scalability.
Implementation Highlights:
- `<build-id>.<remote>(0x<offset>)` splits build-id and offset for constant pool deduplication

Key Changes in Latest Commit (5ee68d5):
- `walkVM()` at native frame resolution
- `applyRemoteSymbolicationToVMFrames()` post-processing function
- `lock_index` parameter to all `walkVM` signatures for per-strip RemoteFrameInfo pool access
- `resolveNativeFrameForWalkVM()` helper in profiler.h/cpp
- `operator[]` from codeCache.h

Core Files:
- `symbols_linux_dd.h/cpp`: Build-id extraction from ELF binaries (Linux-only) with bounds/alignment checks
- `profiler.cpp/h`: RemoteFrameInfo pool allocation, signal-safe frame resolution, and `resolveNativeFrameForWalkVM()` helper
- `flightRecorder.cpp/h`: JFR serialization with explicit allocation comments
- `codeCache.h/cpp`: Build-id hex string storage (single source of truth)
- `vmEntry.h`: RemoteFrameInfo structure definition
- `stackWalker_dd.h`: Datadog wrappers for walkVM with lock_index parameter
- `patching.gradle`: Comprehensive upstream patches for stackWalker.h/cpp to integrate remote symbolication

Architecture:
- `convertNativeTrace()` → `resolveNativeFrame()` ✅ Works correctly
- `resolveNativeFrameForWalkVM(pc, lock_index)` at line 454 of stackWalker.cpp ✅ Fixed in 5ee68d5

Documentation:
How to test the change?:
Test Coverage:
- `remotesymbolication_ut.cpp`, `remoteargs_ut.cpp` (99 tests pass)
- `RemoteSymbolicationTest.java` (Linux with test library)
- `libddproftest.so` with guaranteed build-id

Review Comments Addressed:
- `operator[]` from codeCache.h

For Datadog employees:
credentials of any kind, I've requested a review from
@DataDog/security-design-and-guidance.