feat: Optional Jackson serialization for 5x performance improvement on large batches #48
Implements high-performance JSON serialization using Jackson's streaming API while maintaining complete backward compatibility with the existing org.json public API.

Key improvements:
- Automatic detection and use of Jackson when available on classpath
- Up to 5x performance improvement for large batch imports (50+ messages)
- Zero breaking changes - all public APIs remain unchanged
- Graceful fallback to org.json when Jackson is not available

Performance benchmarks show:
- Small batches (1-10 messages): 1.2-1.5x faster
- Medium batches (50-100 messages): ~5x faster
- Large batches (500-2000 messages): ~5x faster consistently

Implementation details:
- Created internal JsonSerializer interface for pluggable implementations
- JacksonSerializer uses streaming API to avoid conversion overhead
- SerializerFactory automatically selects best available implementation
- Modified dataString() method to use the new serialization layer

This is particularly beneficial for the /import endpoint which handles up to 2000 messages per batch (40x larger than regular /track endpoint).

Users simply add jackson-databind dependency to enable this optimization. No code changes required - the library automatically detects and uses it.
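For readers who want a feel for the detection step, here is a minimal sketch of classpath-based selection. It is not the PR's actual code: `BatchSerializer`, `PlainSerializer`, and `SerializerChooser` are hypothetical names, and messages are assumed to be pre-rendered JSON strings so the example needs no third-party library. The only real-world identifier is `com.fasterxml.jackson.core.JsonFactory`, a class that ships in jackson-core.

```java
import java.util.List;

// Hypothetical stand-in for the internal serializer contract (not the PR's interface).
interface BatchSerializer {
    String serialize(List<String> jsonMessages);
}

// Dependency-free fallback: joins already-rendered JSON objects into a JSON array.
final class PlainSerializer implements BatchSerializer {
    @Override
    public String serialize(List<String> jsonMessages) {
        return "[" + String.join(",", jsonMessages) + "]";
    }
}

final class SerializerChooser {
    // A class from jackson-core; if it loads, Jackson is assumed to be available.
    static final String JACKSON_MARKER = "com.fasterxml.jackson.core.JsonFactory";

    // Returns true when the named class can be located, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SerializerChooser.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Picks the fast implementation only when its marker class is present.
    static BatchSerializer choose(String markerClass, BatchSerializer fast, BatchSerializer fallback) {
        return isOnClasspath(markerClass) ? fast : fallback;
    }
}
```

A caller would pass `SerializerChooser.JACKSON_MARKER` as the marker and keep the chosen instance around, so the lookup runs once rather than per batch.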
Pull request overview
This PR introduces optional high-performance JSON serialization using Jackson's streaming API to address performance bottlenecks when serializing large event batches (50-2000 messages) for the /import endpoint. The implementation maintains 100% backward compatibility by keeping org.json for the public API while transparently using Jackson for internal serialization when available.
Key changes:
- New internal serializer abstraction with factory pattern for runtime detection of Jackson
- Jackson-based streaming serialization provides 5x speedup for large batches without API changes
- Automatic fallback to org.json when Jackson is unavailable or if serialization fails
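The fallback-on-failure behaviour in the last bullet could look roughly like the sketch below. This is an illustration under assumptions, not the code in MixpanelAPI.java: `FallbackSerializer` is a hypothetical class, both serializers are modelled as plain `Function` objects, and `java.util.logging` stands in for whatever logging the library uses.

```java
import java.util.List;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper: tries the primary serializer and falls back if it throws.
final class FallbackSerializer {
    private static final Logger LOG = Logger.getLogger(FallbackSerializer.class.getName());

    private final Function<List<String>, String> primary;
    private final Function<List<String>, String> fallback;

    FallbackSerializer(Function<List<String>, String> primary,
                       Function<List<String>, String> fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    String serialize(List<String> messages) {
        try {
            return primary.apply(messages);
        } catch (RuntimeException e) {
            // Log the failure so a silent downgrade to the slower path stays visible.
            LOG.log(Level.WARNING, "Primary serializer failed; using fallback", e);
            return fallback.apply(messages);
        }
    }
}
```

Logging at the point of fallback matters here because the output is still correct, so without a log line nobody would notice the speedup had quietly disappeared.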
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| JsonSerializer.java | New interface defining serialization contract for array serialization |
| SerializerFactory.java | Factory with runtime Jackson detection and singleton instance management |
| OrgJsonSerializer.java | Default implementation using existing org.json library |
| JacksonSerializer.java | High-performance implementation using Jackson streaming API |
| MixpanelAPI.java | Updated dataString() to use new serializer abstraction with fallback |
| pom.xml | Added Jackson dependency with provided scope for optional usage |
| README.md | Documentation on enabling high-performance serialization |
| JsonSerializerTest.java | Comprehensive unit tests for both serializer implementations |
| SerializerBenchmark.java | Performance benchmark tool comparing implementations |
| .gitignore | Added .vscode/ directory exclusion |
@jaredmixpanel I've opened a new pull request, #49, to work on those changes. Once the pull request is ready, I'll request review from you.
@jaredmixpanel I've opened a new pull request, #50, to work on those changes. Once the pull request is ready, I'll request review from you.
* Initial plan
* Add logging for Jackson serialization fallback

Co-authored-by: jaredmixpanel <10504508+jaredmixpanel@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jaredmixpanel <10504508+jaredmixpanel@users.noreply.github.com>

* Initial plan
* Use StandardCharsets.UTF_8 instead of "UTF-8" string literal

Co-authored-by: jaredmixpanel <10504508+jaredmixpanel@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jaredmixpanel <10504508+jaredmixpanel@users.noreply.github.com>
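For context on the charset change, the sketch below contrasts the two styles using only JDK calls. `CharsetExample` is a hypothetical class written for illustration; it is not taken from the PR.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the two ways to request UTF-8 bytes.
final class CharsetExample {
    // Name-based lookup: compiles only if the checked exception is handled,
    // even though every JVM is required to support UTF-8.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 must be supported", e);
        }
    }

    // Constant-based lookup: no checked exception and no chance of a misspelled name.
    static byte[] encodeByConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Both methods produce identical bytes; the constant simply removes the dead `catch` block and moves typos in the charset name from runtime to compile time.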
…hmark.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Updates Jackson dependency from 2.15.3 to 2.20.0 for the latest performance improvements and security patches.
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
This PR introduces optional high-performance JSON serialization using Jackson while maintaining complete backward compatibility. This addresses the performance challenges you identified when importing large batches of events through the `/import` endpoint.
Key Achievement: We get Jackson's performance benefits WITHOUT breaking the public API or requiring any code changes from users.
Problem Statement
As you discovered in your benchmarking:
- The `/import` endpoint handles up to 2000 messages per batch (40x larger than regular endpoints)
Solution Approach
Instead of trying to replace org.json or create a bridge between incompatible types, this implementation:
Architecture
Performance Results (Jackson 2.20.0)
Benchmarked with 1000 iterations per test:
The performance improvement is most significant for batches of 50+ messages, which is exactly where the `/import` endpoint operates.
Implementation Details
Key Components
- JsonSerializer Interface (`internal/JsonSerializer.java`)
- OrgJsonSerializer (`internal/OrgJsonSerializer.java`)
- JacksonSerializer (`internal/JacksonSerializer.java`)
- SerializerFactory (`internal/SerializerFactory.java`)
Changes to Existing Code
- pom.xml: Jackson dependency added with `provided` scope
How to Enable
Users simply add the jackson-databind dependency to their project.
The library automatically detects and uses it - no code changes required!
Testing
Why This Approach Works
Backward Compatibility
This change is 100% backward compatible:
Recommendations
- `/import` endpoint usage: Add Jackson for 5-6x performance boost
Next Steps
After this PR is merged, users importing large batches of historical data will see dramatic performance improvements simply by adding the Jackson dependency. This solves the performance bottleneck while maintaining the stability and compatibility of the existing API.