
feat: Optional Jackson serialization for 5x performance improvement on large batches #48

Merged: jaredmixpanel merged 6 commits into master from feature/jackson-performance-optimization on Nov 24, 2025


jaredmixpanel commented Nov 21, 2025

Summary

This PR introduces optional high-performance JSON serialization using Jackson while maintaining complete backward compatibility. This addresses the performance challenges you identified when importing large batches of events through the /import endpoint.

Key Achievement: We get Jackson's performance benefits WITHOUT breaking the public API or requiring any code changes from users.

Problem Statement

As you discovered in your benchmarking:

  • org.json becomes a bottleneck when serializing large batches (50+ messages)
  • The /import endpoint handles up to 2000 messages per batch (40x larger than regular endpoints)
  • Direct replacement with Jackson would cause breaking changes due to incompatible types (JSONObject vs ObjectNode)
  • Bridge pattern attempts failed due to conversion overhead negating performance gains

Solution Approach

Instead of trying to replace org.json or create a bridge between incompatible types, this implementation:

  1. Keeps org.json for the public API - All public methods continue to accept/return JSONObject
  2. Uses Jackson only for internal serialization - Jackson's streaming API serializes JSONObjects directly
  3. Automatic runtime detection - The library detects if Jackson is available and uses it transparently
  4. Zero conversion overhead - We stream JSONObject data directly through Jackson's JsonGenerator
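Point 3 can be illustrated with a minimal, dependency-free sketch. It assumes detection works by probing for a Jackson class with Class.forName; the PR does not say which class SerializerFactory probes, so the marker name com.fasterxml.jackson.core.JsonFactory and the helper OptionalDependency are hypothetical.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not the PR's SerializerFactory): detects an optional
 * dependency at runtime and picks an implementation accordingly.
 */
final class OptionalDependency {
    private OptionalDependency() {}

    /** Returns true if the named class can be loaded; the class is not initialized. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Absent or unloadable: treat the optional dependency as unavailable.
            return false;
        }
    }

    /** Builds the preferred implementation only when its marker class is present. */
    static <T> T select(String markerClass, Supplier<T> preferred, Supplier<T> fallback) {
        return isPresent(markerClass) ? preferred.get() : fallback.get();
    }
}
```

In this sketch the preferred supplier is the only code path that would construct a Jackson-backed serializer, so nothing touches Jackson types when the probe fails, e.g. select("com.fasterxml.jackson.core.JsonFactory", ...). Passing false for the initialize argument avoids running the marker class's static initializers just to check availability.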

Architecture

Public API (unchanged)
    ↓
JSONObject messages (org.json)
    ↓
JsonSerializer Interface (new)
    ├── OrgJsonSerializer (default)
    └── JacksonSerializer (when available)
    ↓
Serialized JSON string/bytes
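The middle layer of the diagram can be sketched as a small interface. This is an assumption-laden sketch, not the PR's JsonSerializer: the method names are hypothetical, and Map<String, Object> stands in for org.json's JSONObject so the snippet compiles without dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical shape of the internal serializer contract. The real interface
 * works with org.json messages; Map<String, Object> is a dependency-free stand-in.
 */
interface JsonSerializer {
    /** Serializes a batch of messages into a JSON array string. */
    String serializeArray(List<Map<String, Object>> messages);

    /**
     * UTF-8 bytes of the same JSON. StandardCharsets.UTF_8 avoids the checked
     * UnsupportedEncodingException that getBytes("UTF-8") declares.
     */
    default byte[] serializeArrayToBytes(List<Map<String, Object>> messages) {
        return serializeArray(messages).getBytes(StandardCharsets.UTF_8);
    }

    /** Human-readable name, e.g. for logging which implementation is active. */
    String getImplementationName();
}
```

A streaming implementation could override serializeArrayToBytes to write UTF-8 bytes directly instead of going through an intermediate String.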

Performance Results (Jackson 2.20.0)

Benchmarked with 1000 iterations per test:

| Batch size | org.json | Jackson | Speedup |
| --- | --- | --- | --- |
| 1 message | 12 ms | 11 ms | 1.09x |
| 10 messages | 96 ms | 52 ms | 1.85x |
| 50 messages | 337 ms | 51 ms | 6.61x |
| 100 messages | 600 ms | 94 ms | 6.38x |
| 500 messages | 2,798 ms | 624 ms | 4.48x |
| 1000 messages | 5,690 ms | 1,116 ms | 5.10x |
| 2000 messages | 11,688 ms | 2,224 ms | 5.26x |

The performance improvement is most significant for batches of 50+ messages, which is exactly where the /import endpoint operates.
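Numbers like these come from timing repeated serialization of the same batch. The following is a hypothetical helper, not the PR's SerializerBenchmark.java; the warm-up phase and System.nanoTime timing are assumptions about the methodology.

```java
import java.util.function.Supplier;

/**
 * Hypothetical micro-benchmark helper; not the PR's SerializerBenchmark.java.
 * Warm-up and System.nanoTime timing are assumptions about methodology.
 */
final class SerializationBenchmark {
    // Volatile sink so the JIT cannot discard the work being measured.
    private static volatile Object sink;

    private SerializationBenchmark() {}

    /** Runs warm-up iterations, then returns total wall-clock ms for the timed iterations. */
    static long totalMillis(Supplier<?> task, int warmupIterations, int timedIterations) {
        for (int i = 0; i < warmupIterations; i++) {
            sink = task.get(); // give the JIT a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedIterations; i++) {
            sink = task.get();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Speedup as reported in the table: baseline time divided by candidate time. */
    static double speedup(long baselineMillis, long candidateMillis) {
        if (candidateMillis <= 0) {
            throw new IllegalArgumentException("candidate time must be positive");
        }
        return (double) baselineMillis / candidateMillis;
    }
}
```

With the 2000-message row, speedup(11688, 2224) is about 5.26. For more rigorous numbers, JMH (the OpenJDK microbenchmark harness) handles warm-up, forking and dead-code elimination better than a hand-rolled loop.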

Implementation Details

Key Components

  1. JsonSerializer Interface (internal/JsonSerializer.java)

    • Defines contract for JSON serialization
    • Methods for both String and byte[] output
  2. OrgJsonSerializer (internal/OrgJsonSerializer.java)

    • Default implementation using org.json
    • Maintains existing behavior
  3. JacksonSerializer (internal/JacksonSerializer.java)

    • High-performance implementation using Jackson's streaming API
    • Recursively streams JSONObject/JSONArray without conversion
    • Handles all JSON types (objects, arrays, primitives, nulls)
  4. SerializerFactory (internal/SerializerFactory.java)

    • Runtime detection of Jackson availability
    • Singleton pattern for efficiency
    • Logging to inform users which implementation is active
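The key idea behind JacksonSerializer, walking JSONObject/JSONArray recursively and emitting tokens directly rather than building an ObjectNode first, can be shown without Jackson. In this hypothetical sketch, Map and List stand in for JSONObject and JSONArray, and a java.io.Writer stands in for Jackson's JsonGenerator; it is not the PR's code.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, dependency-free illustration of recursive streaming serialization:
 * values are written token by token as the tree is walked, with no intermediate tree.
 * Map/List stand in for JSONObject/JSONArray; Writer stands in for JsonGenerator.
 */
final class StreamingJsonWriter {
    private final Writer out;

    StreamingJsonWriter(Writer out) {
        this.out = out;
    }

    /** Writes any supported value, recursing into nested objects and arrays. */
    void writeValue(Object value) throws IOException {
        if (value == null) {
            out.write("null");
        } else if (value instanceof Map<?, ?>) {
            out.write('{');
            boolean first = true;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!first) out.write(',');
                first = false;
                writeString(String.valueOf(entry.getKey()));
                out.write(':');
                writeValue(entry.getValue());
            }
            out.write('}');
        } else if (value instanceof List<?>) {
            out.write('[');
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) out.write(',');
                first = false;
                writeValue(item);
            }
            out.write(']');
        } else if (value instanceof Double && !Double.isFinite((Double) value)
                || value instanceof Float && !Float.isFinite((Float) value)) {
            throw new IllegalArgumentException("JSON cannot represent " + value);
        } else if (value instanceof Number || value instanceof Boolean) {
            out.write(value.toString());
        } else {
            writeString(value.toString()); // everything else is written as a JSON string
        }
    }

    /** Writes a quoted JSON string, escaping quotes, backslashes and control characters. */
    private void writeString(String s) throws IOException {
        out.write('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': out.write("\\\""); break;
                case '\\': out.write("\\\\"); break;
                case '\n': out.write("\\n"); break;
                case '\r': out.write("\\r"); break;
                case '\t': out.write("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.write(String.format("\\u%04x", (int) c));
                    } else {
                        out.write(c);
                    }
            }
        }
        out.write('"');
    }

    /** Convenience wrapper that serializes a value to a String. */
    static String toJson(Object value) {
        StringWriter buffer = new StringWriter();
        try {
            new StreamingJsonWriter(buffer).writeValue(value);
        } catch (IOException e) {
            throw new IllegalStateException(e); // StringWriter itself never throws IOException
        }
        return buffer.toString();
    }
}
```

An org.json-based version would also need to map the JSONObject.NULL sentinel to a JSON null. With Jackson, the raw writes would become JsonGenerator calls such as writeStartObject, writeFieldName, writeString and writeEndObject.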

Changes to Existing Code

  • MixpanelAPI.dataString(): Now uses JsonSerializer instead of direct JSONArray.toString()
  • pom.xml: Added Jackson as an optional dependency with provided scope
  • README.md: Added documentation for enabling high-performance mode
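The dataString() change, together with the follow-up commit that logs Jackson serialization fallbacks, amounts to a try-the-fast-path-then-fall-back pattern. The sketch below is hypothetical: it is not MixpanelAPI.dataString(), the serializers are modeled as plain Function values, and java.util.logging is an assumed logging backend.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of "try the fast serializer, fall back on failure".
 * Not MixpanelAPI.dataString(); the Function types and the use of
 * java.util.logging are illustrative assumptions.
 */
final class BatchEncoder {
    private static final Logger LOG = Logger.getLogger(BatchEncoder.class.getName());

    private final Function<List<Map<String, Object>>, String> primary;
    private final Function<List<Map<String, Object>>, String> fallback;

    BatchEncoder(Function<List<Map<String, Object>>, String> primary,
                 Function<List<Map<String, Object>>, String> fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    /** Serializes a batch, using the fallback if the primary serializer throws. */
    String dataString(List<Map<String, Object>> messages) {
        try {
            return primary.apply(messages);
        } catch (RuntimeException e) {
            // Record why the fast path failed instead of swallowing the error silently.
            LOG.log(Level.WARNING, "Primary JSON serialization failed; falling back", e);
            return fallback.apply(messages);
        }
    }
}
```

Logging the exception keeps a failing fast path visible to operators while the batch is still serialized and sent.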

How to Enable

Users simply add Jackson to their dependencies:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.20.0</version>
</dependency>

The library automatically detects and uses it - no code changes required!

Testing

  • ✅ All 81 existing tests pass
  • ✅ Added comprehensive tests for both serializer implementations
  • ✅ Created performance benchmark suite
  • ✅ Tested with and without Jackson on classpath
  • ✅ Verified correct fallback behavior

Why This Approach Works

  1. No conversion overhead: Unlike the bridge pattern attempt, we don't convert between JSONObject and ObjectNode
  2. Streaming efficiency: Jackson's JsonGenerator writes directly to output without intermediate objects
  3. Selective optimization: We optimize only the serialization bottleneck, not the entire message pipeline
  4. Runtime flexibility: Users choose whether to include Jackson based on their performance needs

Backward Compatibility

This change is 100% backward compatible:

  • All public APIs unchanged
  • No breaking changes to method signatures
  • Existing code continues to work without modifications
  • Users without Jackson see no change in behavior

Recommendations

  • For regular event tracking (< 50 events): org.json is sufficient
  • For /import endpoint usage: Add Jackson for roughly 4.5-6.6x faster serialization on batches of 50+ messages
  • For high-volume applications: Jackson is highly recommended

Next Steps

After this PR is merged, users importing large batches of historical data can speed up batch serialization substantially just by adding the Jackson dependency. This removes the serialization bottleneck while preserving the stability and compatibility of the existing API.

Implements high-performance JSON serialization using Jackson's streaming API
while maintaining complete backward compatibility with the existing org.json
public API.

Key improvements:
- Automatic detection and use of Jackson when available on classpath
- Up to 5x performance improvement for large batch imports (50+ messages)
- Zero breaking changes - all public APIs remain unchanged
- Graceful fallback to org.json when Jackson is not available

Performance benchmarks show:
- Small batches (1-10 messages): 1.2-1.5x faster
- Medium batches (50-100 messages): ~5x faster
- Large batches (500-2000 messages): ~5x faster consistently

Implementation details:
- Created internal JsonSerializer interface for pluggable implementations
- JacksonSerializer uses streaming API to avoid conversion overhead
- SerializerFactory automatically selects best available implementation
- Modified dataString() method to use the new serialization layer

This is particularly beneficial for the /import endpoint which handles
up to 2000 messages per batch (40x larger than regular /track endpoint).

Users simply add jackson-databind dependency to enable this optimization.
No code changes required - the library automatically detects and uses it.

Copilot AI left a comment


Pull request overview

This PR introduces optional high-performance JSON serialization using Jackson's streaming API to address performance bottlenecks when serializing large event batches (50-2000 messages) for the /import endpoint. The implementation maintains 100% backward compatibility by keeping org.json for the public API while transparently using Jackson for internal serialization when available.

Key changes:

  • New internal serializer abstraction with factory pattern for runtime detection of Jackson
  • Jackson-based streaming serialization provides 5x speedup for large batches without API changes
  • Automatic fallback to org.json when Jackson is unavailable or if serialization fails

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.

Summary per file:

| File | Description |
| --- | --- |
| JsonSerializer.java | New interface defining serialization contract for array serialization |
| SerializerFactory.java | Factory with runtime Jackson detection and singleton instance management |
| OrgJsonSerializer.java | Default implementation using existing org.json library |
| JacksonSerializer.java | High-performance implementation using Jackson streaming API |
| MixpanelAPI.java | Updated dataString() to use new serializer abstraction with fallback |
| pom.xml | Added Jackson dependency with provided scope for optional usage |
| README.md | Documentation on enabling high-performance serialization |
| JsonSerializerTest.java | Comprehensive unit tests for both serializer implementations |
| SerializerBenchmark.java | Performance benchmark tool comparing implementations |
| .gitignore | Added .vscode/ directory exclusion |


Copilot AI commented Nov 21, 2025

@jaredmixpanel I've opened a new pull request, #49, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI commented Nov 21, 2025

@jaredmixpanel I've opened a new pull request, #50, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 4 commits November 21, 2025 13:41
* Initial plan

* Add logging for Jackson serialization fallback

Co-authored-by: jaredmixpanel <10504508+jaredmixpanel@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jaredmixpanel <10504508+jaredmixpanel@users.noreply.github.com>
* Initial plan

* Use StandardCharsets.UTF_8 instead of "UTF-8" string literal

Co-authored-by: jaredmixpanel <10504508+jaredmixpanel@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jaredmixpanel <10504508+jaredmixpanel@users.noreply.github.com>
…hmark.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Updates Jackson dependency from 2.15.3 to 2.20.0 for the latest
performance improvements and security patches.
Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.



Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jaredmixpanel added the enhancement and dependencies labels Nov 21, 2025
jaredmixpanel self-assigned this Nov 21, 2025
jaredmixpanel merged commit ea35181 into master Nov 24, 2025
8 checks passed