Pure Ruby implementation of the msgpack binary serialization format (https://msgpack.org)

lutaml/messagepack

MessagePack

Purpose

MessagePack is a pure Ruby implementation of the MessagePack binary serialization format.

MessagePack is an efficient binary serialization format. Like JSON, it lets you exchange data among multiple languages, but it is faster and more compact.

This implementation provides:

  • Pure Ruby implementation (no C extension required)

  • Full compatibility with the MessagePack specification

  • Support for custom extension types

  • Thread-safe factory pattern for packer/unpacker reuse

  • Streaming unpacker for incremental parsing

  • Comprehensive timestamp support with nanosecond precision

Features

Architecture

MessagePack serialization architecture
  ┌───────────────────────────────────────────────────────────┐
  │                       User Application                    │
  └──────────────────────────┬────────────────────────────────┘
                             │
                ┌────────────┴────────────┐
                │                         │
        ┌───────────────┐        ┌──────────────────┐
        │ MessagePack   │        │  Factory Pattern │
        │ .pack/unpack  │        │  (thread-safe)   │
        └───────┬───────┘        └────────┬─────────┘
                │                         │
        ┌───────┴──────┐          ┌───────┴──────────┐
        │              │          │                  │
    ┌────────┐  ┌──────────┐   ┌──────────┐  ┌─────────────┐
    │ Packer │  │ Unpacker │   │ Packer   │  │ Unpacker    │
    │        │  │          │   │ Pool     │  │ Pool        │
    └──┬─────┘  └─────┬────┘   └────┬─────┘  └────┬────────┘
       │              │             │             │
       └─────┬────────┘             └──────┬──────┘
             │                             │
       ┌─────┴─────────────────────────────┴────────┐
       │    ┌──────────────────────────────────┐    │
       │    │   BinaryBuffer (chunked)         │    │
       │    │                                  │    │
       │    │  ┌────┬────┬────┬────┬─ ─ ─┐     │    │
       │    │  │  1 │  2 │  3 │  4 │  N  │     │    │
       │    │  └────┴────┴────┴────┴─ ─ ─┘     │    │
       │    └──────────────────────────────────┘    │
       │    ┌──────────────────────────────────┐    │
       │    │ Extension Registry               │    │
       │    │                                  │    │
       │    │  Timestamp (-1)                  │    │
       │    │  Symbol (0)                      │    │
       │    │  Custom Types (1-127, -2 to -128)│    │
       │    └──────────────────────────────────┘    │
       └────────────────────────────────────────────┘
MessagePack format encoding
┌──────────────────────────────────────────────────────────────────┐
│                     MessagePack Binary Format                    │
└──────────────────────────────────────────────────────────────────┘

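  Positive Fixnum ──────── 0x00-0x7F (1 byte format, value embedded)
  Negative Fixnum ──────── 0xE0-0xFF (1 byte format, value embedded)
  Nil ──────────────────── 0xC0 (1 byte format, no data)
  False ────────────────── 0xC2 (1 byte format, no data)
  True ─────────────────── 0xC3 (1 byte format, no data)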

  UInt 8 ───────────────── 0xCC (1 byte format + 1 byte data)
  UInt 16 ──────────────── 0xCD (1 byte format + 2 byte data)
  UInt 32 ──────────────── 0xCE (1 byte format + 4 byte data)
  UInt 64 ──────────────── 0xCF (1 byte format + 8 byte data)

  Int 8 ────────────────── 0xD0 (1 byte format + 1 byte data)
  Int 16 ───────────────── 0xD1 (1 byte format + 2 byte data)
  Int 32 ───────────────── 0xD2 (1 byte format + 4 byte data)
  Int 64 ───────────────── 0xD3 (1 byte format + 8 byte data)

  Float 32 ─────────────── 0xCA (1 byte format + 4 byte data)
  Float 64 ─────────────── 0xCB (1 byte format + 8 byte data)

  FixStr ───────────────── 0xA0-0xBF (1 byte format + 0-31 bytes)
  Str 8 ────────────────── 0xD9 (1 byte format + 1 byte length)
  Str 16 ───────────────── 0xDA (1 byte format + 2 byte length)
  Str 32 ───────────────── 0xDB (1 byte format + 4 byte length)

  Bin 8 ────────────────── 0xC4 (1 byte format + 1 byte length)
  Bin 16 ───────────────── 0xC5 (1 byte format + 2 byte length)
  Bin 32 ───────────────── 0xC6 (1 byte format + 4 byte length)

  FixArray ─────────────── 0x90-0x9F (1 byte format + 0-15 elements)
  Array 16 ─────────────── 0xDC (1 byte format + 2 byte count)
  Array 32 ─────────────── 0xDD (1 byte format + 4 byte count)

  FixMap ───────────────── 0x80-0x8F (1 byte format + 0-15 entries)
  Map 16 ───────────────── 0xDE (1 byte format + 2 byte count)
  Map 32 ───────────────── 0xDF (1 byte format + 4 byte count)

  FixExt 1 ─────────────── 0xD4 (1 byte format + 1 byte type + 1 byte)
  FixExt 2 ─────────────── 0xD5 (1 byte format + 1 byte type + 2 bytes)
  FixExt 4 ─────────────── 0xD6 (1 byte format + 1 byte type + 4 bytes)
  FixExt 8 ─────────────── 0xD7 (1 byte format + 1 byte type + 8 bytes)
  FixExt 16 ────────────── 0xD8 (1 byte format + 1 byte type + 16 bytes)
  Ext 8 ────────────────── 0xC7 (1 byte format + 1 byte len + 1 byte type)
  Ext 16 ───────────────── 0xC8 (1 byte format + 2 byte len + 1 byte type)
  Ext 32 ───────────────── 0xC9 (1 byte format + 4 byte len + 1 byte type)
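
The integer rows of the table above can be read as a "smallest format that fits" rule. The following is a minimal sketch of that rule in plain Ruby, using only Array#pack. The encode_int helper is hypothetical and is not part of this gem's API; the gem's own packer may be organized differently.

```ruby
# Illustrative sketch: choose the smallest MessagePack integer format.
# encode_int is a hypothetical helper, not this gem's implementation.
def encode_int(n)
  case n
  when 0..0x7F                        then [n].pack("C")           # positive fixnum
  when -0x20..-1                      then [n].pack("c")           # negative fixnum (0xE0-0xFF)
  when 0..0xFF                        then [0xCC, n].pack("CC")    # uint 8
  when 0..0xFFFF                      then [0xCD, n].pack("Cn")    # uint 16
  when 0..0xFFFF_FFFF                 then [0xCE, n].pack("CN")    # uint 32
  when 0..0xFFFF_FFFF_FFFF_FFFF       then [0xCF, n].pack("CQ>")   # uint 64
  when -0x80..-1                      then [0xD0, n].pack("Cc")    # int 8
  when -0x8000..-1                    then [0xD1, n].pack("Cs>")   # int 16
  when -0x8000_0000..-1               then [0xD2, n].pack("Cl>")   # int 32
  when -0x8000_0000_0000_0000..-1     then [0xD3, n].pack("Cq>")   # int 64
  else raise RangeError, "#{n} does not fit in a 64-bit MessagePack integer"
  end
end

p encode_int(42)    # => "*"  (0x2A, a single byte)
p encode_int(300)   # => "\xCD\x01,"  (0xCD 0x01 0x2C)
p encode_int(-1)    # => "\xFF"
```

All multi-byte values are written big-endian, which is why the pack directives use n, N, and the > modifier.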

Installation

Add this line to your application’s Gemfile:

gem 'messagepack'

And then execute:

bundle install

Or install it yourself as:

gem install messagepack

Core serialization

The core MessagePack API provides simple pack and unpack methods for serializing and deserializing Ruby objects.

Packing objects

Use Messagepack.pack to serialize Ruby objects to binary format:

Messagepack.pack({hello: "world"}) # => "\x81\xA5hello\xA5world"

Where,

  • Messagepack.pack accepts any Ruby object as its argument

  • The return value is a binary string containing the serialized data

  • Supported types include: nil, boolean, integer, float, string, array, hash, and any registered extension types
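
To see where the bytes in the output above come from, the sketch below builds the same string by hand in plain Ruby. The fixstr and fixmap helpers are hypothetical names used only for illustration; they cover just the two formats this small hash needs and assume string values.

```ruby
# Hand-built encoding of {hello: "world"}, for illustration only.
# fixstr and fixmap are hypothetical helpers, not part of this gem.
def fixstr(str)
  bytes = str.b
  raise ArgumentError, "fixstr holds at most 31 bytes" if bytes.bytesize > 31
  [0xA0 | bytes.bytesize].pack("C") + bytes   # 0xA0-0xBF: length in the low 5 bits
end

def fixmap(pairs)
  raise ArgumentError, "fixmap holds at most 15 entries" if pairs.size > 15
  header = [0x80 | pairs.size].pack("C")      # 0x80-0x8F: entry count in the low 4 bits
  pairs.reduce(header) { |out, (key, value)| out + fixstr(key.to_s) + fixstr(value) }
end

encoded = fixmap({ hello: "world" })
p encoded # => "\x81\xA5hello\xA5world"
```

The leading 0x81 is a FixMap with one entry, and each 0xA5 is a FixStr header for a 5-byte string.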

Unpacking data

Use Messagepack.unpack to deserialize binary data back to Ruby objects:

data = Messagepack.pack({hello: "world"})
Messagepack.unpack(data) # => {"hello"=>"world"}

Where,

  • Messagepack.unpack accepts a binary string or IO object

  • The return value is the deserialized Ruby object; as the example shows, symbol keys come back as strings

  • Extra bytes after the deserialized object will raise a Messagepack::MalformedFormatError

Example 1. Using pack and unpack
# Serialize a complex object
data = {
  name: "Alice",
  age: 30,
  skills: ["Ruby", "Python"],
  metadata: {
    active: true,
    score: 95.5
  }
}

binary = Messagepack.pack(data)
# => "\x84\xA4name\xA5Alice\xA3age\x1E\xA6skills\
#     \x92\xA4Ruby\xA6Python\xA8metadata\x82\xA6active\
#     \xC3\xA5score\xCB@W\xE0\x00\x00\x00\x00\x00"
# (output wrapped for readability; 95.5 is stored as an 8-byte float64 after 0xCB)

# Deserialize back to a Ruby object
result = Messagepack.unpack(binary)
# => {"name"=>"Alice", "age"=>30, "skills"=>["Ruby", "Python"],
#     "metadata"=>{"active"=>true, "score"=>95.5}}

Factory pattern

The Messagepack::Factory class provides thread-safe management of packer and unpacker instances with support for custom type registrations.

Creating a factory

factory = Messagepack::Factory.new

Where,

  • Factory.new creates a new factory instance

  • Each factory maintains its own type registry

  • Factories can be frozen for thread-safe use

Registering custom types

factory.register_type(0x01, MyClass,
  packer: :to_msgpack_ext,
  unpacker: :from_msgpack_ext
)

Where,

  • 0x01 is the type identifier (must be -128 to 127)

  • MyClass is the Ruby class to register

  • packer specifies how to serialize instances (symbol, method, or proc)

  • unpacker specifies how to deserialize data (symbol, method, or proc)

Using factory pool for thread safety

pool = factory.pool(5) # Create pool with 5 packers/unpackers
data = pool.pack(my_object)  # Thread-safe packing
obj = pool.unpack(binary)    # Thread-safe unpacking

Where,

  • factory.pool(size) creates a thread-safe pool

  • size is the number of packer/unpacker instances in the pool

  • The pool automatically manages instance reuse

  • Each thread gets its own instance from the pool
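
The general idea behind a fixed-size pool can be sketched with Ruby's built-in Queue. This is not how factory.pool is implemented in this gem (the source does not say); SimplePool and its with method are hypothetical names, and plain strings stand in for packer instances.

```ruby
# Minimal sketch of a fixed-size, thread-safe object pool.
# SimplePool is a hypothetical class, not this gem's pool implementation.
class SimplePool
  def initialize(size, &builder)
    @queue = Queue.new
    size.times { @queue << builder.call }
  end

  # Check out one instance, yield it, and always hand it back.
  def with
    instance = @queue.pop          # blocks while every instance is in use
    yield instance
  ensure
    @queue << instance if instance
  end
end

pool = SimplePool.new(2) { String.new }   # two reusable scratch buffers
threads = 8.times.map do |i|
  Thread.new { pool.with { |buf| buf.replace("job-#{i}").dup } }
end
results = threads.map(&:value)
p results.length # => 8
```

Because Queue#pop blocks, more threads than instances simply wait their turn, so no instance is ever shared by two threads at once.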

Example 2. Thread-safe factory usage
# Create a factory with custom types
factory = Messagepack::Factory.new
factory.register_type(0x01, MyCustomClass,
  packer: ->(obj) { obj.serialize },
  unpacker: ->(data) { MyCustomClass.deserialize(data) }
)

# Create a thread-safe pool
pool = factory.pool(10)

# Use from multiple threads safely
threads = 10.times.map do |i|
  Thread.new do
    object = MyCustomClass.new("data-#{i}")
    binary = pool.pack(object)
    result = pool.unpack(binary)
    result.value
  end
end

puts threads.map(&:value).inspect

Extension types

MessagePack supports custom extension types for serializing objects that don’t have a native MessagePack representation.

Extension type format

factory.register_type(type_id, class,
  packer: packer_specification,
  unpacker: unpacker_specification
)

Where,

  • type_id is an integer from -128 to 127

  • class is the Ruby class to serialize

  • packer_specification can be:

      • A symbol (method name to call on the object)

      • A proc (called with the object)

      • A method object

  • unpacker_specification can be:

      • A symbol (class method to call)

      • A proc (called with the payload data)

      • A method object
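
One way to picture the three accepted forms is as a normalization step that turns any of them into something that responds to call. The sketch below is an assumption about how such a step could look; packer_callable, unpacker_callable, and the Point struct are hypothetical and are not taken from this gem.

```ruby
# Sketch: normalize symbol / proc / method specifications into callables.
# These helpers are hypothetical, not this gem's internals.
def packer_callable(spec)
  case spec
  when Symbol       then ->(obj) { obj.public_send(spec) }  # instance method name
  when Proc, Method then spec
  else raise ArgumentError, "unsupported packer specification: #{spec.inspect}"
  end
end

def unpacker_callable(klass, spec)
  case spec
  when Symbol       then klass.method(spec)                 # class method name
  when Proc, Method then spec
  else raise ArgumentError, "unsupported unpacker specification: #{spec.inspect}"
  end
end

Point = Struct.new(:x, :y) do
  def to_ext
    [x, y].pack("l>l>")            # two big-endian signed 32-bit integers
  end

  def self.from_ext(data)
    new(*data.unpack("l>l>"))
  end
end

pack_fn   = packer_callable(:to_ext)
unpack_fn = unpacker_callable(Point, :from_ext)
p unpack_fn.call(pack_fn.call(Point.new(3, -4))) == Point.new(3, -4) # => true
```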

Recursive extension types

factory.register_type(0x02, MyContainer,
  packer: ->(obj, packer) { packer.write(obj.to_h) },
  unpacker: ->(unpacker) { MyContainer.from_hash(unpacker.read) },
  recursive: true
)

Where,

  • recursive: true enables nested serialization

  • The packer lambda receives the packer instance for recursive calls

  • The unpacker lambda receives the unpacker instance for recursive reads

Example 3. Custom extension type for Money objects
class Money
  attr_reader :amount, :currency

  def initialize(amount, currency)
    @amount = amount
    @currency = currency
  end

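  def to_msgpack_ext
    # "Q>" is an unsigned 64-bit big-endian integer, so the payload
    # does not depend on the byte order of the machine that wrote it
    [amount, currency].pack("Q>A*")
  end

  def self.from_msgpack_ext(data)
    amount, currency = data.unpack("Q>A*")
    new(amount, currency)
  end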
end

factory = Messagepack::Factory.new
factory.register_type(0x10, Money,
  packer: :to_msgpack_ext,
  unpacker: :from_msgpack_ext
)

money = Money.new(1000, "USD")
binary = factory.pack(money)
result = factory.unpack(binary)
# => #<Money:0x... @amount=1000, @currency="USD">

Timestamp extension

The timestamp extension (type -1) provides nanosecond precision time handling for Time objects.

Timestamp formats

Example 4. MessagePack automatically selects the appropriate format
Timestamp32  - 4 bytes (32-bit unsigned seconds)
              Used when: nanoseconds == 0 and
                         seconds fit in 32 bits

Timestamp64  - 8 bytes (30-bit nanoseconds + 34-bit seconds)
              Used when: seconds are non-negative and
                         fit in 34 bits

Timestamp96  - 12 bytes (32-bit nanoseconds + 64-bit signed seconds)
              Used when: seconds are negative (before 1970)
                         or do not fit in 34 bits
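
The selection rule can be sketched in plain Ruby following the payload layouts in the MessagePack specification. The timestamp_payload helper is hypothetical; it produces only the payload bytes (without the ext header and type byte) and is not taken from this gem's Messagepack::Time::Packer.

```ruby
# Sketch of timestamp payload selection per the MessagePack specification.
# timestamp_payload is a hypothetical helper, not this gem's packer.
def timestamp_payload(time)
  sec  = time.to_i
  nsec = time.nsec
  if sec >= 0 && sec < (1 << 34)
    if nsec.zero? && sec < (1 << 32)
      [sec].pack("N")                    # timestamp 32: 32-bit unsigned seconds
    else
      [(nsec << 34) | sec].pack("Q>")    # timestamp 64: 30-bit nsec + 34-bit seconds
    end
  else
    [nsec, sec].pack("Nq>")              # timestamp 96: 32-bit nsec + 64-bit signed seconds
  end
end

p timestamp_payload(Time.utc(2020, 1, 1)).bytesize       # => 4
p timestamp_payload(Time.at(1, 500, :nsec)).bytesize     # => 8
p timestamp_payload(Time.utc(1960, 1, 1)).bytesize       # => 12
```

Adding the ext framing gives the on-wire sizes: 6 bytes (fixext 4), 10 bytes (fixext 8), and 15 bytes (ext 8 with a 12-byte payload).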

Using timestamp with Time

factory.register_type(-1, Time,
  packer: Messagepack::Time::Packer,
  unpacker: Messagepack::Time::Unpacker
)

Where,

  • -1 is the reserved type ID for timestamps

  • Messagepack::Time::Packer handles serialization with nanosecond precision

  • Messagepack::Time::Unpacker handles deserialization

Example 5. Timestamp serialization examples
factory = Messagepack::Factory.new
factory.register_type(-1, Time,
  packer: Messagepack::Time::Packer,
  unpacker: Messagepack::Time::Unpacker
)

# Current time with nanosecond precision
now = Time.now
binary = factory.pack(now)
restored = factory.unpack(binary)
puts restored.tv_nsec # Nanoseconds preserved

# Historical date
time = Time.utc(2020, 1, 1, 12, 30, 45)
binary = factory.pack(time)
puts binary.size # => 6 (fixext4 format)

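# Future date with nanoseconds (the 7th argument to Time.utc is microseconds)
future = Time.utc(2100, 6, 15, 0, 0, 0, Rational(123_456_789, 1000))
binary = factory.pack(future)
puts binary.size # => 10 (fixext8 with timestamp64; the seconds still fit in 34 bits)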

Symbol extension

The symbol extension (type 0) provides efficient serialization of Ruby symbols.

Registering symbol type

factory.register_type(0, Symbol)

Where,

  • 0 is the type ID for symbols

  • The extension uses to_s for packing and to_sym for unpacking

Symbol serialization

factory.register_type(0, Symbol)
binary = factory.pack(:hello_symbol)
result = factory.unpack(binary) # => :hello_symbol

Where,

  • Symbols are serialized as their string representation

  • Deserialization converts the string back to a symbol

  • Unlike serializing symbols as plain strings, this preserves the Symbol type across a round trip

Example 6. Symbol serialization in data structures
factory = Messagepack::Factory.new
factory.register_type(0, Symbol)

data = {
  status: :active,
  priority: :high,
  tags: [:important, :urgent]
}

binary = factory.pack(data)
result = factory.unpack(binary)
# => {:status=>:active, :priority=>:high, :tags=>[:important, :urgent]}

Streaming unpacking

The streaming unpacker allows incremental parsing of MessagePack data as it becomes available.

Feeding data incrementally

unpacker = Messagepack::Unpacker.new
unpacker.feed("\x81")       # Feed the map header (1 entry)
unpacker.feed("\xA3foo")    # Feed the key
unpacker.feed("\xC0")       # Feed the value (nil)
obj = unpacker.read         # => {"foo"=>nil}

Where,

  • Unpacker.new creates a new unpacker instance

  • feed(data) appends data to the buffer

  • read returns one complete object or nil if more data is needed
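
The feed/read contract (buffer the bytes, return nil until a whole object is present) can be illustrated with a deliberately tiny reader that understands only FixStr. TinyStringReader is a hypothetical class written for this illustration; it is not the gem's Unpacker and handles none of the other formats.

```ruby
# Sketch of incremental parsing: returns nil until a full object is buffered.
# TinyStringReader is hypothetical and decodes only FixStr (0xA0-0xBF).
class TinyStringReader
  def initialize
    @buffer = String.new(encoding: Encoding::BINARY)
  end

  def feed(data)
    @buffer << data.b
    self
  end

  def read
    return nil if @buffer.empty?

    header = @buffer.getbyte(0)
    unless (0xA0..0xBF).cover?(header)
      raise ArgumentError, "this sketch handles only fixstr"
    end

    length = header & 0x1F
    return nil if @buffer.bytesize < 1 + length   # incomplete: wait for more data

    object = @buffer.slice!(0, 1 + length)        # consume header + payload
    object.byteslice(1, length).force_encoding(Encoding::UTF_8)
  end
end

reader = TinyStringReader.new
reader.feed("\xA5he")
p reader.read # => nil
reader.feed("llo")
p reader.read # => "hello"
```

Nothing is consumed from the buffer until the whole object has arrived, so a partial feed never loses data.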

Streaming from IO

unpacker = Messagepack::Unpacker.new(io)
obj = unpacker.read         # Reads from IO as needed

Where,

  • Unpacker.new(io) creates an unpacker attached to an IO

  • The unpacker automatically reads from the IO when needed

  • Use full_unpack to read a single object and reset

Example 7. Streaming unpacking from network
require 'socket'

# Simulate receiving data in chunks
unpacker = Messagepack::Unpacker.new

chunks = ["\x81\xA3", "foo", "\xA5", "world"]

chunks.each do |chunk|
  unpacker.feed(chunk)
  obj = unpacker.read
  if obj
    puts "Received: #{obj.inspect}"
  else
    puts "Waiting for more data..."
  end
end

# Output:
# Waiting for more data...
# Waiting for more data...
# Waiting for more data...
# Received: {"foo"=>"world"}

Buffer management

The BinaryBuffer class provides efficient chunked storage for binary data.

Buffer operations

buffer = Messagepack::BinaryBuffer.new
buffer << "data"
buffer.read(4)             # => "data"
buffer.to_s                # => ""

Where,

  • BinaryBuffer.new creates a new buffer

  • << appends data to the buffer

  • read(n) reads and consumes n bytes

  • to_s returns remaining data without consuming

Skip operations

buffer = Messagepack::BinaryBuffer.new
buffer << "\x81\xA3foo\xA5world"
buffer.skip              # Skip one object (format byte)
buffer.skip_nil          # Skip nil value if present

Where,

  • skip skips a complete MessagePack object

  • skip_nil efficiently skips nil values

Buffer with IO

File.open("data.msgpack", "rb") do |io|
  buffer = Messagepack::BinaryBuffer.new(io)
  unpacker = Messagepack::Unpacker.new(buffer)
  obj = unpacker.read
end

Where,

  • The buffer reads from the IO when needed

  • Data is automatically managed in chunks

  • Suitable for large files that don’t fit in memory

Example 8. Reading large MessagePack files efficiently
# Process a large file without loading everything into memory
File.open("large.msgpack", "rb") do |io|
  buffer = Messagepack::BinaryBuffer.new(io)
  unpacker = Messagepack::Unpacker.new(buffer)

  # read returns nil once no complete object remains; the block form
  # of File.open closes the file afterwards
  while (obj = unpacker.read)
    # Process each object one at a time
    process(obj)
  end
end

Performance optimizations

Several optimizations keep this pure Ruby implementation efficient for typical use cases.

Native type fast-path

Types the packer handles natively (nil, boolean, integer, float, string, symbol, array, hash; symbols are written as strings by default) bypass the extension registry lookup:

  • Native types are identified without O(n) registry search

  • Native types with custom extension registrations still use the registry

  • Custom types pay the registry lookup cost as expected

This means that even with many registered extension types, packing native objects remains fast.

Buffer chunk coalescing

The buffer uses automatic chunk coalescing to reduce memory allocations and improve throughput:

  • Small writes (< 512 bytes) are merged into larger chunks

  • Reduces the number of string objects in memory

  • Improves to_s performance by reducing chunk count

  • Optimized for common patterns like many small integer writes
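
A minimal sketch of the coalescing idea, assuming a policy of "append to the tail chunk while both the write and the tail are under the threshold". CoalescingBuffer is a hypothetical class; the gem's BinaryBuffer may use a different merge policy.

```ruby
# Sketch of chunk coalescing. CoalescingBuffer is hypothetical and only
# illustrates the idea; it is not this gem's BinaryBuffer.
class CoalescingBuffer
  THRESHOLD = 512 # bytes, the figure quoted above

  attr_reader :chunks

  def initialize
    @chunks = []
  end

  def <<(data)
    data = data.b                       # binary copy, safe to mutate later
    tail = @chunks.last
    if tail && data.bytesize < THRESHOLD && tail.bytesize < THRESHOLD
      tail << data                      # merge a small write into the tail chunk
    else
      @chunks << data                   # start a new chunk
    end
    self
  end

  def to_s
    @chunks.join
  end
end

buffer = CoalescingBuffer.new
1000.times { buffer << "x" }
p buffer.chunks.size     # => 2
p buffer.to_s.bytesize   # => 1000
```

Without merging, the same 1000 writes would leave 1000 one-byte strings for to_s to join.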

Buffer read optimization

The buffer’s to_s method has a fast-path for when reading from the beginning (position 0), which is the common case for packers:

  • Uses join for efficient string concatenation

  • Skips offset calculations when position is at 0

  • Significantly faster for single-pass operations

Example 9. Performance comparison
# Native type performance (unaffected by registry size)
Messagepack.pack(nil)           # ~673k ops/sec
Messagepack.pack(42)           # ~607k ops/sec
Messagepack.pack("hello")     # ~498k ops/sec
Messagepack.pack([1,2,3])     # ~230k ops/sec
Messagepack.pack({a: 1, b: 2}) # ~159k ops/sec

# Buffer operations (1000 small writes)
# With coalescing:    ~4.7k ops/sec (+28%)
# Without coalescing: ~3.7k ops/sec

Implementation details

Pure Ruby architecture

This implementation is written entirely in Ruby without any C extensions, providing:

  • Portability - Runs on any Ruby implementation (MRI, JRuby, TruffleRuby, etc.)

  • Safety - No memory corruption risks from native code

  • Debuggability - Easy to debug with standard Ruby tools

  • Maintainability - Pure Ruby code is easier to understand and modify

Binary buffer design

The BinaryBuffer class uses a chunked storage design:

BinaryBuffer
├── Chunks (array)
│   ├── Chunk 1 (data)
│   ├── Chunk 2 (data)
│   └── Chunk N (data)
├── Position (read cursor)
└── Length (total bytes)

Where,

  • Chunks - Array of binary strings holding data

  • Position - Current read position across all chunks

  • Length - Total bytes across all chunks

  • Coalescing threshold - Small writes (< 512 bytes) are merged

This design provides:

  • Efficient appends - New data creates chunks, small writes merge

  • Zero-copy reads - Data is read without copying when possible

  • Memory efficiency - Unused chunks can be garbage collected

  • IO integration - Can read from IO objects on demand

Extension registry

The extension registry provides type mapping for custom serialization:

ExtensionRegistry::Packer
├── @registry - Hash of class => [type_id, proc, flags]
└── @cache - Hash of class => [type_id, proc, flags] (ancestor cache)

ExtensionRegistry::Unpacker
└── @array - Array[256] of [class, proc, flags] indexed by type_id

Where,

  • Packer registry uses O(1) hash lookup for direct class matches

  • Ancestor search is O(n) but cached after first lookup

  • Unpacker registry uses O(1) array lookup by type ID

  • Flags control recursive packing and oversized integer handling
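
The two lookup structures can be sketched as follows. The class names, the omission of flags, and the type_id + 128 index offset are assumptions made for this illustration; the source only states that the unpacker side is a 256-slot array indexed by type ID.

```ruby
# Sketch of the two registries. Names and the +128 offset are assumptions;
# flags are omitted for brevity.
class PackerRegistrySketch
  def initialize
    @registry = {}   # class => [type_id, packer]
    @cache    = {}   # class => result of an ancestor search
  end

  def register(type_id, klass, packer)
    raise ArgumentError, "type_id must be in -128..127" unless (-128..127).cover?(type_id)
    @registry[klass] = [type_id, packer]
    @cache.clear     # earlier ancestor results may now be stale
  end

  def lookup(klass)
    return @registry[klass] if @registry.key?(klass)   # O(1) direct match
    return @cache[klass] if @cache.key?(klass)         # cached ancestor result

    ancestor = klass.ancestors.find { |a| @registry.key?(a) }
    @cache[klass] = ancestor && @registry[ancestor]    # misses are cached as nil
  end
end

class UnpackerRegistrySketch
  def initialize
    @array = Array.new(256)
  end

  def register(type_id, klass, unpacker)
    @array[type_id + 128] = [klass, unpacker]          # map -128..127 onto 0..255
  end

  def lookup(type_id)
    @array[type_id + 128]
  end
end

class Animal; end
class Dog < Animal; end

packers = PackerRegistrySketch.new
packers.register(1, Animal, :to_ext)
p packers.lookup(Dog) # => [1, :to_ext]
```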

Type dispatch

The packer uses a type dispatch system for efficient serialization:

Packer#write(value)
├── Fast-path check (native type?)
│   ├── Yes → Skip registry, use native serialization
│   └── No → Check registry
│       ├── Found in registry → Use extension packer
│       └── Not found → Check to_msgpack method
└── Case statement dispatch → Type-specific writer

This ensures:

  • Native types are serialized without overhead

  • Registered custom types use their packers

  • Unknown types can implement to_msgpack for compatibility
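
The decision order in the tree above can be sketched as a small routing function. The dispatch_route helper, the NATIVE_CLASSES constant, and the plain hash standing in for the registry are all hypothetical; the sketch returns a label for the chosen path instead of writing any bytes.

```ruby
# Sketch of the dispatch order. dispatch_route is hypothetical and only
# reports which path a value would take.
NATIVE_CLASSES = [NilClass, TrueClass, FalseClass, Integer, Float,
                  String, Symbol, Array, Hash].freeze

def dispatch_route(value, registry = {})
  klass = value.class
  if NATIVE_CLASSES.include?(klass) && !registry.key?(klass)
    :native                         # fast path: no registry lookup needed
  elsif registry.key?(klass)
    :extension                      # registered type, including overridden natives
  elsif value.respond_to?(:to_msgpack)
    :to_msgpack                     # compatibility hook for unknown types
  else
    raise TypeError, "cannot serialize #{klass}"
  end
end

p dispatch_route(42)                   # => :native
p dispatch_route(:ok, { Symbol => 0 }) # => :extension
```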

Copyright Ribose. All rights reserved.

Licensed under the MIT License.
