The messagepack gem is a pure Ruby implementation of the MessagePack binary serialization format.
MessagePack is an efficient binary serialization format that lets you exchange data among multiple languages, like JSON, but it is faster and smaller.
This implementation provides:
- Pure Ruby implementation (no C extension required)
- Full compatibility with the MessagePack specification
- Support for custom extension types
- Thread-safe factory pattern for packer/unpacker reuse
- Streaming unpacker for incremental parsing
- Comprehensive timestamp support with nanosecond precision
This document covers:
- Core serialization - Basic pack and unpack operations
- Performance optimizations - Efficient native type handling and buffer management
- Factory pattern - Thread-safe packer/unpacker management
- Extension types - Custom type registration system
- Timestamp extension - Nanosecond precision time handling
- Symbol extension - Round-trip symbol serialization
- Streaming unpacking - Incremental data parsing
- Buffer management - Chunked binary data storage
- Implementation details - Pure Ruby implementation architecture
User Application
├── Messagepack.pack / Messagepack.unpack
│   ├── Packer
│   └── Unpacker
└── Factory Pattern (thread-safe)
    ├── Packer Pool
    └── Unpacker Pool
Packers and unpackers (direct or pooled) are built on:
├── BinaryBuffer (chunked: 1, 2, 3, 4, ..., N)
└── Extension Registry
    ├── Timestamp (-1)
    ├── Symbol (0)
    └── Custom Types (1-127, -2 to -128)
┌──────────────────────────────────────────────────────────────────┐
│ MessagePack Binary Format │
└──────────────────────────────────────────────────────────────────┘
Positive FixInt ──────── 0x00-0x7F (1 byte, value embedded)
Negative FixInt ──────── 0xE0-0xFF (1 byte, value embedded)
Nil ──────────────────── 0xC0 (1 byte)
Boolean ──────────────── 0xC2 = false, 0xC3 = true (1 byte)
UInt 8 ───────────────── 0xCC (1 byte format + 1 byte data)
UInt 16 ──────────────── 0xCD (1 byte format + 2 byte data)
UInt 32 ──────────────── 0xCE (1 byte format + 4 byte data)
UInt 64 ──────────────── 0xCF (1 byte format + 8 byte data)
Int 8 ────────────────── 0xD0 (1 byte format + 1 byte data)
Int 16 ───────────────── 0xD1 (1 byte format + 2 byte data)
Int 32 ───────────────── 0xD2 (1 byte format + 4 byte data)
Int 64 ───────────────── 0xD3 (1 byte format + 8 byte data)
Float 32 ─────────────── 0xCA (1 byte format + 4 byte data)
Float 64 ─────────────── 0xCB (1 byte format + 8 byte data)
FixStr ───────────────── 0xA0-0xBF (1 byte format + 0-31 bytes)
Str 8 ────────────────── 0xD9 (1 byte format + 1 byte length + data)
Str 16 ───────────────── 0xDA (1 byte format + 2 byte length + data)
Str 32 ───────────────── 0xDB (1 byte format + 4 byte length + data)
Bin 8 ────────────────── 0xC4 (1 byte format + 1 byte length + data)
Bin 16 ───────────────── 0xC5 (1 byte format + 2 byte length + data)
Bin 32 ───────────────── 0xC6 (1 byte format + 4 byte length + data)
FixArray ─────────────── 0x90-0x9F (1 byte format + 0-15 elements)
Array 16 ─────────────── 0xDC (1 byte format + 2 byte count + elements)
Array 32 ─────────────── 0xDD (1 byte format + 4 byte count + elements)
FixMap ───────────────── 0x80-0x8F (1 byte format + 0-15 entries)
Map 16 ───────────────── 0xDE (1 byte format + 2 byte count + entries)
Map 32 ───────────────── 0xDF (1 byte format + 4 byte count + entries)
FixExt 1 ─────────────── 0xD4 (1 byte format + 1 byte type + 1 byte data)
FixExt 2 ─────────────── 0xD5 (1 byte format + 1 byte type + 2 bytes data)
FixExt 4 ─────────────── 0xD6 (1 byte format + 1 byte type + 4 bytes data)
FixExt 8 ─────────────── 0xD7 (1 byte format + 1 byte type + 8 bytes data)
FixExt 16 ────────────── 0xD8 (1 byte format + 1 byte type + 16 bytes data)
Ext 8 ────────────────── 0xC7 (1 byte format + 1 byte length + 1 byte type + data)
Ext 16 ───────────────── 0xC8 (1 byte format + 2 byte length + 1 byte type + data)
Ext 32 ───────────────── 0xC9 (1 byte format + 4 byte length + 1 byte type + data)
Add this line to your application’s Gemfile:
gem 'messagepack'
And then execute:
bundle install
Or install it yourself as:
gem install messagepack
The core MessagePack API provides simple pack and unpack methods for serializing and deserializing Ruby objects.
Use Messagepack.pack to serialize Ruby objects to binary format:
Messagepack.pack({hello: "world"}) # => "\x81\xA5hello\xA5world"
Where,
- Messagepack.pack accepts a single Ruby object as its argument
- The return value is a binary string containing the serialized data
- Supported types include nil, booleans, integers, floats, strings, symbols (packed as plain strings unless the symbol extension is registered), arrays, hashes, and any registered extension types
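The MessagePack specification asks serializers to use the smallest format that can hold a value. For integers, that comes down to a range check against the format table. The sketch below shows this selection. It is not the gem's packer: encode_int is a hypothetical standalone helper built on Ruby's Array#pack directives.

```ruby
# Hypothetical helper (not the gem's packer): encode an Integer using the
# smallest MessagePack integer format. Multi-byte values are big-endian.
def encode_int(n)
  if n >= 0
    if n <= 0x7F then [n].pack("C")                             # positive fixint
    elsif n <= 0xFF then [0xCC, n].pack("CC")                   # uint 8
    elsif n <= 0xFFFF then [0xCD, n].pack("Cn")                 # uint 16
    elsif n <= 0xFFFF_FFFF then [0xCE, n].pack("CN")            # uint 32
    elsif n <= 0xFFFF_FFFF_FFFF_FFFF then [0xCF, n].pack("CQ>") # uint 64
    else raise RangeError, "#{n} does not fit in 64 bits"
    end
  elsif n >= -32 then [n].pack("c")                             # negative fixint
  elsif n >= -0x80 then [0xD0, n].pack("Cc")                    # int 8
  elsif n >= -0x8000 then [0xD1, n].pack("Cs>")                 # int 16
  elsif n >= -0x8000_0000 then [0xD2, n].pack("Cl>")            # int 32
  elsif n >= -0x8000_0000_0000_0000 then [0xD3, n].pack("Cq>")  # int 64
  else raise RangeError, "#{n} does not fit in 64 bits"
  end
end

encode_int(30).bytes   # => [30]
encode_int(1000).bytes # => [205, 3, 232]
encode_int(-33).bytes  # => [208, 223]
```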
Use Messagepack.unpack to deserialize binary data back to Ruby objects:
data = Messagepack.pack({hello: "world"})
Messagepack.unpack(data) # => {"hello"=>"world"}
Where,
- Messagepack.unpack accepts a binary string or IO object
- The return value is the deserialized Ruby object (symbol keys come back as strings, as the example shows)
- Extra bytes after the deserialized object raise a Messagepack::MalformedFormatError
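The following decoder sketch makes the byte layout concrete. MiniDecoder and its error class are hypothetical, and the decoder handles only positive fixint, nil, booleans, fixstr, fixarray, and fixmap. It is enough to decode the {hello: "world"} bytes above and, like Messagepack.unpack, it rejects trailing bytes.

```ruby
# Hypothetical mini-decoder (not the gem's Unpacker). It supports only
# positive fixint, nil, false, true, fixstr, fixarray and fixmap.
class MiniDecoder
  class MalformedFormatError < StandardError; end

  def self.decode(binary)
    decoder = new(binary.b)
    obj = decoder.read_object
    # Like Messagepack.unpack, treat leftover bytes as an error.
    raise MalformedFormatError, "extra bytes after object" unless decoder.eof?
    obj
  end

  def initialize(bytes)
    @bytes = bytes
    @pos = 0
  end

  def eof?
    @pos >= @bytes.bytesize
  end

  def read_object
    byte = read_bytes(1).getbyte(0)
    case byte
    when 0x00..0x7F then byte                 # positive fixint
    when 0xC0 then nil
    when 0xC2 then false
    when 0xC3 then true
    when 0xA0..0xBF                           # fixstr: length in low 5 bits
      read_bytes(byte & 0x1F).force_encoding(Encoding::UTF_8)
    when 0x90..0x9F                           # fixarray: count in low 4 bits
      Array.new(byte & 0x0F) { read_object }
    when 0x80..0x8F                           # fixmap: entry count in low 4 bits
      (byte & 0x0F).times.each_with_object({}) do |_, hash|
        key = read_object
        hash[key] = read_object
      end
    else
      raise MalformedFormatError, format("unsupported format byte 0x%02X", byte)
    end
  end

  private

  def read_bytes(n)
    raise EOFError, "unexpected end of data" if @pos + n > @bytes.bytesize
    chunk = @bytes.byteslice(@pos, n)
    @pos += n
    chunk
  end
end

MiniDecoder.decode("\x81\xA5hello\xA5world") # => {"hello"=>"world"}
```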
# Serialize a complex object
data = {
  name: "Alice",
  age: 30,
  skills: ["Ruby", "Python"],
  metadata: {
    active: true,
    score: 95.5
  }
}
binary = Messagepack.pack(data)
# => "\x84\xA4name\xA5Alice\xA3age\x1E\xA6skills\
#     \x92\xA4Ruby\xA6Python\xA8metadata\x82\xA6active\
#     \xC3\xA5score\xCB@W\xE0\x00\x00\x00\x00\x00"
# Deserialize back to a Ruby object
result = Messagepack.unpack(binary)
# => {"name"=>"Alice", "age"=>30, "skills"=>["Ruby", "Python"],
#     "metadata"=>{"active"=>true, "score"=>95.5}}
The Messagepack::Factory class provides thread-safe management of packer and unpacker instances with support for custom type registrations.
factory = Messagepack::Factory.new
Where,
- Factory.new creates a new factory instance
- Each factory maintains its own type registry
- Factories can be frozen for thread-safe use
factory.register_type(0x01, MyClass,
  packer: :to_msgpack_ext,
  unpacker: :from_msgpack_ext
)
Where,
- 0x01 is the type identifier (must be -128 to 127)
- MyClass is the Ruby class to register
- packer specifies how to serialize instances (symbol, method, or proc)
- unpacker specifies how to deserialize data (symbol, method, or proc)
pool = factory.pool(5)      # Create pool with 5 packers/unpackers
data = pool.pack(my_object) # Thread-safe packing
obj = pool.unpack(binary)   # Thread-safe unpacking
Where,
- factory.pool(size) creates a thread-safe pool
- size is the number of packer/unpacker instances in the pool
- The pool automatically manages instance reuse
- Each thread gets its own instance from the pool
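One common way to get the reuse described above is to keep a fixed set of instances behind a blocking queue, so that no two threads use the same instance at once. The sketch below shows that general technique. SimplePool and with are hypothetical names, not the gem's pool API, and a mutable string stands in for a packer's internal buffer.

```ruby
# Hypothetical pool sketch (not Messagepack::Factory#pool): a fixed set of
# instances handed out through a thread-safe queue.
class SimplePool
  def initialize(size, &builder)
    @queue = Thread::Queue.new
    size.times { @queue << builder.call }
  end

  # Borrow an instance for the duration of the block, then return it,
  # even if the block raises. Blocks while all instances are in use.
  def with
    item = @queue.pop
    begin
      yield item
    ensure
      @queue << item
    end
  end
end

# A mutable string stands in for a packer's internal buffer.
pool = SimplePool.new(2) { +"" }
threads = 10.times.map do |i|
  Thread.new do
    pool.with do |buffer|
      buffer.clear
      buffer << "data-#{i}"
      buffer.dup # copy out before the buffer goes back to the pool
    end
  end
end
results = threads.map(&:value) # one "data-#{i}" string per thread, in order
```

With 2 instances and 10 threads, the extra threads simply wait in Queue#pop until an instance is returned.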
# Create a factory with custom types
factory = Messagepack::Factory.new
factory.register_type(0x01, MyCustomClass,
  packer: ->(obj) { obj.serialize },
  unpacker: ->(data) { MyCustomClass.deserialize(data) }
)
# Create a thread-safe pool
pool = factory.pool(10)
# Use from multiple threads safely
threads = 10.times.map do |i|
  Thread.new do
    object = MyCustomClass.new("data-#{i}")
    binary = pool.pack(object)
    result = pool.unpack(binary)
    result.value
  end
end
puts threads.map(&:value).inspect
MessagePack supports custom extension types for serializing objects that don’t have a native MessagePack representation.
factory.register_type(type_id, klass,
  packer: packer_specification,
  unpacker: unpacker_specification
)
Where,
- type_id is an integer from -128 to 127
- klass is the Ruby class to serialize
- packer_specification can be:
  - A symbol (method name to call on the object)
  - A proc (called with the object)
  - A method object
- unpacker_specification can be:
  - A symbol (class method to call with the payload data)
  - A proc (called with the payload data)
  - A method object
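Whatever form a packer specification takes, its non-recursive output is a payload string. That payload ends up inside one of the FixExt/Ext formats from the binary format table. The sketch below shows that framing step. It assumes the packer returns the payload as a String, and frame_ext is a hypothetical helper, not part of the Messagepack API.

```ruby
# Hypothetical helper (not the gem's API): frame an extension payload with
# the smallest FixExt/Ext header from the binary format table.
FIXEXT_CODES = { 1 => 0xD4, 2 => 0xD5, 4 => 0xD6, 8 => 0xD7, 16 => 0xD8 }.freeze

def frame_ext(type_id, payload)
  raise ArgumentError, "type_id must be in -128..127" unless (-128..127).cover?(type_id)

  payload = payload.b
  len = payload.bytesize
  header =
    if (code = FIXEXT_CODES[len]) then [code, type_id].pack("Cc") # fixext N
    elsif len <= 0xFF then [0xC7, len, type_id].pack("CCc")         # ext 8
    elsif len <= 0xFFFF then [0xC8, len, type_id].pack("Cnc")       # ext 16
    elsif len <= 0xFFFF_FFFF then [0xC9, len, type_id].pack("CNc")  # ext 32
    else raise ArgumentError, "payload too large"
    end
  header + payload
end

# An 11-byte payload has no fixext size, so it gets an ext 8 header.
frame_ext(0x10, "\x00" * 11).bytes.first(3) # => [199, 11, 16]
```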
factory.register_type(0x02, MyContainer,
  packer: ->(obj, packer) { packer.write(obj.to_h) },
  unpacker: ->(unpacker) { MyContainer.from_hash(unpacker.read) },
  recursive: true
)
Where,
- recursive: true enables nested serialization
- The packer lambda receives the packer instance for recursive calls
- The unpacker lambda receives the unpacker instance for recursive reads
class Money
  attr_reader :amount, :currency
  def initialize(amount, currency)
    @amount = amount
    @currency = currency
  end
  # "Q>" packs the amount as a 64-bit big-endian unsigned integer so the
  # payload is identical on every platform; "A*" appends the currency code
  def to_msgpack_ext
    [amount, currency].pack("Q>A*")
  end
  def self.from_msgpack_ext(data)
    amount, currency = data.unpack("Q>A*")
    new(amount, currency)
  end
end
factory = Messagepack::Factory.new
factory.register_type(0x10, Money,
  packer: :to_msgpack_ext,
  unpacker: :from_msgpack_ext
)
money = Money.new(1000, "USD")
binary = factory.pack(money)
result = factory.unpack(binary)
# => #<Money:0x... @amount=1000, @currency="USD">
The timestamp extension (type -1) provides nanosecond precision time handling for Time objects.
Timestamp32 - 4 bytes (32-bit unsigned seconds)
Used when: nanoseconds == 0 and seconds fit in an unsigned 32-bit integer
Timestamp64 - 8 bytes (30-bit nanoseconds + 34-bit unsigned seconds)
Used when: nanoseconds != 0 or seconds exceed 32 bits, and seconds fit in an unsigned 34-bit integer
Timestamp96 - 12 bytes (32-bit nanoseconds + 64-bit signed seconds)
Used when: seconds are negative or do not fit in 34 bits
factory.register_type(-1, Time,
  packer: Messagepack::Time::Packer,
  unpacker: Messagepack::Time::Unpacker
)
Where,
- Type ID -1 is reserved for timestamps
- Messagepack::Time::Packer handles serialization with nanosecond precision
- Messagepack::Time::Unpacker handles deserialization
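The MessagePack timestamp specification defines the exact layouts behind these three formats. The sketch below shows the payload selection under that specification. timestamp_payload is hypothetical, not the Messagepack::Time::Packer source, and it returns only the payload. The ext header adds 2 bytes to the 4- and 8-byte forms (fixext 4 and fixext 8) and 3 bytes to the 12-byte form (ext 8), giving 6, 10, and 15 bytes on the wire.

```ruby
# Hypothetical encoder following the MessagePack timestamp spec (not the
# Messagepack::Time::Packer source). Returns the ext payload only.
def timestamp_payload(time)
  sec = time.tv_sec
  nsec = time.tv_nsec
  if sec >= 0 && sec < (1 << 34)
    if nsec.zero? && sec < (1 << 32)
      [sec].pack("N")                 # timestamp32: 32-bit unsigned seconds
    else
      [(nsec << 34) | sec].pack("Q>") # timestamp64: 30-bit nsec | 34-bit sec
    end
  else
    [nsec, sec].pack("Nq>")           # timestamp96: 32-bit nsec + 64-bit signed sec
  end
end

timestamp_payload(Time.utc(2020, 1, 1, 12, 30, 45)).bytesize # => 4
timestamp_payload(Time.at(-1, 500, :nsec)).bytesize          # => 12
```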
factory = Messagepack::Factory.new
factory.register_type(-1, Time,
  packer: Messagepack::Time::Packer,
  unpacker: Messagepack::Time::Unpacker
)
# Current time with nanosecond precision
now = Time.now
binary = factory.pack(now)
restored = factory.unpack(binary)
puts restored.tv_nsec # Nanoseconds preserved
# Historical date (no fractional seconds, seconds fit in 32 bits)
time = Time.utc(2020, 1, 1, 12, 30, 45)
binary = factory.pack(time)
puts binary.size # => 6 (fixext4 with timestamp32)
# Future date with nanoseconds (Time.utc's 7th argument is microseconds,
# so build the time with Time.at and an explicit :nsec unit)
future = Time.at(Time.utc(2100, 6, 15).to_i, 123_456_789, :nsec)
binary = factory.pack(future)
puts binary.size # => 10 (fixext8 with timestamp64; seconds still fit in 34 bits)
The symbol extension (type 0) serializes Ruby symbols so that they come back as Symbol objects.
factory.register_type(0, Symbol)
Where,
- 0 is the type ID for symbols
- The extension uses to_s for packing and to_sym for unpacking
factory.register_type(0, Symbol)
binary = factory.pack(:hello_symbol)
result = factory.unpack(binary) # => :hello_symbol
Where,
- Symbols are serialized as their string representation
- Deserialization converts the string back to a symbol
- Unlike the default behavior, where symbols are packed as plain strings, the Symbol type survives the round trip, at the cost of at least one extra header byte compared with a plain string
factory = Messagepack::Factory.new
factory.register_type(0, Symbol)
data = {
  status: :active,
  priority: :high,
  tags: [:important, :urgent]
}
binary = factory.pack(data)
result = factory.unpack(binary)
# => {:status=>:active, :priority=>:high, :tags=>[:important, :urgent]}
The streaming unpacker allows incremental parsing of MessagePack data as it becomes available.
unpacker = Messagepack::Unpacker.new
unpacker.feed("\x81") # Feed partial data (fixmap with 1 entry)
unpacker.feed("\xA3") # Feed more (fixstr of length 3)
unpacker.feed("foo")  # The key is complete, but its value is still missing
unpacker.feed("\xC0") # Feed the final part (nil value)
obj = unpacker.read   # => {"foo"=>nil}
Where,
- Unpacker.new creates a new unpacker instance
- feed(data) appends data to the buffer
- read returns one complete object, or nil if more data is needed
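The core idea of a feed/read unpacker is to parse from the start of the buffered bytes. If the data runs out mid-object, the unpacker keeps everything buffered and reports "not yet" instead of failing. The sketch below shows that technique. MiniStreamUnpacker is a hypothetical class, not the gem's Unpacker. It handles only a few formats and re-parses from the beginning on each read, which a production unpacker would avoid.

```ruby
# Hypothetical streaming sketch (not Messagepack::Unpacker). It supports only
# positive fixint, nil, fixstr and fixmap, enough for the examples above.
class MiniStreamUnpacker
  Incomplete = Class.new(StandardError)

  def initialize
    @buffer = String.new(encoding: Encoding::BINARY)
  end

  def feed(data)
    @buffer << data.b
    self
  end

  # Returns one object, or nil when the buffer does not yet hold a complete
  # object. (A decoded nil value is indistinguishable from "need more data".)
  def read
    @pos = 0
    obj = parse
    # Drop consumed bytes only after a complete object was parsed.
    @buffer = @buffer.byteslice(@pos, @buffer.bytesize - @pos)
    obj
  rescue Incomplete
    nil # keep everything buffered; the next read retries from the start
  end

  private

  def take(n)
    raise Incomplete if @pos + n > @buffer.bytesize
    bytes = @buffer.byteslice(@pos, n)
    @pos += n
    bytes
  end

  def parse
    byte = take(1).getbyte(0)
    case byte
    when 0x00..0x7F then byte
    when 0xC0 then nil
    when 0xA0..0xBF then take(byte & 0x1F).force_encoding(Encoding::UTF_8)
    when 0x80..0x8F
      (byte & 0x0F).times.each_with_object({}) do |_, hash|
        key = parse
        hash[key] = parse
      end
    else
      raise ArgumentError, format("unsupported format byte 0x%02X", byte)
    end
  end
end

unpacker = MiniStreamUnpacker.new
["\x81\xA3", "foo", "\xA5", "world"].each do |chunk|
  p unpacker.feed(chunk).read # prints nil three times, then the decoded hash
end
```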
unpacker = Messagepack::Unpacker.new(io)
obj = unpacker.read # Reads from IO as needed
Where,
- Unpacker.new(io) creates an unpacker attached to an IO
- The unpacker automatically reads from the IO when needed
- Use full_unpack to read a single object and reset
# Simulate receiving data in chunks
unpacker = Messagepack::Unpacker.new
chunks = ["\x81\xA3", "foo", "\xA5", "world"]
chunks.each do |chunk|
  unpacker.feed(chunk)
  obj = unpacker.read
  if obj
    puts "Received: #{obj.inspect}"
  else
    puts "Waiting for more data..."
  end
end
# Output:
# Waiting for more data...
# Waiting for more data...
# Waiting for more data...
# Received: {"foo"=>"world"}
The BinaryBuffer class provides efficient chunked storage for binary data.
buffer = Messagepack::BinaryBuffer.new
buffer << "data"
buffer.read(4) # => "data"
buffer.to_s    # => ""
Where,
- BinaryBuffer.new creates a new buffer
- << appends data to the buffer
- read(n) reads and consumes n bytes
- to_s returns the remaining data without consuming it
buffer = Messagepack::BinaryBuffer.new
buffer << "\x81\xA3foo\xA5world"
buffer.skip     # Skip one complete object (here, the whole map)
buffer.skip_nil # Skip a nil value if one is present
Where,
- skip skips a complete MessagePack object
- skip_nil efficiently skips nil values
File.open("data.msgpack", "rb") do |io|
  buffer = Messagepack::BinaryBuffer.new(io)
  unpacker = Messagepack::Unpacker.new(buffer)
  obj = unpacker.read
end
Where,
- The buffer reads from the IO when needed
- Data is automatically managed in chunks
- Suitable for large files that don’t fit in memory
# Process a large file without loading everything into memory
File.open("large.msgpack", "rb") do |io|
  buffer = Messagepack::BinaryBuffer.new(io)
  unpacker = Messagepack::Unpacker.new(buffer)
  while (obj = unpacker.read)
    # Process each object one at a time (process is your own method)
    process(obj)
  end
end
This implementation includes several performance optimizations that make the pure Ruby implementation efficient for typical use cases.
Native MessagePack types (nil, boolean, integer, float, string, symbol, array, hash) bypass the extension registry lookup for optimal performance:
- Native types are identified without an O(n) registry search
- Native types with custom extension registrations still use the registry
- Custom types pay the registry lookup cost as expected
This means that even with many registered extension types, packing native objects remains fast.
The buffer uses automatic chunk coalescing to reduce memory allocations and improve throughput:
- Small writes (< 512 bytes) are merged into larger chunks
- Reduces the number of string objects in memory
- Improves to_s performance by reducing chunk count
- Optimized for common patterns like many small integer writes
The buffer’s to_s method has a fast path for reading from the beginning (position 0), which is the common case for packers:
- Uses join for efficient string concatenation
- Skips offset calculations when position is at 0
- Significantly faster for single-pass operations
# Native type performance (unaffected by registry size)
Messagepack.pack(nil)           # ~673k ops/sec
Messagepack.pack(42)            # ~607k ops/sec
Messagepack.pack("hello")       # ~498k ops/sec
Messagepack.pack([1,2,3])       # ~230k ops/sec
Messagepack.pack({a: 1, b: 2})  # ~159k ops/sec
# Buffer operations (1000 small writes)
# With coalescing:    ~4.7k ops/sec (+28% improvement)
# Without coalescing: ~3.7k ops/sec
This implementation is written entirely in Ruby without any C extensions, providing:
- Portability - Runs on any Ruby implementation (MRI, JRuby, TruffleRuby, etc.)
- Safety - No memory corruption risks from native code
- Debuggability - Easy to debug with standard Ruby tools
- Maintainability - Pure Ruby code is easier to understand and modify
The BinaryBuffer class uses a chunked storage design:
BinaryBuffer
├── Chunks (array)
│   ├── Chunk 1 (data)
│   ├── Chunk 2 (data)
│   └── Chunk N (data)
├── Position (read cursor)
└── Length (total bytes)
Where,
- Chunks - Array of binary strings holding data
- Position - Current read position across all chunks
- Length - Total bytes across all chunks
- Coalescing threshold - Small writes (< 512 bytes) are merged
This design provides:
- Efficient appends - New data creates chunks, small writes merge
- Zero-copy reads - Data is read without copying when possible
- Memory efficiency - Unused chunks can be garbage collected
- IO integration - Can read from IO objects on demand
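The following is a minimal sketch of this chunked design, under a few assumptions:
- ChunkedBuffer is hypothetical, not the gem's BinaryBuffer.
- The cursor is kept as an offset into the first chunk rather than as a global position.
- Any write under 512 bytes is appended to the last chunk.
- read copies bytes rather than attempting zero-copy slicing.

```ruby
# Hypothetical chunked buffer (not Messagepack::BinaryBuffer).
class ChunkedBuffer
  COALESCE_THRESHOLD = 512 # bytes; writes smaller than this are merged

  attr_reader :chunks

  def initialize
    @chunks = []
    @offset = 0 # read cursor within the first chunk
  end

  def <<(data)
    data = data.b # private binary copy, safe to mutate later
    if !@chunks.empty? && data.bytesize < COALESCE_THRESHOLD
      @chunks.last << data # coalesce small writes into the last chunk
    else
      @chunks << data
    end
    self
  end

  def size
    @chunks.sum(&:bytesize) - @offset
  end

  # Read and consume up to n bytes, dropping fully consumed chunks.
  def read(n)
    out = String.new(encoding: Encoding::BINARY)
    while out.bytesize < n && !@chunks.empty?
      chunk = @chunks.first
      piece = chunk.byteslice(@offset, n - out.bytesize)
      out << piece
      @offset += piece.bytesize
      next if @offset < chunk.bytesize
      @chunks.shift # a consumed chunk can now be garbage collected
      @offset = 0
    end
    out
  end

  # Remaining bytes, without consuming them.
  def to_s
    return @chunks.join if @offset.zero? # fast path: nothing consumed yet
    first = @chunks.first.byteslice(@offset, @chunks.first.bytesize - @offset)
    first + @chunks.drop(1).join
  end
end

buffer = ChunkedBuffer.new
1000.times { |i| buffer << [i % 256].pack("C") } # 1000 one-byte writes
buffer << ("x" * 600)                            # a large write gets its own chunk
buffer.chunks.size   # => 2
buffer.read(3).bytes # => [0, 1, 2]
buffer.size          # => 1597
```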
The extension registry provides type mapping for custom serialization:
ExtensionRegistry::Packer
├── @registry - Hash of class => [type_id, proc, flags]
└── @cache - Hash of class => [type_id, proc, flags] (ancestor cache)
ExtensionRegistry::Unpacker
└── @array - Array[256] of [class, proc, flags] indexed by type_id
Where,
- The packer registry uses O(1) hash lookup for direct class matches
- Ancestor search is O(n) but cached after the first lookup
- The unpacker registry uses O(1) array lookup by type ID
- Flags control recursive packing and oversized integer handling
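The following is a minimal sketch of these two registry shapes, assuming the entry layouts shown in the tree above. PackerRegistry, UnpackerRegistry, and the type_id + 128 array offset are illustrative choices, not the gem's internals.

```ruby
# Hypothetical sketch of the two registry shapes (not the gem's
# ExtensionRegistry classes). Entries are [type_id, handler, flags] on the
# packer side and [class, handler, flags] on the unpacker side.
class PackerRegistry
  def initialize
    @registry = {} # Class => entry, exact matches only
    @cache = {}    # Class => entry found via an ancestor (or nil)
  end

  def register(klass, type_id, handler, flags = 0)
    @registry[klass] = [type_id, handler, flags]
    @cache.clear # a new registration can change ancestor matches
  end

  def lookup(klass)
    entry = @registry[klass]
    return entry if entry                      # O(1) direct hit
    return @cache[klass] if @cache.key?(klass) # O(1) after the first search

    # O(n) walk of the ancestor chain; the result (even nil) is cached.
    owner = klass.ancestors.find { |ancestor| @registry.key?(ancestor) }
    @cache[klass] = owner && @registry[owner]
  end
end

class UnpackerRegistry
  def initialize
    @array = Array.new(256) # one slot per type_id in -128..127
  end

  def register(type_id, klass, handler, flags = 0)
    @array[type_id + 128] = [klass, handler, flags] # the offset is an assumption
  end

  def lookup(type_id)
    @array[type_id + 128] # O(1)
  end
end
```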
The packer uses a type dispatch system for efficient serialization:
Packer#write(value)
├── Fast-path check (native type?)
│   ├── Yes → Skip registry, use native serialization
│   └── No → Check registry
│       ├── Found in registry → Use extension packer
│       └── Not found → Check to_msgpack method
└── Case statement dispatch → Type-specific writer
This ensures:
- Native types are serialized without overhead
- Registered custom types use their packers
- Unknown types can implement to_msgpack for compatibility
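The decision order above can be sketched as a function that reports which path a value would take. dispatch, NATIVE_TYPES, and the plain-Hash registry (exact class matches only, no ancestor search) are hypothetical simplifications, not the gem's code.

```ruby
# Hypothetical dispatch sketch (not the gem's Packer#write): returns which
# path a value would take instead of writing bytes.
NATIVE_TYPES = [NilClass, TrueClass, FalseClass, Integer, Float, String,
                Symbol, Array, Hash].freeze

def dispatch(value, registry)
  klass = value.class
  if NATIVE_TYPES.include?(klass) && !registry.key?(klass)
    :native                 # fast path: no ancestor search in the registry
  elsif (entry = registry[klass])
    [:extension, entry]     # registered extension packer
  elsif value.respond_to?(:to_msgpack)
    :to_msgpack             # the object serializes itself
  else
    raise TypeError, "cannot serialize #{klass}"
  end
end

registry = { Time => [-1, nil, 0] }
dispatch(42, registry)          # => :native
dispatch(Time.now, registry)[0] # => :extension
```

A native class that has its own registration, such as Symbol with the symbol extension, falls through to the registry branch, which matches the "native types with custom extension registrations still use the registry" rule.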