#388 introduced the abstract Pool class, which pools reader state to reduce the cost of instantiating readers. The Pool implementation is currently backed by an ArrayBlockingQueue.
#389 improved the performance of the binary writer's PooledBlockAllocator, which is backed by a ConcurrentLinkedQueue, by eliminating its reliance on ConcurrentLinkedQueue.size(). While evaluating that solution, we found that ArrayBlockingQueue did not perform as well as ConcurrentLinkedQueue.
Consider replacing ArrayBlockingQueue with ConcurrentLinkedQueue in Pool. This may improve performance, particularly under high contention from many threads, as long as ConcurrentLinkedQueue.size() is not used (it traverses the queue and is not a constant-time operation).
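
To make the proposal concrete, here is a minimal sketch of a bounded pool backed by a ConcurrentLinkedQueue that never calls size(). It is not the actual Pool class and not necessarily the approach taken in #389: the class name `LinkedQueuePool`, the methods `acquire`, `release`, and `pooledCount`, and the AtomicInteger counter are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; names and structure are illustrative, not the library's Pool API.
final class LinkedQueuePool<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    // Tracks the number of pooled items so that queue.size() is never needed.
    // Incremented before offer() and decremented after poll(), so it is always
    // an upper bound on the real queue length.
    private final AtomicInteger count = new AtomicInteger();
    private final int maxSize;
    private final Supplier<T> factory;

    LinkedQueuePool(int maxSize, Supplier<T> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
    }

    // Returns a pooled item if one is available; otherwise creates a new one.
    T acquire() {
        T item = queue.poll();
        if (item == null) {
            return factory.get();
        }
        count.decrementAndGet();
        return item;
    }

    // Returns an item to the pool, or drops it if the pool is already full.
    void release(T item) {
        if (count.incrementAndGet() > maxSize) {
            count.decrementAndGet(); // Undo the reservation; the item is discarded.
            return;
        }
        queue.offer(item);
    }

    // Approximate number of pooled items (may briefly overcount under contention).
    int pooledCount() {
        return count.get();
    }
}
```

Because the counter is reserved before the item is enqueued, the queue cannot grow past `maxSize`, which preserves the bounded behavior that ArrayBlockingQueue provides today. Whether this actually outperforms the current implementation would need to be confirmed by benchmarking.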