- Refactor PixelsConsumer:
Introduce an abstract base class (AbstractPixelsConsumer) to handle common initialization and cleanup.
Create a concrete subclass (IndexedPixelsConsumer) dedicated to handling loads where a Primary Index exists.
Create a simple subclass (SimplePixelsConsumer) for loads without an Index (maintaining existing sequential logic).
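The split above is essentially the template-method pattern. Below is a minimal sketch of how the hierarchy could look. It assumes a row is a plain `String` and records actions in a `log` list instead of writing Pixels files; the method names `init`, `consume`, `handleRow` and `flushWriters` are hypothetical and not taken from the existing PixelsConsumer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rows are Strings and "writing" is recorded in a log list.
abstract class AbstractPixelsConsumer implements AutoCloseable {
    protected final List<String> log = new ArrayList<>();
    private boolean initialized = false;

    // Common initialization shared by every consumer (e.g. opening storage), run once.
    public final void init() {
        if (!initialized) {
            log.add("init");
            initialized = true;
        }
    }

    // Template method: shared setup, then the subclass decides where the row goes.
    public final void consume(String row) {
        init();
        handleRow(row);
    }

    protected abstract void handleRow(String row);

    // Each subclass flushes its own writer(s).
    protected abstract void flushWriters();

    // Common cleanup runs after the subclass-specific flush.
    @Override
    public final void close() {
        flushWriters();
        log.add("cleanup");
    }
}

// Loads without an index: keeps the existing sequential, single-writer behaviour.
class SimplePixelsConsumer extends AbstractPixelsConsumer {
    @Override
    protected void handleRow(String row) {
        log.add("write:" + row);
    }

    @Override
    protected void flushWriters() {
        log.add("flush:single");
    }
}

// Loads with a primary index: rows would be routed to per-bucket writers here.
class IndexedPixelsConsumer extends AbstractPixelsConsumer {
    @Override
    protected void handleRow(String row) {
        log.add("route:" + row);
    }

    @Override
    protected void flushWriters() {
        log.add("flush:buckets");
    }
}
```

Keeping `init`, `consume` and `close` final in the base class guarantees that setup and cleanup cannot be skipped by a subclass; only the row handling and flushing differ.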
- Bucket-Based Routing Logic:
In IndexedPixelsConsumer, maintain a map to track active writers: Map<Integer, PerBucketWriter>.
For every incoming data row:
  - Calculate the row's bucketId from the hash of its primary key.
  - Use the bucketId to look up the corresponding PerBucketWriter state object.
  - If no writer exists for that bucketId, lazily create a new PixelsWriter and its temporary file.
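The per-row routing steps can be sketched as follows. This assumes the bucketId is the primary key's `hashCode()` reduced with `Math.floorMod` over a fixed bucket count; the real hash function must match whatever the index and node assignment use. `PerBucketWriter` here is a hypothetical stand-in that only counts rows instead of wrapping a real PixelsWriter, and `BucketRouter` is an illustrative name.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-bucket state; a real one would hold a PixelsWriter for tempFile.
final class PerBucketWriter {
    final int bucketId;
    final File tempFile;
    long rowCount = 0;

    PerBucketWriter(int bucketId, File tempFile) {
        this.bucketId = bucketId;
        this.tempFile = tempFile;
    }

    void write(Object[] row) {
        rowCount++; // placeholder for handing the row to the underlying writer
    }
}

final class BucketRouter {
    private final int bucketNum;
    private final Map<Integer, PerBucketWriter> writers = new HashMap<>();

    BucketRouter(int bucketNum) {
        this.bucketNum = bucketNum;
    }

    // Assumption: bucketId = primary-key hash mapped into [0, bucketNum).
    int bucketIdOf(Object primaryKey) {
        return Math.floorMod(primaryKey.hashCode(), bucketNum);
    }

    void route(Object primaryKey, Object[] row) {
        int bucketId = bucketIdOf(primaryKey);
        // Create the writer and its temporary file only on the first row for this bucket.
        PerBucketWriter writer = writers.computeIfAbsent(
                bucketId, id -> new PerBucketWriter(id, newTempFile(id)));
        writer.write(row);
    }

    Map<Integer, PerBucketWriter> activeWriters() {
        return writers;
    }

    private static File newTempFile(int bucketId) {
        try {
            File f = File.createTempFile("bucket-" + bucketId + "-", ".pxl");
            f.deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`Math.floorMod` keeps the bucketId non-negative even when `hashCode()` is negative, which a plain `%` would not. A plain `HashMap` is enough if each consumer thread owns its own router.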
- Core Dependency: Node Mapping Cache:
Implement BucketToNodeCache (small component): a thread-safe, lazily loaded singleton cache that maps a bucketId to the RetinaNodeInfo of the node responsible for it. The cache avoids repeatedly querying the NodeService for node assignments during high-throughput loading.
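One way to meet the "thread-safe, singleton, lazy-loaded" requirements is the initialization-on-demand holder idiom combined with `ConcurrentHashMap.computeIfAbsent`. In this sketch `RetinaNodeInfo` is a simplified stand-in, and the NodeService call is modelled as a hypothetical `IntFunction` installed through `setNodeLookup`; neither reflects the real Pixels API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Simplified stand-in for the real RetinaNodeInfo.
final class RetinaNodeInfo {
    final String address;
    final int port;

    RetinaNodeInfo(String address, int port) {
        this.address = address;
        this.port = port;
    }

    @Override
    public String toString() {
        return address + ":" + port;
    }
}

final class BucketToNodeCache {
    // Hypothetical hook standing in for a NodeService query.
    private static volatile IntFunction<RetinaNodeInfo> nodeLookup = id -> {
        throw new IllegalStateException("node lookup not configured");
    };

    private final Map<Integer, RetinaNodeInfo> cache = new ConcurrentHashMap<>();

    private BucketToNodeCache() {}

    // Holder idiom: the JVM initializes Holder (and the instance) lazily and
    // thread-safely on the first call to getInstance().
    private static final class Holder {
        static final BucketToNodeCache INSTANCE = new BucketToNodeCache();
    }

    static BucketToNodeCache getInstance() {
        return Holder.INSTANCE;
    }

    static void setNodeLookup(IntFunction<RetinaNodeInfo> lookup) {
        nodeLookup = lookup;
    }

    // computeIfAbsent invokes the lookup at most once per bucketId; later calls hit the map.
    RetinaNodeInfo getNode(int bucketId) {
        return cache.computeIfAbsent(bucketId, id -> nodeLookup.apply(id));
    }

    // Drop a stale mapping, e.g. after the bucket is reassigned to another node.
    void invalidate(int bucketId) {
        cache.remove(bucketId);
    }
}
```

Because `computeIfAbsent` holds a lock on the map bin while the lookup runs, a slow remote call briefly blocks other threads asking for the same bucket. That is usually acceptable for a cache that is filled once per bucket. The address and port used below are illustrative only.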
- Distributed Indexing:
Ensure that index entries generated by IndexedPixelsConsumer are routed to the correct IndexService instance, which may be identified by the RetinaNodeInfo obtained from BucketToNodeCache.
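Routing index entries could be done by grouping them by owning node and sending one batch per IndexService instance. This is a sketch under stated assumptions: `IndexEntry`, `IndexClient` and `IndexRouter` are hypothetical types, nodes are keyed by an address string, and the `bucketToNode` function would in practice be backed by BucketToNodeCache.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical index entry: the bucket it belongs to, its key, and the row location.
final class IndexEntry {
    final int bucketId;
    final String key;
    final long rowId;

    IndexEntry(int bucketId, String key, long rowId) {
        this.bucketId = bucketId;
        this.key = key;
        this.rowId = rowId;
    }
}

// Hypothetical client for one IndexService instance (not the real Pixels API).
interface IndexClient {
    void putEntries(List<IndexEntry> entries);
}

final class IndexRouter {
    private final IntFunction<String> bucketToNode;  // bucketId -> node address
    private final Map<String, IndexClient> clients;  // node address -> IndexService client

    IndexRouter(IntFunction<String> bucketToNode, Map<String, IndexClient> clients) {
        this.bucketToNode = bucketToNode;
        this.clients = clients;
    }

    // Group entries by owning node, then send one batch to each node's IndexService.
    void flush(List<IndexEntry> entries) {
        Map<String, List<IndexEntry>> byNode = new LinkedHashMap<>();
        for (IndexEntry e : entries) {
            byNode.computeIfAbsent(bucketToNode.apply(e.bucketId), k -> new ArrayList<>()).add(e);
        }
        for (Map.Entry<String, List<IndexEntry>> batch : byNode.entrySet()) {
            IndexClient client = clients.get(batch.getKey());
            if (client == null) {
                throw new IllegalStateException("no IndexService client for " + batch.getKey());
            }
            client.putEntries(batch.getValue());
        }
    }
}
```

Batching per node keeps the number of remote calls proportional to the number of nodes rather than the number of entries; failing fast on an unknown node avoids silently dropping index entries.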