Data missing when using CassandraIO.Read #21715

@damccorm

Bug

Data at the beginning or end of the token ring is never retrieved, due to a bad TokenRange request.

This bug was introduced by BEAM-9008, in this commit

A basic reproduction case & workarounds are available here:

Github/beam-cassandraio-bug

Description

When using CassandraIO, a list of token ranges is requested from the C* nodes in order to create splits over those ranges.
A split is represented as a RingRange, which results in a request to C* of the form
TOKEN(partition_key) >= range_start AND TOKEN(partition_key) < range_end

The token ring goes from Long.MIN_VALUE to Long.MAX_VALUE (so -2xxx to 2xxx). A range may contain the "join point" and be represented by [2xxx, -2xxx].

In this case (i.e. the TokenRange is wrapping), the old implementation sent two separate requests:

  • TOKEN(partition_key) >= range_start (to get results up to the end of the ring, i.e. Long.MAX_VALUE)
  • TOKEN(partition_key) < range_end (to get results from the beginning of the ring, i.e. Long.MIN_VALUE)
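
The two-request behavior for wrapping ranges can be sketched as follows. This is a minimal illustration, not the actual Beam code: the class `TokenRangeClauses` and its methods are hypothetical, and it assumes a range counts as wrapping when its start is greater than its end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: builds the WHERE clauses for one token range.
final class TokenRangeClauses {
  private TokenRangeClauses() {}

  // Assumption: a range wraps around the join point when start > end.
  static boolean isWrapping(long start, long end) {
    return start > end;
  }

  // Returns one clause for a regular range, two clauses for a wrapping range.
  static List<String> whereClauses(String partitionKey, long start, long end) {
    String token = "TOKEN(" + partitionKey + ")";
    List<String> clauses = new ArrayList<>();
    if (isWrapping(start, end)) {
      // From range_start up to the end of the ring (Long.MAX_VALUE).
      clauses.add(token + " >= " + start);
      // From the beginning of the ring (Long.MIN_VALUE) up to range_end.
      clauses.add(token + " < " + end);
    } else {
      clauses.add(token + " >= " + start + " AND " + token + " < " + end);
    }
    return clauses;
  }
}
```

Each returned clause would be issued as its own request, so a wrapping range costs two queries instead of one.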

This behavior is no longer implemented: all token ranges are queried the same way, even in the wrapping case.
This results in a request like:
TOKEN(partition_key) >= 2xxx AND TOKEN(partition_key) < -2xxx
No token can satisfy both conditions, so the request returns 0 results and the data in the wrapping range is never retrieved.
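
To see why the combined request matches nothing, the predicate can be evaluated directly on a few tokens. The sketch below is illustrative only: `WrappingPredicateDemo` and its methods are hypothetical names, and the start and end values are stand-ins for the elided 2xxx and -2xxx bounds.

```java
// Hypothetical demo, not Beam code: compares the single-request predicate
// with the two-request (split) predicate on a wrapping range.
final class WrappingPredicateDemo {
  private WrappingPredicateDemo() {}

  // Predicate of the single request: token >= start AND token < end.
  static boolean matchesSingleRequest(long token, long start, long end) {
    return token >= start && token < end;
  }

  // Union of the two split requests: token >= start OR token < end.
  static boolean matchesSplitRequests(long token, long start, long end) {
    return token >= start || token < end;
  }

  public static void main(String[] args) {
    // Stand-in bounds for a wrapping range (start > end).
    long start = Long.MAX_VALUE - 10;
    long end = Long.MIN_VALUE + 10;
    // One token near each end of the ring, plus one outside the range.
    long[] tokens = {Long.MAX_VALUE - 5, Long.MIN_VALUE + 3, 0L};
    for (long token : tokens) {
      System.out.println(token
          + " single=" + matchesSingleRequest(token, start, end)
          + " split=" + matchesSplitRequests(token, start, end));
    }
  }
}
```

With start greater than end, the conjunction is false for every token, while the split form still matches the tokens on either side of the join point and correctly excludes the one outside the range.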

 

Workarounds

  • Downgrade to 2.33.0
  • Use custom TokenRanges and a readAll implementation

Imported from Jira BEAM-14558. Original Jira may contain additional context.
Reported by: croquette.
