
Paginated query over 2i with range returns non-existent data #498

@peczenyj

Hi all

I am using Riak 1.4.2 with LevelDB and I found something strange.

We have an index called 'expiration_epoch_int' that works like a TTL for each key in this bucket. To find expired data to delete, we just query expiration_epoch_int for the range between 0 and 'now'. For a small amount of data this seems to work really well.
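To make the expected behaviour of the cleanup query concrete, here is a minimal Python sketch of a paginated range query over an expiration index. It is only a model: the sorted in-memory list stands in for the secondary index, it assumes results come back ordered by term and then by key, and the names `range_query`, `expired_keys`, `max_results` and the continuation tuple are hypothetical illustrations of the shape of 2i pagination, not the Riak client API.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical stand-in for the secondary index: a sorted list of
# (expiration_epoch, key) pairs, assumed ordered by term, then by key.
IndexEntry = Tuple[int, str]


def range_query(index: List[IndexEntry], start: int, end: int,
                max_results: int,
                continuation: Optional[IndexEntry] = None
                ) -> Tuple[List[str], Optional[IndexEntry]]:
    """Return up to max_results keys with start <= term <= end, plus a
    continuation to resume from (None when no more results remain)."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    if continuation is None:
        # "" sorts before any key, so this finds the first term >= start.
        pos = bisect.bisect_left(index, (start, ""))
    else:
        # Resume strictly after the last entry of the previous page.
        pos = bisect.bisect_right(index, continuation)
    page: List[IndexEntry] = []
    while pos < len(index) and index[pos][0] <= end and len(page) < max_results:
        page.append(index[pos])
        pos += 1
    more = pos < len(index) and index[pos][0] <= end
    next_cont = page[-1] if (page and more) else None
    return [key for _, key in page], next_cont


def expired_keys(index: List[IndexEntry], now: int,
                 page_size: int = 2) -> List[str]:
    """Walk every page of the 0..now range and collect the expired keys."""
    keys: List[str] = []
    cont: Optional[IndexEntry] = None
    while True:
        page, cont = range_query(index, 0, now, page_size, cont)
        keys.extend(page)
        if cont is None:
            return keys
```

For example, with entries at epochs 100, 200, 300 and 400, `expired_keys(index, 300)` walks two pages and returns `["a", "b", "c"]`. In this model a key only appears if its index entry is still present, so keys that come back for already-deleted objects would point at index entries that outlived their objects.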

But today I found this: the first 10 results of this query are keys that no longer exist; they were already deleted. If I try to inspect them, I receive 'HTTP/1.1 404 Object Not Found'.

If I use a small range, around +/- 1 second from now, I get good results (keys that exist in Riak), but if I start from 0 (or 1), at least the first results are keys that do not exist.

If I use return_terms=true I can see the expiration_epoch_int values too (they are between 10 and 20 days ago). I am using the PBC interface for both the query and the delete.
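Because the terms are plain epoch seconds, checking how far in the past a returned term lies is simple arithmetic. A minimal Python sketch, where `age_in_days` is a hypothetical helper and `now` is assumed to be the same epoch used as the upper bound of the query:

```python
SECONDS_PER_DAY = 86_400


def age_in_days(expiration_epoch: int, now: int) -> float:
    """How many days before `now` an expiration_epoch_int term lies."""
    return (now - expiration_epoch) / SECONDS_PER_DAY
```

A term returned with `return_terms=true` that is 15 days old gives `age_in_days(now - 15 * 86_400, now) == 15.0`, which is how the 10 to 20 day figures above can be read off the returned terms.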

So, my question: why does this happen? Can it be related to pagination (maybe some cache)? When we run our cleanup process, we handle, for example, ~7 x 10^6 keys.

To control the expiration of a huge amount of data, is it safe to use only one secondary index? Is there some limit on the number of keys? I have no idea where to start investigating this.

I will try to run a more complete test to find the percentage of deleted keys returned by Riak.
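One way to structure that test is to fetch every key the index returns and count how many are missing. A minimal Python sketch, assuming a hypothetical `exists` callback that would, in a real run, wrap a fetch of the key and return False on a not-found (404) response; no specific client call is implied:

```python
from typing import Callable, Iterable


def stale_ratio(keys: Iterable[str], exists: Callable[[str], bool]) -> float:
    """Fraction of index-returned keys whose object no longer exists.

    `exists` is a hypothetical callback: it should return False when a
    fetch of the key reports not found, True otherwise.
    """
    total = missing = 0
    for key in keys:
        total += 1
        if not exists(key):
            missing += 1
    # Avoid dividing by zero when the query returned nothing.
    return missing / total if total else 0.0
```

For example, if the query returns `["a", "b", "c", "d"]` and only `b` and `d` still exist, `stale_ratio(keys, {"b", "d"}.__contains__)` gives `0.5`, i.e. 50% of the returned keys are already deleted.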
