Currently, CDF lets users specify a begin version and an end version to retrieve incrementally inserted data or incrementally updated data. In practice, users of incremental reads often have two further requirements:
- Limit the read to a time window by specifying a start date and an end date.
- Retrieve incremental upsert data (inserts and updates) through a single SQL query.
For the first requirement:
- Allow users to specify a start timestamp and an end timestamp. Iterate through the versions to find the first version whose timestamp is greater than the start timestamp and the last version whose timestamp is less than the end timestamp, then use these as the begin version and end version respectively.
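
The version lookup described above could be sketched roughly as follows. This is only an illustration, not the actual implementation: `resolve_version_range` and the `(version, timestamp)` tuples are hypothetical, and the sketch assumes the version list is ordered by ascending version with non-decreasing timestamps.

```python
from typing import Optional, Sequence, Tuple


def resolve_version_range(
    commits: Sequence[Tuple[int, int]],
    start_ts: int,
    end_ts: int,
) -> Optional[Tuple[int, int]]:
    """Map a (start_ts, end_ts) window to a (begin_version, end_version) pair.

    `commits` is a hypothetical list of (version, timestamp) pairs, assumed
    to be sorted by version with non-decreasing timestamps.
    """
    begin_version = None
    end_version = None
    for version, ts in commits:
        # First version strictly after the start timestamp.
        if begin_version is None and ts > start_ts:
            begin_version = version
        # Keep overwriting: ends up as the last version strictly before end_ts.
        if ts < end_ts:
            end_version = version
    # No version falls inside the window.
    if begin_version is None or end_version is None or begin_version > end_version:
        return None
    return begin_version, end_version


if __name__ == "__main__":
    history = [(0, 100), (1, 200), (2, 300), (3, 400)]
    print(resolve_version_range(history, 100, 400))
```

Returning `None` for an empty window leaves it to the caller to decide whether that is an error or an empty result; the source does not say which behavior is intended.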
For the second requirement:
- Construct a filter condition for upserts so that a single query returns both inserted and updated data.
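
One possible shape for such a filter is sketched below. The source does not name the change-type column or its values, so `_change_type`, `insert`, and `update_postimage` are assumptions borrowed from the Delta Lake change data feed convention, and both helper functions are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumed change-type labels (Delta Lake style); adjust to the actual schema.
UPSERT_CHANGE_TYPES = ("insert", "update_postimage")


def upsert_filter(column: str = "_change_type") -> str:
    """Build a SQL predicate that keeps inserted rows and post-update rows."""
    values = ", ".join(f"'{v}'" for v in UPSERT_CHANGE_TYPES)
    return f"{column} IN ({values})"


def select_upserts(rows: Iterable[Dict], column: str = "_change_type") -> List[Dict]:
    """Apply the same condition to in-memory change rows."""
    return [row for row in rows if row.get(column) in UPSERT_CHANGE_TYPES]


if __name__ == "__main__":
    print(upsert_filter())
```

Under these assumptions, pre-update images and deletes are excluded, so each returned row carries the latest inserted or updated values.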