Gecko, a novel sliding window aggregation algorithm that supports bulk eviction. Gecko leverages a granular-based eviction strategy for various bulk sizes, enabling efficient bulk eviction while maintaining the performance close to that of in-order stream algorithms for single evictions.
For large data bulks, Gecko performs coarse-grained eviction at the chunk level, followed by fine-grained eviction using leftward binary tree aggregation (LTA) as a complementary method. Moreover, Gecko partitions data based on chunks to prevent the impacts of out-of-order data on other chunks, thereby enabling efficient handling of out-of-order data streams.
Gecko has been accepted by TKDE (Transactions on Knowledge and Data Engineering).
If you use Gecko in your research, please cite our TKDE paper:
Jianjun Li, Yuhui Deng, Jiande Huang, Yi Zhou, Qifen Yang, and Geyong Min, "Gecko: Efficient Sliding Window Aggregation with Granular-based Bulk Eviction over Big Data" in IEEE Transactions on Knowledge and Data Engineering. doi: 10.1109/TKDE.2024.3511334.
This repo contains reference implementations of sliding window aggregation algorithms.
All of these algorithms require operators that are associative. We classify the algorithms in two groups: those that require data to arrive in-order, and those that allow data to arrive out-of-order. We refer to the algorithms that require data to arrive in-order as FIFO algorithms, as they assume first-in, first-out semantics. We refer to the algorithms that tolerate disordered data as general algorithms.
The algorithmic complexity of the algorithms is with respect to the size of the window, n.
A tutorial and encyclopedia article provide more background on sliding window aggregation algorithms.
- full name: De-Amortized Banker's Aggregator
- ordering: in-order required
- operator requirements: associativity
- time complexity: worst-case O(1)
- space requirements: 2n
- first appeared: Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time
- implementions: C++
- full name: De-Amortized Banker's Aggregator Lite
- ordering: in-order required
- operator requirements: associativity
- time complexity: worst-case O(1)
- space requirements: n + 2
- first appeared: In-Order Sliding-Window Aggregation in Worst-Case Constant Time
- implementions: C++
- full name: Finger B-Tree Aggregator
- ordering: out-of-order allowed, assumes data is timestamped
- operator requirements: associativity
- time complexity: amortized O(log d) where d is distance newly arrived data is from being in-order, worst-case O(log n); for in-order data (d = 0), amortized O(1) and worst-case O(log n)
- space requirements: O(n)
- first appeared: Optimal and General Out-of-Order Sliding-Window Aggregation
- implementions: C++
- full name: Flat and Fast Index Traverser
- ordering: in-order required
- operator requirements: associativity
- time complexity: worst-case O(n), amortized O(1)
- space requirements: 2n
- first appeared: FlatFIT: Accelerated Incremental Sliding-Window Aggregation For Real-Time Analytics
- implementions: C++ (static windows), C++ (dynamic windows), Rust (dynamic windows)
- full name: Imperative Okasaki Aggregator
- ordering: in-order required
- operator requirements: associativity
- time complexity: worst-case O(1)
- space requirements: O(n)
- first appeared: Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time
- implementions: C++
- full name: Two-Stacks
- ordering: in-order required
- operator requirements: associativity
- time complexity: worst-case O(n), amortized O(1)
- space requirements: 2n
- first appeared: adamax on Stack Overflow
- implementions: C++, Rust
- full name: Two-Stacks Like
- ordering: in-order required
- operator requirements: associativity
- time complexity: worst-case O(n), amortized O(1)
- space requirements: n + 1
- first appeared: In-Order Sliding-Window Aggregation in Worst-Case Constant Time
- implementions: C++, Rust
- full name: Reactive Aggregator
- ordering: out-of-order allowed
- operator requirements: associativity
- time complexity: worst-case O(log n)
- space requirements: O(n)
- first appeared: General Incremental Sliding-Window Aggregation
- implementions: C++, Rust
- full name: Re-Calculate From Scratch
- ordering: out-of-order allowed
- operator requirements: none
- time complexity: O(n)
- space requirements: n
- first appeared: no known source
- implementations: C++, Rust
- full name: Subtract on Evict
- ordering: out-of-order allowed
- operator requirements: associativity, invertability
- time complexity: worst-case O(1)
- space requirements: n
- first appeared: no known source
- implementations: C++ (strictly in-order), Rust (strictly in-order)
- full name: Amortized Monoid Tree Aggregator
- ordering: in-order required
- operator requirements: associativity
- time complexity:
- worst-case O(log n), amortized O(1) for
insertandevict; - worst-case O(1) for
query; and - worst-case O(log n) for
bulkEvict
- worst-case O(log n), amortized O(1) for
- space requirements: O(n)
- first appeared: Constant-Time Sliding Window Framework with Reduced Memory Footprint and Efficient Bulk Evictions
- implementions: C++