Optimize activation list query handling for CosmosDB

In our setup we are seeing high RU (Request Unit) consumption for the list activations query. This query is issued by the `wsk activation poll` command, which translates to:

```
/api/v1/namespaces/_/activations?docs=true&limit=0&since=1542422386538&skip=0
```
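To make the request concrete, here is a small sketch that decodes the poll URL above into typed parameters. The `ListActivationsQuery` and `parse_poll_url` names are hypothetical. It assumes `since` is milliseconds since the Unix epoch, which matches the `wsk` CLI's description of `--since` and the magnitude of the value shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


@dataclass
class ListActivationsQuery:
    docs: bool
    limit: int
    skip: int
    since: datetime  # lower bound on activation start time


def parse_poll_url(url: str) -> ListActivationsQuery:
    """Turn the REST call issued by `wsk activation poll` into typed parameters."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    return ListActivationsQuery(
        docs=params.get("docs") == "true",
        limit=int(params.get("limit", "0")),
        skip=int(params.get("skip", "0")),
        # `since` is expressed in milliseconds since the Unix epoch
        since=datetime.fromtimestamp(int(params["since"]) / 1000, tz=timezone.utc),
    )


q = parse_poll_url(
    "/api/v1/namespaces/_/activations?docs=true&limit=0&since=1542422386538&skip=0"
)
print(q)
```

The decoded `since` falls on 2018-11-17 (UTC), so the poll asks only for activations newer than a moving "now minus a little" boundary.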

We mitigated the cost to some extent by reducing the fetched document count (#4157). However, the poll query still consumes a significant share of the provisioned RUs. One reason for this high usage is that the query is cross-partition.

CosmosDB has a limit of 10GB per partition. To avoid hitting that limit we store activations using `id` as the partition key. Hence a query listing the "recent" activations has to be executed across all partitions (fan out), and the results are then merged by the SDK on the client side. Looking at how the list query is used, the following aspects stand out:

1. All queries are performed in descending order. Currently the user cannot change the sort order from the client side
2. Most list calls specify `since` but not `upto` (`upto` can only be specified via the `list` command, which is not used much). In that case the top results are simply the most recent activations
3. `skip` is mostly 0
4. It is mostly used by developers who are actively working and want to see "recent" activations, so the results should mostly contain activations from the recent past rather than very old ones
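The fan-out described above can be illustrated with a toy model. This is a simplified sketch, not the SDK's actual implementation: activations are spread over partitions by hashing their `id` (`zlib.crc32` stands in for Cosmos DB's partitioning), every partition runs the filtered, descending query on its own, and the client does a k-way merge. The record layout and function names are hypothetical. The point is that every partition is queried (and charged) even though only the newest few rows are kept.

```python
import heapq
import itertools
import zlib
from typing import Dict, List, Tuple

# Hypothetical activation record: (start_millis, activation_id, namespace)
Activation = Tuple[int, str, str]


def partition_for(key: str, count: int) -> int:
    """Deterministic hash placement, standing in for partitioning on `id`."""
    return zlib.crc32(key.encode()) % count


def query_partition(docs: List[Activation], namespace: str, since: int) -> List[Activation]:
    """What each partition does on its own: filter, then sort newest first."""
    hits = [a for a in docs if a[2] == namespace and a[0] > since]
    return sorted(hits, key=lambda a: a[0], reverse=True)


def fan_out_list(partitions: Dict[int, List[Activation]], namespace: str,
                 since: int, limit: int) -> Tuple[List[Activation], int]:
    """Cross-partition list: query every partition, then merge on the client."""
    per_partition = [query_partition(docs, namespace, since) for docs in partitions.values()]
    # k-way merge of the already sorted (descending) per-partition streams
    merged = heapq.merge(*per_partition, key=lambda a: a[0], reverse=True)
    return list(itertools.islice(merged, limit)), len(per_partition)


partitions: Dict[int, List[Activation]] = {p: [] for p in range(4)}
for i in range(10):
    act = (1000 + i, f"act-{i}", "guest")
    partitions[partition_for(act[1], 4)].append(act)

recent, touched = fan_out_list(partitions, "guest", since=1003, limit=3)
print([a[1] for a in recent], touched)  # ['act-9', 'act-8', 'act-7'] 4
```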


## Materialized View

Given the above, one way to optimize query handling is the [Materialized View pattern][1]. We would add a new collection, `activations_query`, which would:

1. Use `namespace` as the partition key
2. Have a much shorter TTL, say 1 hour (see below)
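A minimal sketch of the two collection definitions and why the partition key choice matters. The property names in `as_container_properties` follow the Cosmos DB container resource (`partitionKey`, `defaultTtl`); `ContainerSpec`, `partitions_touched`, and the figure of 20 physical partitions are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ContainerSpec:
    name: str
    partition_key_path: str   # e.g. "/id" or "/namespace"
    default_ttl_seconds: int

    def as_container_properties(self) -> dict:
        # Property names follow the Cosmos DB container resource (partitionKey, defaultTtl)
        return {
            "id": self.name,
            "partitionKey": {"paths": [self.partition_key_path], "kind": "Hash"},
            "defaultTtl": self.default_ttl_seconds,
        }


def partitions_touched(spec: ContainerSpec, equality_filters: Mapping[str, str],
                       physical_partitions: int) -> int:
    """A query pinned to one partition key value is served by a single partition;
    anything else has to fan out to all of them."""
    pk_field = spec.partition_key_path.lstrip("/")
    return 1 if pk_field in equality_filters else physical_partitions


ACTIVATIONS = ContainerSpec("activations", "/id", 7 * 24 * 3600)           # existing, ~7 days
ACTIVATIONS_QUERY = ContainerSpec("activations_query", "/namespace", 3600)  # proposed, ~1 hr

poll_filter = {"namespace": "guest"}  # a poll always targets one namespace
print(partitions_touched(ACTIVATIONS, poll_filter, 20))        # 20
print(partitions_touched(ACTIVATIONS_QUERY, poll_filter, 20))  # 1
```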

### TTL for Materialized View

To avoid hitting the partition limit we would need to keep a very low TTL for `activations_query`. To decide which TTL to use, we would need to collect metrics on the poll flow showing how old the returned activations are compared to the current time (#4688).

In general, because of the descending sort, the poll query fetches only the very latest data. Once we have these metrics we can confirm this hypothesis and use the observed value as the TTL.
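One way the metric could be turned into a TTL, sketched under stated assumptions: take a high percentile of the observed ages of activations returned by poll, add headroom, and cap the result so that a busy namespace cannot fill its partition within one TTL window. The percentile (p99), headroom (1.5x), write rate, document size, and the sample ages are all hypothetical inputs, and 10GB is treated as 10 GiB.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def max_ttl_for_partition(limit_bytes: int, writes_per_s: float, avg_doc_bytes: int) -> int:
    """Longest TTL (s) before one namespace's partition would reach `limit_bytes`."""
    return int(limit_bytes / (writes_per_s * avg_doc_bytes))


def choose_view_ttl(returned_ages_s: Sequence[float], pct: float,
                    headroom: float, max_ttl_s: int) -> int:
    """TTL that covers `pct`% of observed poll results, plus headroom, capped by size."""
    candidate = math.ceil(percentile(returned_ages_s, pct) * headroom)
    return min(candidate, max_ttl_s)


# Illustrative inputs only: ages (s) of activations returned by poll, as #4688 would measure
ages = [5, 12, 30, 45, 60, 90, 120, 300, 600, 1800]
cap = max_ttl_for_partition(10 * 1024**3, writes_per_s=100, avg_doc_bytes=2048)
ttl = choose_view_ttl(ages, pct=99, headroom=1.5, max_ttl_s=cap)
print(cap, ttl)  # 52428 2700
```

With these made-up numbers the partition-size cap (about 14.5 hours) is far above the metric-driven TTL, so the metric decides; for a namespace with a much higher write rate the cap would win instead.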

### Write Flow

To populate `activations_query`, we would run two copies of the Activation Persister Service (#4632):

1. The first service would write to the existing `activations` collection, which has a longer TTL (say 7 days)
2. The second service would write to the new `activations_query` collection
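A toy model of this write flow, assuming each persister copy receives every activation record and writes it to exactly one collection. `TtlStore` and `ActivationPersister` are hypothetical stand-ins, not the service from #4632, and expiry is modeled as instant, whereas Cosmos DB deletes expired items in the background.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TtlStore:
    """Toy stand-in for a collection with a default TTL (seconds)."""
    name: str
    ttl_s: int
    docs: Dict[str, dict] = field(default_factory=dict)

    def put(self, doc: dict, now: float) -> None:
        self.docs[doc["activationId"]] = {**doc, "_expires": now + self.ttl_s}

    def live(self, now: float) -> List[dict]:
        # Documents whose TTL has not yet elapsed
        return [d for d in self.docs.values() if d["_expires"] > now]


class ActivationPersister:
    """Hypothetical persister: each running copy writes to exactly one store."""

    def __init__(self, store: TtlStore):
        self.store = store

    def on_activation(self, doc: dict, now: Optional[float] = None) -> None:
        self.store.put(doc, time.time() if now is None else now)


activations = TtlStore("activations", ttl_s=7 * 24 * 3600)
activations_query = TtlStore("activations_query", ttl_s=3600)
persisters = [ActivationPersister(activations), ActivationPersister(activations_query)]

doc = {"activationId": "a1", "namespace": "guest", "start": 0}
for p in persisters:  # both copies see the same activation record
    p.on_activation(doc, now=0)

two_hours = 2 * 3600
print(len(activations.live(two_hours)), len(activations_query.live(two_hours)))  # 1 0
```

Running two independent writers keeps the hot path of each simple, at the cost of the two collections briefly disagreeing if one copy lags.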

### Query Flow

For queries we would need a `MultiplexingArtifactStore` implementation in which a list call would:

1. Query `activations_query` first and check whether it returns results up to `limit`
2. If the result count reaches `limit`, return those results
3. Otherwise, query `activations` and fetch the remaining results
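The three steps above can be sketched as a plain function over two query callables. This is an illustration of the control flow, not the `MultiplexingArtifactStore` API: the `QueryFn` signature, the `start`/`activationId` field names, and the in-memory stores are hypothetical. It assumes `skip == 0`, a positive `limit`, and that the view holds every activation inside its TTL window. Asking `activations` for a full `limit` rows with an inclusive `upto` over-fetches slightly but guarantees enough rows after dropping duplicates at the boundary timestamp.

```python
from typing import Callable, List, Optional

# Hypothetical signature: (namespace, since, upto, limit) -> docs, newest first
QueryFn = Callable[[str, int, Optional[int], int], List[dict]]


def multiplexed_list(query_view: QueryFn, query_main: QueryFn,
                     namespace: str, since: int, limit: int) -> List[dict]:
    """Assumes skip == 0, limit > 0, and that the view holds every activation
    inside its TTL window (i.e. it is a recent subset of `activations`)."""
    # 1. single-partition query against `activations_query`
    fast = query_view(namespace, since, None, limit)
    # 2. fast path: the view alone filled the page
    if len(fast) >= limit:
        return fast[:limit]
    # 3. fetch older results from `activations`; `upto` is inclusive here, so
    #    drop rows already returned by the view at the boundary timestamp
    oldest = fast[-1]["start"] if fast else None
    seen = {d["activationId"] for d in fast}
    older = query_main(namespace, since, oldest, limit)
    extra = [d for d in older if d["activationId"] not in seen]
    return fast + extra[: limit - len(fast)]


def make_query(docs: List[dict]) -> QueryFn:
    """In-memory stand-in for a collection query."""
    def query(ns: str, since: int, upto: Optional[int], limit: int) -> List[dict]:
        hits = [d for d in docs if d["namespace"] == ns and d["start"] > since
                and (upto is None or d["start"] <= upto)]
        return sorted(hits, key=lambda d: d["start"], reverse=True)[:limit]
    return query


calls: List[str] = []


def tracked(name: str, fn: QueryFn) -> QueryFn:
    """Record which store each call hits."""
    def wrapper(*args):
        calls.append(name)
        return fn(*args)
    return wrapper


main_docs = [{"activationId": f"a{i}", "namespace": "guest", "start": i} for i in range(1, 11)]
view_docs = [d for d in main_docs if d["start"] >= 7]  # older entries already expired
view_q = tracked("view", make_query(view_docs))
main_q = tracked("main", make_query(main_docs))

first = [d["activationId"] for d in multiplexed_list(view_q, main_q, "guest", 0, 3)]
print(first, calls)   # ['a10', 'a9', 'a8'] ['view']
calls.clear()
second = [d["activationId"] for d in multiplexed_list(view_q, main_q, "guest", 0, 6)]
print(second, calls)  # ['a10', 'a9', 'a8', 'a7', 'a6', 'a5'] ['view', 'main']
```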

In most cases I expect only the first call to be made (the fast path), which should lower overall RU usage.
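The expected saving can be stated as a simple bound. Let $h$ be the fraction of list calls answered by the fast path, $R_q$ the RU charge of the single-partition query on `activations_query`, and $R_a$ the charge of the fan-out query on `activations` (symbols introduced here for illustration). Treating the fallback query as costing at most $R_a$:

```math
\begin{aligned}
\mathbb{E}[\mathrm{RU}_{\text{multiplexed}}] &\le R_q + (1-h)\,R_a \\
R_q + (1-h)\,R_a < R_a &\iff R_q < h\,R_a \iff h > \frac{R_q}{R_a}
\end{aligned}
```

So the multiplexed store is guaranteed to be cheaper on average once the fast-path hit rate exceeds the cost ratio of the two queries; since $R_q$ avoids the fan-out, that ratio is expected to be small, but the #4688 metrics would be needed to confirm it.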

[1]: https://docs.microsoft.com/en-us/azure/architecture/patterns/materialized-view

Optimize activation list query handling for CosmosDB #4684

Materialized View

TTL for Materialized View

Write Flow

Query Flow

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Optimize activation list query handling for CosmosDB #4684

Description

Materialized View

TTL for Materialized View

Write Flow

Query Flow

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions