#### Retention Analysis Example

Suppose you want to answer a question like, "How many unique users performed a specific action in a particular time period and also performed another specific action in a different time period?"

e.g., "How many unique users signed up in week 1, and purchased something in week 2?"

Using the example dataset with `(timestamp, product, user_id)` columns, the data would be indexed with the same `thetaSketch` aggregator as in the example above:

```json
{ "type": "thetaSketch", "name": "user_id_sketch", "fieldName": "user_id" }
```

The following query expresses:

"Out of the unique users who visited Product A between 10/01/2014 and 10/07/2014, how many visited Product A again in the week of 10/08/2014 to 10/14/2014?"

```json
{
  "queryType": "groupBy",
  "dataSource": "test_datasource",
  "granularity": "ALL",
  "dimensions": [],
  "filter": {
    "type": "or",
    "fields": [
      { "type": "selector", "dimension": "product", "value": "A" }
    ]
  },
  "aggregations": [
    {
      "type": "filtered",
      "filter": {
        "type": "and",
        "fields": [
          { "type": "selector", "dimension": "product", "value": "A" },
          {
            "type": "interval",
            "dimension": "__time",
            "intervals": ["2014-10-01T00:00:00.000Z/2014-10-07T00:00:00.000Z"]
          }
        ]
      },
      "aggregator": { "type": "thetaSketch", "name": "A_unique_users_week_1", "fieldName": "user_id_sketch" }
    },
    {
      "type": "filtered",
      "filter": {
        "type": "and",
        "fields": [
          { "type": "selector", "dimension": "product", "value": "A" },
          {
            "type": "interval",
            "dimension": "__time",
            "intervals": ["2014-10-08T00:00:00.000Z/2014-10-14T00:00:00.000Z"]
          }
        ]
      },
      "aggregator": { "type": "thetaSketch", "name": "A_unique_users_week_2", "fieldName": "user_id_sketch" }
    }
  ],
  "postAggregations": [
    {
      "type": "thetaSketchEstimate",
      "name": "final_unique_users",
      "field": {
        "type": "thetaSketchSetOp",
        "name": "final_unique_users_sketch",
        "func": "INTERSECT",
        "fields": [
          { "type": "fieldAccess", "fieldName": "A_unique_users_week_1" },
          { "type": "fieldAccess", "fieldName": "A_unique_users_week_2" }
        ]
      }
    }
  ],
  "intervals": ["2014-10-01T00:00:00.000Z/2014-10-14T00:00:00.000Z"]
}
```
**docs/content/querying/filters.md**

Search filters can be used to filter on partial string matches.

The search filter supports the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.

#### Search Query Spec

##### Contains

|property|description|required?|
|--------|-----------|---------|
|type|This String should always be "contains".|yes|
|value|A String value to run the search over.|yes|
|caseSensitive|Whether strings should be compared case-sensitively or not|no (default == false)|
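
For example, a search filter using the "contains" query spec might look like the following (the dimension name `product` and the value `foo` are illustrative):

```json
{
  "type": "search",
  "dimension": "product",
  "query": {
    "type": "contains",
    "value": "foo",
    "caseSensitive": true
  }
}
```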

##### Insensitive Contains

|property|description|required?|
|--------|-----------|---------|
|type|This String should always be "insensitive_contains".|yes|
|value|A String value to run the search over.|yes|

Note that an "insensitive_contains" search is equivalent to a "contains" search with "caseSensitive": false (or not
provided).
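
For example, the following matches rows whose `product` value contains "foo" regardless of case (dimension and value are illustrative):

```json
{
  "type": "search",
  "dimension": "product",
  "query": {
    "type": "insensitive_contains",
    "value": "foo"
  }
}
```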

##### Fragment

|property|description|required?|
|--------|-----------|---------|
|type|This String should always be "fragment".|yes|
|values|A JSON array of String values to run the search over.|yes|
|caseSensitive|Whether strings should be compared case-sensitively or not. Default: false (insensitive)|no|
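
For example, a fragment spec matching values that contain every listed fragment (dimension and fragment values are illustrative):

```json
{
  "type": "search",
  "dimension": "product",
  "query": {
    "type": "fragment",
    "values": ["foo", "bar"],
    "caseSensitive": false
  }
}
```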

### In filter



### Interval Filter

The Interval filter enables range filtering on columns that contain long millisecond values, with the boundaries specified as ISO 8601 time intervals. It is suitable for the `__time` column, long metric columns, and dimensions with values that can be parsed as long milliseconds.

This filter converts the ISO 8601 intervals to long millisecond start/end ranges and translates to an OR of Bound filters on those millisecond ranges, with numeric comparison. The Bound filters will have left-closed and right-open matching (i.e., start <= time < end).

|property|type|description|required?|
|--------|-----------|---------|---------|
|type|String|This should always be "interval".|yes|
|dimension|String|The dimension to filter on|yes|
|intervals|Array|A JSON array containing ISO-8601 interval strings. This defines the time ranges to filter on.|yes|
|extractionFn|[Extraction function](#filtering-with-extraction-functions)| Extraction function to apply to the dimension|no|

The interval filter supports the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.

If an extraction function is used with this filter, the extraction function should output values that are parseable as long milliseconds.

The following example filters on the time ranges of October 1-7, 2014 and November 15-16, 2014.
```json
{
"type" : "interval",
"dimension" : "__time",
"intervals" : [
"2014-10-01T00:00:00.000Z/2014-10-07T00:00:00.000Z",
"2014-11-15T00:00:00.000Z/2014-11-16T00:00:00.000Z"
]
}
```

The filter above is equivalent to the following OR of Bound filters:

```json
{
"type": "or",
"fields": [
{
"type": "bound",
"dimension": "__time",
"lower": "1412121600000",
"lowerStrict": false,
"upper": "1412640000000",
"upperStrict": true,
"ordering": "numeric"
},
{
"type": "bound",
"dimension": "__time",
"lower": "1416009600000",
"lowerStrict": false,
"upper": "1416096000000",
"upperStrict": true,
"ordering": "numeric"
}
]
}
```
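
The millisecond boundaries above can be double-checked outside Druid; here is a quick Python sketch (not part of any Druid API) that converts an ISO 8601 interval string into the `[start, end)` epoch-millisecond bounds a Bound filter would use:

```python
from datetime import datetime

def interval_to_millis(interval):
    """Split an ISO 8601 interval into [start, end) epoch-millisecond bounds."""
    def to_ms(ts):
        # fromisoformat() in older Pythons does not accept a trailing "Z"
        return int(datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp() * 1000)
    start, end = interval.split("/")
    return to_ms(start), to_ms(end)

print(interval_to_millis("2014-10-01T00:00:00.000Z/2014-10-07T00:00:00.000Z"))
# (1412121600000, 1412640000000)
```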

### Filtering with Extraction Functions
Some filters optionally support the use of extraction functions.
Expand Down Expand Up @@ -343,3 +399,15 @@ Filtering on day of week:
}
}
```

Filtering on a set of ISO 8601 intervals:
```json
{
"type" : "interval",
"dimension" : "__time",
"intervals" : [
"2014-10-01T00:00:00.000Z/2014-10-07T00:00:00.000Z",
"2014-11-15T00:00:00.000Z/2014-11-16T00:00:00.000Z"
]
}
```

```java
@Test
public void testRetentionDataIngestAndGpByQuery() throws Exception
{
  Sequence<Row> seq = helper.createIndexAndRunQueryOnSegment(
      new File(this.getClass().getClassLoader().getResource("retention_test_data.tsv").getFile()),
      readFileFromClasspathAsString("simple_test_data_record_parser.json"),
      readFileFromClasspathAsString("simple_test_data_aggregators.json"),
      0,
      QueryGranularities.NONE,
      5,
      readFileFromClasspathAsString("retention_test_data_group_by_query.json")
  );

  List<Row> results = Sequences.toList(seq, Lists.<Row>newArrayList());
  Assert.assertEquals(1, results.size());
  Assert.assertEquals(
      ImmutableList.of(
          new MapBasedRow(
              DateTime.parse("2014-10-19T00:00:00.000Z"),
              ImmutableMap
                  .<String, Object>builder()
                  .put("product", "product_1")
                  .put("p1_unique_country_day_1", 20.0)
                  .put("p1_unique_country_day_2", 20.0)
                  .put("p1_unique_country_day_3", 10.0)
                  .put("sketchEstimatePostAgg", 20.0)
                  .put("sketchIntersectionPostAggEstimate1", 10.0)
                  .put("sketchIntersectionPostAggEstimate2", 5.0)
                  .put("non_existing_col_validation", 0.0)
                  .build()
          )
      ),
      results
  );
}
```

**retention_test_data.tsv** (new file)
2014102001 product_1 pty_country_1
2014102001 product_1 pty_country_2
2014102001 product_1 pty_country_3
2014102001 product_1 pty_country_4
2014102001 product_1 pty_country_5
2014102001 product_1 pty_country_6
2014102001 product_1 pty_country_7
2014102001 product_1 pty_country_8
2014102001 product_1 pty_country_9
2014102001 product_1 pty_country_10
2014102001 product_1 pty_country_11
2014102001 product_1 pty_country_12
2014102001 product_1 pty_country_13
2014102001 product_1 pty_country_14
2014102001 product_1 pty_country_15
2014102001 product_1 pty_country_16
2014102001 product_1 pty_country_17
2014102001 product_1 pty_country_18
2014102001 product_1 pty_country_19
2014102001 product_1 pty_country_20
2014102101 product_1 pty_country_1
2014102101 product_1 pty_country_2
2014102101 product_1 pty_country_3
2014102101 product_1 pty_country_4
2014102101 product_1 pty_country_5
2014102101 product_1 pty_country_6
2014102101 product_1 pty_country_7
2014102101 product_1 pty_country_8
2014102101 product_1 pty_country_9
2014102101 product_1 pty_country_10
2014102101 product_1 pty_country_50
2014102101 product_1 pty_country_51
2014102101 product_1 pty_country_52
2014102101 product_1 pty_country_53
2014102101 product_1 pty_country_54
2014102101 product_1 pty_country_55
2014102101 product_1 pty_country_56
2014102101 product_1 pty_country_57
2014102101 product_1 pty_country_58
2014102101 product_1 pty_country_59
2014102201 product_1 pty_country_1
2014102201 product_1 pty_country_2
2014102201 product_1 pty_country_3
2014102201 product_1 pty_country_4
2014102201 product_1 pty_country_5
2014102201 product_1 pty_country_60
2014102201 product_1 pty_country_61
2014102201 product_1 pty_country_62
2014102201 product_1 pty_country_63
2014102201 product_1 pty_country_64