Enable customized result format for `scan-query`

As described in the [documentation](http://druid.io/docs/latest/development/extensions-contrib/scan-query.html), `scan-query` currently supports two kinds of `resultFormat`, `compactedList` and `list`. Both return the data as a JSON array in which the actual raw rows are nested (e.g., under `events`). For example:

```json
[{
    "segmentId" : "wikipedia_editstream_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9",
    "columns" : [
      "timestamp",
      "robot",
      "namespace",
      "anonymous",
      "unpatrolled",
      "page",
      "language",
      "newpage",
      "user",
      "count",
      "added",
      "delta",
      "variation",
      "deleted"
    ],
    "events" : [ {
        "timestamp" : "2013-01-01T00:00:00.000Z",
        "robot" : "1",
        "namespace" : "article",
        "anonymous" : "0",
        "unpatrolled" : "0",
        "page" : "11._korpus_(NOVJ)",
        "language" : "sl",
        "newpage" : "0",
        "user" : "EmausBot",
        "count" : 1.0,
        "added" : 39.0,
        "delta" : 39.0,
        "variation" : 39.0,
        "deleted" : 0.0
    }]
}]
```

Because `events` is nested inside the enclosing array, it is very difficult to extract events chunk by chunk before the last chunk of data has arrived. If the amount of data returned is very large, this causes memory pressure on the client side. If I understand correctly, the current formats therefore do not work well with HTTP streaming, where a client would process each event as it arrives before piping it to the next sink.
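
To make the difficulty concrete, here is a minimal sketch of what a client has to do with the `list` format, assuming circe as the JSON library (my choice for illustration). `FlattenScanResult` and `flattenEvents` are hypothetical names, not part of Druid or circe. Note that the whole response body has to be in memory before the first event can be emitted:

```scala
import io.circe.{Error, Json}
import io.circe.parser.parse

object FlattenScanResult {
  // Hypothetical helper: takes a complete `list`-format response body and
  // returns every nested event as one compact JSON string.
  def flattenEvents(body: String): Either[Error, Vector[String]] = {
    // Parsing needs the full body, so nothing is available until the last chunk arrives.
    val parsed: Either[Error, Json] = parse(body)
    parsed.flatMap(_.as[Vector[Json]]).flatMap { segments =>
      segments.foldLeft[Either[Error, Vector[String]]](Right(Vector.empty)) { (acc, segment) =>
        for {
          soFar  <- acc
          events <- segment.hcursor.downField("events").as[Vector[Json]]
        } yield soFar ++ events.map(_.noSpaces)
      }
    }
  }
}
```

Each string in the resulting `Vector` is what I would like to receive directly as one line of the response.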

Take [http4s streaming](http://http4s.org/v0.17/streaming/) as an example. Suppose the result format could be customized to look like this (output from [http://mockbin.org/stream/10](http://mockbin.org/stream/10)):

```json
{"type":"stream","chunk":1}
{"type":"stream","chunk":2}
{"type":"stream","chunk":3}
```

I could then easily process each JSON object as it arrives without much memory consumption. For example (the snippet below is adapted from the [http4s streaming documentation](http://http4s.org/v0.17/streaming/)):

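```scala
import fs2.io.stdout
import fs2.text.{lines, utf8Encode}
import io.circe.Json
import jawnfs2._
import org.http4s.client._
import org.http4s.client.blaze.PooledHttp1Client
import org.http4s.dsl._

// Format each JSON object with circe (named `format` so it does not clash with the facade below)
def format(obj: Json): String = obj.spaces2

// jawn-fs2 needs to know which JSON AST to build
implicit val facade = io.circe.jawn.CirceSupportParser.facade
val client = PooledHttp1Client()

// Query Druid using `scan-query`; `uri` and `payload` (the endpoint and the query body)
// are assumed to be defined elsewhere
val events = client.streaming(POST(uri, payload))(_.body.chunks.parseJsonStream)
events.map(format)
  .through(lines)
  .through(utf8Encode)
  .to(stdout) // pipe to stdout as an example
  .onFinalize(client.shutdown).run
```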
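
The same per-event processing also works without http4s once the response is one JSON object per line. Below is a rough sketch using only the Scala standard library. `LineDelimitedConsumer` and `consume` are hypothetical names of mine, and the input stream stands in for the HTTP response body:

```scala
import java.io.InputStream
import scala.io.Source

object LineDelimitedConsumer {
  // Hypothetical helper: reads one JSON object per line and hands each line to
  // `handle` as soon as it is read, so only the current line is held in memory.
  // Returns the number of non-empty lines that were handled.
  def consume(in: InputStream)(handle: String => Unit): Int = {
    val source = Source.fromInputStream(in, "UTF-8")
    try {
      var count = 0
      // getLines() is a lazy iterator; blank lines are skipped
      for (line <- source.getLines() if line.trim.nonEmpty) {
        handle(line)
        count += 1
      }
      count
    } finally source.close()
  }
}
```

With the nested `events` array this is not possible, because no single line of the response is a complete, self-contained JSON value.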

It would be nice if there's a plan to support this feature.

Enable customized result format for `scan-query` #4825

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Enable customized result format for scan-query #4825

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Enable customized result format for `scan-query` #4825