Skip to content

Filter for parsing of JSON fields #1

@nickmyatt

Description

@nickmyatt

JsonReceiver implements a basic mapping between Postgres scalar types and similar JSON scalar types:

  • string
  • boolean
  • number

it does not implement any mapping to compound (object, array) types however, even when an "obvious" mapping exists (for example from Postgres json[b] values); all other values are emitted as JSON strings (presumably according to Postgres default text representation for that type?), meaning that we see output like:

{
  "tx": [
    {
      "schema": "public",
      "table": "questions",
      "colnames": [
        "title",
        "tags",
      ],
      "coltypes": [
        "text",
        "question_tags",
      ],
      "values": [
        "How do I parse this nested JSON?",
        "[\"json\", \"postgresql\", \"python\"]",
      ],
      "op": "I"
    }
  ]
}

when it might be more convenient for the replisome consumer to get something like

{
...
      "values": [
        "How do I parse this nested JSON?",
        ["json", "postgresql", "python"],
...
}

note also that the coltype of tags is question_tags and not jsonb, even though we have defined:

CREATE DOMAIN question_tags AS jsonb;

There are at least 2 good reasons why the JsonReceiver should not (by default) emit nested JSON in this way, however:

  1. the empty object {} is currently used to represent the special case of "unchanged TOASTed value" in the values array, and since large json[b] fields are TOASTable, there would be no way to distinguish the "unchanged" and "changed to empty object" cases (at least without some metadata outside the values array, which the consumer must know to look for).

  2. if the consumer is just inserting values into another Postgres instance, it would need to serialise the nested JSON again (an unnecessary decode/encode roundtrip)

It is proposed to instead provide a filter class which can be optionally plugged in to parse nested JSON before it reaches the consumer class. The implementation would be trivial for json[b] fields (just call json.loads() on the relevant fields in values).

There is a problem with DOMAIN type aliases, however; as mentioned above JsonReceiver emits the alias and not the underlying type, so there is no way for the proposed filter to detect these fields as JSON-able automatically. The filter would need to be configured with a set of type aliases which it should also parse.

Finally, we could extend the filter to also convert other JSON-like Postgres types (hstore, text[] come to mind), perhaps by using the decoding layer from psycopg2?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions