Skip to content

lds: parse wal2json string values according to their type#879

Open
ab-pm wants to merge 6 commits intographile:v4from
ab-pm:fix/logical-decoding-type-parser
Open

lds: parse wal2json string values according to their type#879
ab-pm wants to merge 6 commits intographile:v4from
ab-pm:fix/logical-decoding-type-parser

Conversation

@ab-pm
Copy link
Contributor

@ab-pm ab-pm commented Mar 10, 2026

Description

We're using @graphile/lds in a LiveSource (similar to subscriptions-lds) that is not only used for monitoring live queries, but also publishes the actual row data. We discovered a bug in our application where the announcement .data that is passed to the subscribeToLogicalDecoding callback did not match the expected value - we assumed it would basically look like the results from pg.query('SELECT * FROM schema.table'). However we found that some of the properties contained plain strings!

The culprit appears to be that wal2json outputs most record values as string, basically as value::text (I believe?) and not to_json(value) as one might expect. The changes emitted by PgLogicalDecoding are then simply transformed from key/value lists to a dictionary, but no type-specific parsing is performed (the columntypes are available but ignored).

Performance impact

unknown

Correctness impact

As far as I can tell, most live queries listen for changes to records by the record identifier (primary key), and those typically consist of integers and texts where this is not an issue. I fail to understand how the liveCollection predicates work and what they check for.

In our case, we were looking at the published data and tried to access some properties that should have been Dates, JSON values or Arrays (timestamptz, jsonb and int[] respectively). The issue surely occurs with other rarely used datatypes, like enum(array)s, ranges, custom composite types, etc.

Checklist

  • My code matches the project's code style and yarn lint:fix passes.
  • I've added tests for the new feature, and yarn test passes.
  • I have detailed the new feature in the relevant documentation.
  • I have not added this feature to RELEASE_NOTES.md as I did not find any
  • If this is a breaking change I've explained why.

Further steps

This PR does report the issue and serves to start the discussion. If you're interested in merging the proposed fix, I'd like to continue with some refactorings; in particular I think it should be possible to inject custom type parsers. For that I would suggest to make parse, changeToRecord and changeToPk methods of the PgLogicalDecoding class, and use this.types instead of pg.types where .types should be a pg.CustomTypesConfig-shaped object that can be configured via the constructor options and falls back to pg.types if not passed.

[wal2json serialises all (non-numeric, non-boolean, non-null) database values as strings](https://github.com/eulerto/wal2json/blob/f8bab055fafc196539c9652b6b5584228498c6eb/wal2json.c#L1337-L1384) into the JSON,
and so far we failed to take their types into account when converting them to records.
This now
* makes wal2json output strings as far as possible (in particular for numeric types whose values did not always have a JSON number representation)
* parses the values [just like `node-postgres` does it](https://github.com/brianc/node-postgres/blob/32ec730b51a1fd73bf97d65105c949729fa9ec80/packages/pg/lib/result.js#L68-L73)
* instructs wal2json to output the type oids instead of the type names for this
@benjie
Copy link
Member

benjie commented Mar 11, 2026

Let’s make this opt in so people who don’t need it don’t incur the additional cost; add something like types?: typeof pg.types to Options.

…ding` to `PgLogicalDecoding` and add doc comments to describe them
@ab-pm
Copy link
Contributor Author

ab-pm commented Mar 12, 2026

Done.

I notice the CI appears to be running some older postgres versions with some older wal2json only; apparently numeric-data-types-as-string was introduced only in release 2.6 2y ago. I guess this should also become configurable?

@ab-pm ab-pm force-pushed the fix/logical-decoding-type-parser branch from 2b9657c to ab8d299 Compare March 12, 2026 01:55
ab-pm added 3 commits March 12, 2026 04:59
* Refactor `changeToRecord` and `changeToPk` from exports into methods of `PgLogicalDecoding`
* add `LdsOptions.types` to choose parsers, use noop identity function by default
* call `changeToRecord` only once on updates
* add `LdsOptions.params`
* fix `numeric-data-types-as-string` to be set only if `types` are passed
* adjust the `parse` function accordingly, to handle `number` values without a type parser
* set `include-type-oids` only if `types` are passed
* adjust `changeToRecord`/`changeToPk` to call `parse` only if type oids are present
@ab-pm ab-pm force-pushed the fix/logical-decoding-type-parser branch 3 times, most recently from 2a3685d to b87a3c8 Compare March 12, 2026 04:50
… and try to ignore the `software-properties-common` package that is not available on Debian 13 and later? (I have no idea what I'm doing)
@ab-pm ab-pm force-pushed the fix/logical-decoding-type-parser branch from b87a3c8 to 49ec36f Compare March 12, 2026 04:53
...params,
})
.flatMap(entry => (typeof entry[1] == "string" ? entry : []))
.map(pg.Client.prototype.escapeLiteral)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not pretty. Unfortunately the top-level export pg.escapeLiteral is only available since brianc/node-postgres#2954, and I didn't want to require pg@^8.11.0…

Comment on lines +16 to +23
apt-transport-https \
ca-certificates \
build-essential \
pkg-config \
git \
postgresql-server-dev-$PG_MAJOR
add-apt-repository "deb http://ftp.debian.org/debian testing main contrib"
apt-get update
# add-apt-repository "deb http://ftp.debian.org/debian testing main contrib"
# apt-get update
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to not merge this. As the commit message indicates, I have no idea what I was doing; but apparently installing ca-certificates solved the TLS problem with git-cloning the wal2json repository?

Not sure what the debian testing repository is good for. If it's still needed, would simply echoing this strings into some /etc/apt/sources.list.d/???.list file work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants