Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Port storage/ to Python 3#3725

Merged
hawkowl merged 16 commits into
developfrom
hawkowl/py3-storage
Aug 30, 2018
Merged

Port storage/ to Python 3#3725
hawkowl merged 16 commits into
developfrom
hawkowl/py3-storage

Conversation

@hawkowl
Copy link
Copy Markdown
Contributor

@hawkowl hawkowl commented Aug 20, 2018

No description provided.

@hawkowl hawkowl requested a review from a team August 20, 2018 20:28
Comment thread synapse/storage/_base.py Outdated
try:
return json.loads(db_content)
except Exception:
# I don't know how, but somehow some binary objects get mangled upon
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is happening in the weird tables where we inexplicably store JSON as binary, like users_filter. Looking at this with psql, the rows look like:

 matthew |       130 | \x7b22726f6f6d223a7b2274696d656c696e65223a7b226c696d6974223a32307d7d7d

which sure enough looks like \x7b ({) followed by a bunch of ASCII hex digits.

So, either it's being inserted in the first place in the mangled format, or the postgres client library (or server?) is incorrectly displaying it with only a single preceding \x rather than correctly putting a \x before each pair of hex digits.

The type of this column is:

filter_json | bytea

Looks like postgres is behaving as intended when displaying a bytea as \x.............; this is what you get if the bytea_output global is set to hex. If you set it to escape (the other valid value), you get this instead:

matrix=# set bytea_output='escape';
SET
matrix=# SHOW bytea_output;
 bytea_output 
--------------
 escape
(1 row)

matrix=# select * from matrix.user_filters where user_id='matthew' limit 1;
 user_id | filter_id |             filter_json              
---------+-----------+--------------------------------------
 matthew |         0 | {"room": {"timeline": {"limit": 8}}}
(1 row)

So it looks like the data is okay in the DB, it's just that the python postgres library is interpreting \x1234 as a python-style hex escape sequence rather than this horrible mutant postgres hex sequence.

So, looking at psycopg's FAQ (http://initd.org/psycopg/docs/faq.html#faq-bytea-9-0), it turns out we're not the only people with these woes, and the solution is simply to do the same trick as above and explicitly set bytea_output='escape'; and then everything will be green & submarine.

tl;dr: let's kill this with fire and set bytea_output='escape'; instead :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, reading the FAQ some more, it sounds like this shouldn't even be a bug if we are running psycopg 2.4.1 or later with libpq from 9.0 or later. On matrix.org right now we look to be running psycopg 2.7.5 with libpq 5.10 (i.e. for postgres 10) - so i think this should be absolutely fine. Was this just that you were testing on a box using an ancient python postgres client?

I guess we should probably set bytea_output='escape' to be safe.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(even nicer would be to kill these bytea tables with fire - i have no idea why they are byteas; it looks like they were originally created as LONGBLOBs in a fit of overenthusiasm/confusion, despite them just being utf8 JSON)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ara4n I noticed this coming out of your homeserver :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably pin psycopg2 (and libpq, whose version psycopg2 helpfully exports). Debian jessie have suitably up to date packages, so it should be safe.

Copy link
Copy Markdown
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also remove the hack from db_to_json and instead pin our deps for psycopg2 and libpq

Comment thread synapse/storage/events.py
max_depth = max(row[1] for row in rows)

if max_depth <= token.topological:
if max_depth < token.topological:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to remove the equality check here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does, because otherwise the tests fail :(

Comment thread synapse/storage/events_worker.py Outdated
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import division
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary given we seem to only be doing // in here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not, I don't think

@hawkowl hawkowl dismissed erikjohnston’s stale review August 29, 2018 10:54

hopefully fixed

Comment thread synapse/storage/engines/postgres.py Outdated
if not self.synchronous_commit:
cursor = db_conn.cursor()
cursor.execute("SET synchronous_commit TO OFF")
cursor.execute("SET bytea_output TO escape")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look like the right place for it.
synchronous_commit defaults to true so this will in default not do that what you expect it to do.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, shit. that's right. thanks for catching it, let me fix that up.

@hawkowl hawkowl requested a review from erikjohnston August 29, 2018 13:37
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants