Port storage/ to Python 3 #3725
    try:
        return json.loads(db_content)
    except Exception:
        # I don't know how, but somehow some binary objects get mangled upon
I'm assuming this is happening in the weird tables where we inexplicably store JSON as binary, like user_filters. Looking at this with psql, the rows look like:
matthew | 130 | \x7b22726f6f6d223a7b2274696d656c696e65223a7b226c696d6974223a32307d7d7d
which sure enough looks like \x7b ({) followed by a bunch of ASCII hex digits.
So, either it's being inserted in the first place in the mangled format, or the postgres client library (or server?) is incorrectly displaying it with only a single preceding \x rather than correctly putting a \x before each pair of hex digits.
The type of this column is:
filter_json | bytea
Looks like postgres is behaving as intended when displaying a bytea as \x.............; this is what you get if the bytea_output global is set to hex. If you set it to escape (the other valid value), you get this instead:
matrix=# set bytea_output='escape';
SET
matrix=# SHOW bytea_output;
bytea_output
--------------
escape
(1 row)
matrix=# select * from matrix.user_filters where user_id='matthew' limit 1;
user_id | filter_id | filter_json
---------+-----------+--------------------------------------
matthew | 0 | {"room": {"timeline": {"limit": 8}}}
(1 row)
So it looks like the data is okay in the DB, it's just that the python postgres library is interpreting \x1234 as a python-style hex escape sequence rather than this horrible mutant postgres hex sequence.
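To make the mix-up concrete, here is a minimal Python sketch of decoding postgres's hex-format bytea text by hand. The helper name `decode_pg_hex_bytea` is hypothetical and not part of Synapse or psycopg2; with a recent enough client library this decoding should already be done for you, so treat it purely as an illustration of the format.

```python
import json


def decode_pg_hex_bytea(text):
    """Decode postgres's hex bytea text form: one leading \\x, then hex pairs."""
    if not text.startswith("\\x"):
        raise ValueError("not a hex-format bytea value")
    # Everything after the single \x prefix is plain ASCII hex digits.
    return bytes.fromhex(text[2:])


# The row shown by psql above, written as a Python string.
raw = "\\x7b22726f6f6d223a7b2274696d656c696e65223a7b226c696d6974223a32307d7d7d"
filter_json = json.loads(decode_pg_hex_bytea(raw).decode("utf8"))
print(filter_json)
```

The point is that the single `\x` prefix covers the whole value, so reading it as a Python-style escape for one byte leaves the rest as stray hex digits.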
So, looking at psycopg's FAQ (http://initd.org/psycopg/docs/faq.html#faq-bytea-9-0), it turns out we're not the only people with these woes, and the solution is simply to do the same trick as above and explicitly set bytea_output='escape'; and then everything will be green & submarine.
tl;dr: let's kill this with fire and set bytea_output='escape'; instead :)
Actually, reading the FAQ some more, it sounds like this shouldn't even be a bug if we are running psycopg 2.4.1 or later with libpq from 9.0 or later. On matrix.org right now we look to be running psycopg 2.7.5 with libpq 5.10 (i.e. for postgres 10), so I think this should be absolutely fine. Were you perhaps just testing on a box using an ancient python postgres client?
I guess we should probably set bytea_output='escape' to be safe.
(even nicer would be to kill these bytea tables with fire - i have no idea why they are byteas; it looks like they were originally created as LONGBLOBs in a fit of overenthusiasm/confusion, despite them just being utf8 JSON)
@ara4n I noticed this coming out of your homeserver :)
We should probably pin psycopg2 (and libpq, whose version psycopg2 helpfully exports). Debian jessie has suitably up-to-date packages, so it should be safe.
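A rough sketch of what such a version check could look like, using the thresholds quoted from the FAQ (psycopg 2.4.1 and libpq 9.0). Both function names are hypothetical. In real use the inputs would presumably come from `psycopg2.__version__` and `psycopg2.__libpq_version__`; the latter is an assumption here about how psycopg2 exports the libpq version, so check it against the installed release.

```python
import re


def parse_psycopg2_version(version_string):
    """Turn e.g. '2.7.5 (dt dec pq3 ext lo64)' into (2, 7, 5)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError("unrecognised version: %r" % (version_string,))
    # A missing patch component (e.g. '2.4') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def client_handles_hex_bytea(psycopg2_version, libpq_version):
    """Check psycopg >= 2.4.1 and libpq >= 9.0.

    libpq_version is the integer form, e.g. 90105 for 9.1.5.
    """
    new_enough_psycopg = parse_psycopg2_version(psycopg2_version) >= (2, 4, 1)
    return new_enough_psycopg and libpq_version >= 90000
```

A check like this could run at startup and refuse to continue on an old client, rather than papering over mangled values later.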
erikjohnston
left a comment
Let's also remove the hack from db_to_json and instead pin our deps for psycopg2 and libpq
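For illustration, a minimal sketch of what db_to_json might look like without the try/except hack, assuming the client library returns bytea values as correctly decoded binary. This is a guess at the shape of the helper, not the code in this PR.

```python
import json


def db_to_json(db_content):
    """Parse a JSON column value that may arrive as text or as binary."""
    # psycopg2 on Python 3 hands bytea values back as memoryview objects.
    if isinstance(db_content, memoryview):
        db_content = db_content.tobytes()
    # The binary columns hold UTF-8 encoded JSON, so decode before parsing.
    if isinstance(db_content, (bytes, bytearray)):
        db_content = db_content.decode("utf8")
    return json.loads(db_content)
```

With the dependencies pinned, a value that still fails to parse would then surface as an error instead of being silently reinterpreted.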
      max_depth = max(row[1] for row in rows)

    - if max_depth <= token.topological:
    + if max_depth < token.topological:
I don't think we want to remove the equality check here.
We do, because otherwise the tests fail :(
    # See the License for the specific language governing permissions and
    # limitations under the License.

    from __future__ import division
Is this necessary given we seem to only be doing // in here?
Maybe not; I don't think so.
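As a quick illustration of why the import is likely redundant when only `//` is used, here is a small sketch. The numbers are arbitrary and the comments describe Python 3 behaviour.

```python
# Floor division gives the same result on Python 2 and Python 3,
# with or without `from __future__ import division`.
floor_result = 7 // 2

# True division is what the __future__ import changes: on Python 3
# (or on Python 2 with the import) this yields a float.
true_result = 7 / 2

print(floor_result, true_result)
```

So the import only matters if a plain `/` between integers appears somewhere in the module.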
    if not self.synchronous_commit:
        cursor = db_conn.cursor()
        cursor.execute("SET synchronous_commit TO OFF")
        cursor.execute("SET bytea_output TO escape")
This does not look like the right place for it.
synchronous_commit defaults to true, so by default this branch is skipped and bytea_output is never set, which is not what you expect it to do.
wait, shit. that's right. thanks for catching it, let me fix that up.
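One possible shape for the fix, sketched with hypothetical stand-in classes (`PostgresEngineSketch`, `RecordingConnection` and `RecordingCursor` are illustrative names, not the real engine or psycopg2 objects): set bytea_output unconditionally and keep only the synchronous_commit statement behind the flag.

```python
class PostgresEngineSketch(object):
    """Hypothetical stand-in for the engine class touched by the diff."""

    def __init__(self, synchronous_commit=True):
        self.synchronous_commit = synchronous_commit

    def on_new_connection(self, db_conn):
        cursor = db_conn.cursor()
        # Always applied, regardless of the synchronous_commit setting.
        cursor.execute("SET bytea_output TO escape")
        # Only relax durability when explicitly configured to do so.
        if not self.synchronous_commit:
            cursor.execute("SET synchronous_commit TO OFF")
        cursor.close()


class RecordingCursor(object):
    """Fake cursor that records the SQL it is asked to run."""

    def __init__(self, statements):
        self._statements = statements

    def execute(self, sql):
        self._statements.append(sql)

    def close(self):
        pass


class RecordingConnection(object):
    """Fake connection so the sketch runs without a database."""

    def __init__(self):
        self.statements = []

    def cursor(self):
        return RecordingCursor(self.statements)
```

With the default setting only the bytea_output statement runs; with `synchronous_commit=False` both statements run.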