Skip to content

Remove obsolete workaround for slow encoding of unicode characters.#30

Merged
clokep merged 5 commits into
masterfrom
clokep/ascii-fixes
Aug 10, 2020
Merged

Remove obsolete workaround for slow encoding of unicode characters.#30
clokep merged 5 commits into
masterfrom
clokep/ascii-fixes

Conversation

@clokep
Copy link
Copy Markdown
Member

@clokep clokep commented Aug 7, 2020

  • Bump the version of simplejson to a version that properly handles ensure_ascii=False (>3.14.0)
  • Remove the manual unascii-ifying and use ensure_ascii=False, which is now at least as fast.

* Bump the version of simplejson to a version that properly handles
  ensure_ascii=False
* Remove the manual unascii-ifying and use ensure_ascii=False, which
  is now at least as fast.
@clokep
Copy link
Copy Markdown
Member Author

clokep commented Aug 7, 2020

(I feel like the title of this PR isn't really descriptive, by the way. Not sure of a better way to succinctly describe this change.)

@clokep clokep requested a review from richvdh August 7, 2020 17:02
@clokep
Copy link
Copy Markdown
Member Author

clokep commented Aug 7, 2020

Requesting review from @richvdh since this was broken off of #29, but feel free to redirect if you'd like!

@richvdh
Copy link
Copy Markdown
Member

richvdh commented Aug 7, 2020

(I feel like the title of this PR isn't really descriptive, by the way. Not sure of a better way to succinctly describe this change.)

yeah. _unascii didn't really "decode ASCII" - it decoded the `\uXXXX escape sequences emitted by the json encoder, which were certainly not ascii.

Remove obsolete workaround for slow encoding of unicode characters.

maybe?

Copy link
Copy Markdown
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code lgtm.

Comment thread setup.py
Copy link
Copy Markdown
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrm, I'm not seeing any tests for encoding of control characters (\u0000 through \u001F). They should come out as "\x00" through "\x1F", iirc. could you add this?

@richvdh
Copy link
Copy Markdown
Member

richvdh commented Aug 7, 2020

hrm, according to https://matrix.org/docs/spec/appendices#grammar they should (with a few exceptions for "\n", "\r", etc), come out as "\u001F". Either way, can you add some tests to check behaviour is maintained?

@clokep
Copy link
Copy Markdown
Member Author

clokep commented Aug 7, 2020

hrm, according to https://matrix.org/docs/spec/appendices#grammar they should (with a few exceptions for "\n", "\r", etc), come out as "\u001F". Either way, can you add some tests to check behaviour is maintained?

I added a test that checks from 0x00 to 0x7E (the last printable ASCII character). I might have gone overboard, but I figured why not. 😄

I did check the behavior of these on master first -- I can move them to a separate PR if we want to CI pass on them first.

@clokep clokep changed the title Stop manually decoding ASCII Remove obsolete workaround for slow encoding of unicode characters. Aug 7, 2020
@clokep
Copy link
Copy Markdown
Member Author

clokep commented Aug 7, 2020

Arg, lint gets me every time on this repo. I think I might just black-ify it.

@clokep clokep merged commit 7e16350 into master Aug 10, 2020
@clokep clokep deleted the clokep/ascii-fixes branch August 10, 2020 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants