Skip to content

Web console: allow encoding of ASCII control chars#10795

Merged
vogievetsky merged 10 commits intoapache:masterfrom
implydata:allow-encoding
Jun 27, 2021
Merged

Web console: allow encoding of ASCII control chars#10795
vogievetsky merged 10 commits intoapache:masterfrom
implydata:allow-encoding

Conversation

@vogievetsky
Copy link
Copy Markdown
Contributor

@vogievetsky vogievetsky commented Jan 25, 2021

This PR makes it possible to input ASCII control chars through the web console UI by writing them as \x?? notation just like you can do in many languages inside of string literals.

image

This allows the delimiter and listDelimiter inputs to actually be useful in the parse data screen.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jan 25, 2021

How did you pick this one? Neither JSON nor Java support \xNN. Both do support \uNNNN (Unicode style). Java additionally supports \NNN (octal style).

References:

IMO matching what JSON does would make the most sense to people, since the underlying API is a JSON one.

@vogievetsky
Copy link
Copy Markdown
Contributor Author

I am only escaping char codes 0 to 0x1f so \uNNNN seemed like overkill. I have not considered (or ever used) \NNN. I use \xNN all the time in JS string literals. But I am down to do \uNNNN

@vogievetsky
Copy link
Copy Markdown
Contributor Author

Are there chars outside of 0 to 0x1f that should be encoded if we do \uNNNN ?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jan 26, 2021

I am only escaping char codes 0 to 0x1f so \uNNNN seemed like overkill. I have not considered (or ever used) \NNN. I use \xNN all the time in JS string literals. But I am down to do \uNNNN

I don't think it's about overkill / underkill but more what people will expect. The underlying APIs are JSON so IMO it is best if the web console accepts JSON style escapes.

I have not considered (or ever used) \NNN.

It's a C thing. I think we can skip it. It's only supported by Java, anyway, not JSON, and I think it makes more sense to stick to JSON.

Are there chars outside of 0 to 0x1f that should be encoded if we do \uNNNN ?

I would accept exactly what JSON does, no more & no less:

  • \" -> "
  • \\ -> \
  • \/ -> /
  • \b -> ascii 8
  • \t -> ascii 9
  • \n -> ascii 10
  • \f -> ascii 12
  • \r -> ascii 13
  • \uABCD -> unicode code point ABCD for character in basic multilingual plane (test with non-latin languages)
  • \uABCD\uEFGH -> utf-16 surrogate pair ABCD EFGH (test with emojis like 🔥)

That way, people can copy stuff freely back and forth between JSON specs and the web console fields.

Note: the way unicode characters outside BMP are encoded here is not universal; for example Python would encode 🙂 as \U0001F642 but JSON would encode it as \uD83D\uDE42.

@vogievetsky
Copy link
Copy Markdown
Contributor Author

Ok that seems like a plan. I could actually use the JSON stringier / parser itself and it would be super robust and guaranteed to follow the JSON spec by definition. I would make only one exception to the "no more & no less" rule:

Since this input would always be parsed in a string context I would allow a single unescaped ".

Meaning " would parse as itself instead of throwing an error (like an actual JSON parse would). \" would also parse to ".

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jan 26, 2021

Since this input would always be parsed in a string context I would allow a single unescaped ".

I think that would be OK, it makes sense to me. It breaks the 'freely copy back and forth' principle but in a very reasonable way. It will probably decrease the overall level of surprise for users.

@vogievetsky
Copy link
Copy Markdown
Contributor Author

@gianm I updated the PR per your suggestions

@vogievetsky vogievetsky merged commit 561cc71 into apache:master Jun 27, 2021
@vogievetsky vogievetsky deleted the allow-encoding branch June 27, 2021 01:54
jihoonson pushed a commit to jihoonson/druid that referenced this pull request Jul 12, 2021
* allow encoding of ascii control chars

* change to JSON

* make json escpaes work

* update snapshot

* break out component

* fix test

* update test script

* update formatter to be more chill
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants